Re: [xml] Another buglet in HTML parser


From: Kristian Hogsberg Kristensen (hogsberg@daimi.au.dk)
Date: Mon Oct 11 1999 - 09:00:38 EDT


Daniel Veillard <Daniel.Veillard@w3.org> writes:

> Hi Kristian,
>
> > Here's some more feedback on the HTML parser: when parsing the
> > following
> >
> > <html>
> >
> > <body>
> > <ol>
> > <li> hello
> > </ol></body>
> >
> > </html>
>
> Hum, strange, I'm unable to reproduce this problem.

Did you try it on *exactly* the above example? In particular, the error
only surfaces when there's no whitespace between </ol> and </body>.
I just checked out the W3C CVS version and it still has the bug.

[...]

> IMHO, the current code is right, the test for the end of this
> element parsing is (rightly) done after the element content has
> been parsed when one reenter the loop. This should not change anything.
>
> > In the example above, parsing the li element autocloses the ol
>
> Huh ? It should not ... Did you change something which makes LI
> autoclose OL, if yes that's an error, and this explains why the parser
> later on complains about an OL not being open ...

Okay, I probably wasn't very clear there. What I meant was this: on
seeing the "<" of <li>, htmlParseContent calls htmlParseElement. That
call reads data up to and including the ">" of </ol>, so by the time it
returns, both the li and ol elements have been closed, and
htmlParseContent should simply return. However, the test for whether
the current element has been autoclosed is never reached, because the
parser is now looking at the "</" of </body>, which terminates the
loop. Once the loop terminates, htmlParseEndTag is called, and it tries
to parse </body> as the closing tag for <ol>.
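
To make the ordering problem concrete, here is a small toy model of the
loop. It is only a sketch and not the libxml code: the names Toy,
toy_content, toy_element, toy_end_tag and toy_parse are all made up for
illustration, and the "already closed" test is modelled with a stack
depth, which is an assumption about how one could do it. The flag
check_autoclose_first selects whether the autoclose test runs before or
after the "</" test.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 16
#define MAX_NAME  16

/* Hypothetical toy parser state; none of this is libxml API. */
typedef struct {
    const char *cur;                  /* next unread input character   */
    char stack[MAX_DEPTH][MAX_NAME];  /* names of the open elements    */
    int depth;                        /* number of open elements       */
    int errors;                       /* number of reported problems   */
    int check_autoclose_first;        /* 0 = order described above     */
} Toy;

static void toy_content(Toy *p);

/* Copy a tag name up to '>' into out and step past the '>'. */
static void toy_name(Toy *p, char *out) {
    int n = 0;
    while (*p->cur && *p->cur != '>') {
        if (n < MAX_NAME - 1) out[n++] = *p->cur;
        p->cur++;
    }
    out[n] = '\0';
    if (*p->cur == '>') p->cur++;
}

/* Parse "</name>" on behalf of the element opened at my_depth. */
static void toy_end_tag(Toy *p, int my_depth) {
    char name[MAX_NAME];
    int i;
    if (p->depth < my_depth) p->errors++;  /* element was already closed */
    p->cur += 2;                           /* skip "</" */
    toy_name(p, name);
    for (i = p->depth - 1; i >= 0; i--) {
        if (strcmp(p->stack[i], name) == 0) {
            p->depth = i;                  /* closes name and all children */
            return;
        }
    }
    p->errors++;                           /* matches no open element */
}

/* Parse "<name>", then the element's content and end tag. */
static void toy_element(Toy *p) {
    assert(p->depth < MAX_DEPTH);
    p->cur++;                              /* skip '<' */
    toy_name(p, p->stack[p->depth]);
    p->depth++;
    toy_content(p);
}

static void toy_content(Toy *p) {
    int my_depth = p->depth;
    while (*p->cur) {
        /* Alternative order: notice the autoclose before looking at "</". */
        if (p->check_autoclose_first && p->depth < my_depth) return;
        if (p->cur[0] == '<' && p->cur[1] == '/') break;
        /* Order described above: skipped when the input is at "</". */
        if (!p->check_autoclose_first && p->depth < my_depth) return;
        if (*p->cur == '<') toy_element(p);
        else p->cur++;                     /* character data: just skip it */
    }
    if (*p->cur) toy_end_tag(p, my_depth);
}

/* Returns the number of errors the toy parser reports for html. */
int toy_parse(const char *html, int check_autoclose_first) {
    Toy p;
    memset(&p, 0, sizeof p);
    p.cur = html;
    p.check_autoclose_first = check_autoclose_first;
    toy_content(&p);
    return p.errors;
}
```

With check_autoclose_first set to 0, toy_parse reports one error on the
example as posted and none once a space is put between </ol> and
</body>. With the flag set to 1 it reports none in either case. Whether
moving the test is the right fix for the real parser is a separate
question; the toy only shows why the whitespace matters.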

regards,
Kristian

----
Message from the list xml@rufus.w3.org
Archived at : http://rufus.w3.org/veillard/XML/messages
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rufus.w3.org


