From: Kristian Hogsberg Kristensen (hogsberg@daimi.au.dk)
Date: Tue Oct 26 1999 - 18:53:33 EDT
Hi,
When parsing this document:
<HTML>
<BODY>
<P>
<HR>
foo
</BODY>
</HTML>
The HTML parser makes "foo" a child of <HR>. I tracked the problem
to this piece of code in htmlParseElement (lines 2310-2317):
    if (((depth == ctxt->nameNr) && (oldname == ctxt->name)) ||
        (name == NULL)) {
        if (CUR == '>')
            NEXT;
        return;
    }
which looks a bit odd to me; I don't see what it's supposed to do.
What happens is that <HR> autocloses <P>, and now oldname points to
freed memory. By coincidence, the allocator reuses this memory for the
name of the new element, so oldname == ctxt->name and thus
htmlParseElement returns prematurely (it doesn't reach the test for
info->empty).
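To illustrate the failure mode, here is a minimal sketch of one way to
detect "nothing new was pushed" without comparing a possibly dangling
pointer. It is not the libxml code: Ctx, Snapshot, push, pop,
take_snapshot and unchanged are hypothetical names, and the push
counter is an assumption of mine rather than something the parser has.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 16

/* Hypothetical stand-in for the parser context (not libxml's struct). */
typedef struct {
    char    *names[MAX_DEPTH]; /* heap-allocated names of open elements */
    int      nameNr;           /* current depth of the name stack */
    unsigned pushes;           /* total pushes so far; never decreases */
} Ctx;

/* What the caller remembers before parsing a start tag. */
typedef struct {
    int      depth;
    unsigned pushes;
} Snapshot;

/* Push a private copy of name onto the stack. */
static void push(Ctx *c, const char *name) {
    size_t n = strlen(name) + 1;
    char *copy = malloc(n);
    assert(copy != NULL && c->nameNr < MAX_DEPTH);
    memcpy(copy, name, n);
    c->names[c->nameNr++] = copy;
    c->pushes++;
}

/* Pop the top element and free its name; any saved pointer to it dangles. */
static void pop(Ctx *c) {
    assert(c->nameNr > 0);
    free(c->names[--c->nameNr]);
}

static Snapshot take_snapshot(const Ctx *c) {
    Snapshot s = { c->nameNr, c->pushes };
    return s;
}

/* Returns 1 only if no element has been pushed since the snapshot.
 * Unlike a pointer comparison, this cannot be fooled by the allocator
 * handing the freed name buffer to the next element. */
static int unchanged(const Ctx *c, const Snapshot *s) {
    return c->nameNr == s->depth && c->pushes == s->pushes;
}
```

With this sketch, taking a snapshot while <P> is open, then popping <P>
and pushing <HR>, leaves the depth the same but makes unchanged() return
0, so the caller would go on to the test for an empty element.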
I see you've made <DD> autoclose <DT> and <DT> autoclose <DD>, but
what about also making <DD> autoclose <DD> and likewise for <DT>?
This would make the parser a bit more robust; suppose someone were to
do something like:
<DL>
<DD>foo
<DD>bar
</DL>
it would get parsed as
<DL>
<DD>foo</DD>
<DD>bar</DD>
</DL>
which I believe is a bit more useful.
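A rough sketch of what I mean, assuming the autoclose rules can be
written as (new tag, open tag it closes) pairs. The table and the
auto_closes helper below are hypothetical and only illustrate the idea;
they are not the parser's actual data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a start tag `opens` implicitly closes an open `closes`. */
typedef struct {
    const char *opens;
    const char *closes;
} AutoClose;

static const AutoClose auto_close[] = {
    { "DD", "DT" }, { "DT", "DD" },  /* the behaviour described as already present */
    { "DD", "DD" }, { "DT", "DT" },  /* the proposed additions */
};

/* Returns 1 if starting new_tag should close the currently open open_tag. */
static int auto_closes(const char *new_tag, const char *open_tag) {
    assert(new_tag != NULL && open_tag != NULL);
    for (size_t i = 0; i < sizeof auto_close / sizeof auto_close[0]; i++) {
        if (strcmp(auto_close[i].opens, new_tag) == 0 &&
            strcmp(auto_close[i].closes, open_tag) == 0)
            return 1;
    }
    return 0;
}
```

Here auto_closes("DD", "DD") is true, so the second <DD> in the example
would close the first one, while auto_closes("DD", "DL") is false and
the list itself stays open.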
regards,
Kristian
---- Message from the list xml@rufus.w3.org Archived at : http://rufus.w3.org/veillard/XML/messages to unsubscribe: echo "unsubscribe xml" | mail majordomo@rufus.w3.org