[xml] Strange stuff in htmlParseElement


From: Kristian Hogsberg Kristensen (hogsberg@daimi.au.dk)
Date: Tue Oct 26 1999 - 18:53:33 EDT


Hi,

When parsing this document:

    <HTML>
      <BODY>
    
      <P>
      <HR>
      foo
    
      </BODY>
    </HTML>

The HTML parser makes "foo" a child of <HR>... I tracked the problem
to this piece of code in htmlParseElement (lines 2310-2317):

    if (((depth == ctxt->nameNr) && (oldname == ctxt->name)) ||
        (name == NULL)) {
        if (CUR == '>')
            NEXT;
        return;
    }

which looks a bit weird to me... I don't see what it's supposed to do.
What happens is that <HR> autocloses <P>, and now oldname points to
freed memory. By accident, this memory is reused for the name of the
new element, so oldname == ctxt->name and thus htmlParseElement returns
prematurely (it doesn't reach the test for info->empty).
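
One way to avoid comparing against a freed pointer is sketched below.
It assumes the check is meant to detect that the start tag pushed no new
element, which is a guess. The Ctxt struct, the pushes counter and the
helpers push_name, pop_name, start_tag and pushed_since are all
hypothetical stand-ins, not the library's code. The idea is to count
pushes instead of remembering a pointer that may be freed and reused:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 32

/* Hypothetical stand-in for the parser context (not libxml's struct). */
typedef struct {
    char *names[MAX_DEPTH];  /* heap-allocated names of open elements */
    int nameNr;              /* number of open elements */
    const char *name;        /* innermost open element, or NULL */
    unsigned long pushes;    /* total pushes so far; never decreases */
} Ctxt;

/* Push a private copy of `name` and record that a push happened. */
static void push_name(Ctxt *c, const char *name) {
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    assert(copy != NULL && c->nameNr < MAX_DEPTH);
    memcpy(copy, name, len);
    c->names[c->nameNr++] = copy;
    c->name = copy;
    c->pushes++;
}

/* Pop the innermost element and free its name. */
static void pop_name(Ctxt *c) {
    assert(c->nameNr > 0);
    free(c->names[--c->nameNr]);
    c->name = (c->nameNr > 0) ? c->names[c->nameNr - 1] : NULL;
}

/* Toy start-tag handler: <hr> autocloses an open <p>, then is pushed. */
static void start_tag(Ctxt *c, const char *tag) {
    if (c->name != NULL && strcmp(tag, "hr") == 0 &&
        strcmp(c->name, "p") == 0)
        pop_name(c);
    push_name(c, tag);
}

/* True if any element was pushed since `before` was sampled. */
static int pushed_since(const Ctxt *c, unsigned long before) {
    return c->pushes != before;
}
```

With <body> and <p> open, sampling pushes, calling start_tag(&c, "hr")
and then calling pushed_since reports a push even though the depth is
unchanged, and no freed pointer is ever compared.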

I see you've made <DD> autoclose <DT> and <DT> autoclose <DD>, but
what about also making <DD> autoclose <DD> and likewise for <DT>?
This would make the parser a bit more robust; suppose someone were to
do something like:

    <DL>
      <DD>foo
      <DD>bar
    </DL>

it would get parsed as

    <DL>
      <DD>foo</DD>
      <DD>bar</DD>
    </DL>

which I believe is a bit more useful.
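
The autoclose suggestion could be expressed as a small lookup table, as
in this sketch. The AutoClose struct, the auto_close table and
closes_open_element are hypothetical names for illustration, and
lowercase tag names are assumed; the parser's real table may be laid out
quite differently:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical row: a start tag and one open element it implicitly closes. */
typedef struct {
    const char *start;
    const char *closes;
} AutoClose;

static const AutoClose auto_close[] = {
    { "dd", "dt" }, { "dt", "dd" },  /* pairs mentioned as already handled */
    { "dd", "dd" }, { "dt", "dt" },  /* the proposed additions */
};

/* Return 1 if seeing <start> should close the currently open <open>. */
static int closes_open_element(const char *start, const char *open) {
    size_t i;
    assert(start != NULL && open != NULL);
    for (i = 0; i < sizeof auto_close / sizeof auto_close[0]; i++) {
        if (strcmp(auto_close[i].start, start) == 0 &&
            strcmp(auto_close[i].closes, open) == 0)
            return 1;
    }
    return 0;
}
```

Here closes_open_element("dd", "dd") returns 1, so a second <DD> would
end the first, while closes_open_element("dd", "dl") returns 0 and the
enclosing <DL> stays open.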

regards,
Kristian

----
Message from the list xml@rufus.w3.org
Archived at : http://rufus.w3.org/veillard/XML/messages
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rufus.w3.org

