[xml] Strange stuff in htmlParseElement


From: Kristian Hogsberg Kristensen (hogsberg@daimi.au.dk)
Date: Tue Oct 26 1999 - 18:53:33 EDT

Next message: Joerg Wittenberger: "[xml] libxml looses namespace info"
Previous message: Gilles FILIPPINI: "[xml] Réf. : Re: libxml on Solaris 2.6"

Hi,

When parsing this document:

The HTML parser makes "foo" a child of <HR>... I tracked the problem
to this piece of code in htmlParseElement (lines 2310-2317):

    if (((depth == ctxt->nameNr) && (oldname == ctxt->name)) ||
        (name == NULL)) {
        if (CUR == '>')
            NEXT;
        return;
    }

which looks a bit weird to me... I don't see what it's supposed to do.
What happens is that <HR> autocloses <P>, and now oldname points to
freed memory. By accident, that memory is reused for the name of the new
element, so oldname == ctxt->name and thus htmlParseElement returns
prematurely (it never reaches the test for info->empty).

I see you've made <DD> autoclose <DT> and <DT> autoclose <DD>, but
what about also making <DD> autoclose <DD>, and likewise <DT> autoclose <DT>?
This would make the parser a bit more robust; suppose someone were to
write something like:

it would get parsed as

which I believe is a bit more useful.

regards,
Kristian

----
Message from the list xml@rufus.w3.org
Archived at : http://rufus.w3.org/veillard/XML/messages
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rufus.w3.org



This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:29:50 EDT