[xml] RE: Question about libxml...


From: Marc Sanfacon (sanm@copernic.com)
Date: Wed Nov 15 2000 - 16:45:10 EST


I am not sure this is the fix we should add, but it solved one of the
problems:

I added the following code in htmlParseStartTag in HTMLParser.c

    if (xmlStrEqual(name, BAD_CAST"meta"))
        meta = 1;

    /*
     * Check to see if we already have the html tag
     */
    if (ctxt->nameNr > 0 && xmlStrEqual(name, BAD_CAST"html")) {
        xmlFree(name);
        return;
    }

    /*
     * Check for auto-closure of HTML elements.
     */
    htmlAutoClose(ctxt, name);

    /* ... */

The idea is to skip an 'html' tag when nameNr is greater than 0, since that
means we already have an html tag. Now the result is better, but I still
have 2 head tags.

I thought of another way to solve this: add a table of tags that should not
be added if they are already there, and check that they are not already in
the stack before adding them.
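
Here is a rough sketch of that table idea, written outside of libxml so it
stands alone. Everything named here (unique_tags, is_unique_tag,
should_skip_start_tag) is hypothetical and not part of libxml; the stack and
depth arguments stand in for what I assume would be ctxt->nameTab and
ctxt->nameNr, and I assume element names are already lowercased when they
are compared.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: elements that should be open at most once. */
static const char *const unique_tags[] = { "html", "head", "body", NULL };

/* Return 1 if name is listed in unique_tags, 0 otherwise. */
static int is_unique_tag(const char *name)
{
    size_t i;

    for (i = 0; unique_tags[i] != NULL; i++) {
        if (strcmp(name, unique_tags[i]) == 0)
            return 1;
    }
    return 0;
}

/*
 * Return 1 if the start tag `name` should be dropped because it is a
 * unique tag that is already somewhere on the open-element stack.
 * `stack` and `depth` are stand-ins for the parser's name stack.
 */
static int should_skip_start_tag(const char *const *stack, int depth,
                                 const char *name)
{
    int i;

    if (!is_unique_tag(name))
        return 0;
    for (i = 0; i < depth; i++) {
        if (strcmp(stack[i], name) == 0)
            return 1;
    }
    return 0;
}
```

In htmlParseStartTag this would replace the html-only test: free the name
and return when should_skip_start_tag() says so. It only helps for head if
the first head element is still open when the second one is seen, which
seems to be the case in my output but may not hold in general.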

What do you think?

Marc.

> -----Original Message-----
> From: xml-request@rufus.w3.org [mailto:xml-request@rufus.w3.org]
> On Behalf Of Marc Sanfacon
> Sent: November 15, 2000 14:51 PM
> To: 'xml@rpmfind.net'
> Subject: Question about libxml...
>
> Hi there,
> we have found a problem in the HTML parser. Here is my HTML code:
>
> <SCRIPT LANGUAGE="JavaScript">
> <!--
> var cobrand_directory = "";
> //-->
> </SCRIPT>
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <HTML>
> <HEAD>
> <TITLE>Title</TITLE>
> </HEAD>
>
> <BODY>
> This is a test
> </BODY>
> </HTML>
>
>
> libxml (2.2.7) outputs the following:
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <html><head>
> <script language="JavaScript">
> <!--
> var cobrand_directory = "";
> //-->
> </script>
> <html>
> <head><title>Title</title></head>
> <body><p>
> This is a test
> </p></body>
> </html>
> </html>
>
> As you can see, the result contains 2 html tags, 2 head tags, 2 ending
> html tags and only 1 ending head tag.
> I have pinpointed where this comes from (htmlCheckImplied), but haven't
> found where to fix it yet.
>
> I think there should be only 1 html and 1 head tag with the proper ending
> tag.
>
> I posted this, just in case Daniel, or somebody else, can fix the problem
> or can help me fix it.
>
> Regards,
> Marc
>
> ---------------------------------------------------------------------
> "Better the pride that resides, in a citizen of the world.
> Than the pride that divides, when a colorful rag is
> unfurled." Neil Peart
> ---------------------------------------------------------------------
> Marc Sanfacon, Software developer Copernic.com
> e-mail: msanfacon@copernic.com R&D Group
> Tel : (418) 527-0528 ext 1212
>
>


----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Wed Nov 15 2000 - 17:43:37 EST