RE: [xml] Bug in parser (HTML)


From: Marc Sanfacon (sanm@copernic.com)
Date: Mon Oct 30 2000 - 08:31:45 EST


Hi Daniel,
        I applied the patch this morning, and with 'testHTML' I still see
the same problem. Here is the output:

SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElement(html)
SAX.startElement(body)
SAX.ignorableWhitespace(
, 2)
SAX.startElement(b)
SAX.characters(bbbbbbbbbb, 10)
SAX.endElement(b)
SAX.ignorableWhitespace( , 1)
SAX.startElement(b)
SAX.characters(ccccccccccccccc, 15)
SAX.endElement(b)
SAX.ignorableWhitespace(
, 2)
SAX.endElement(body)
SAX.endElement(html)
SAX.ignorableWhitespace(
, 2)
SAX.endDocument()

I will try to pinpoint the problem today.
Thank you,

Marc.

-----Original Message-----
From: xml-request@rufus.w3.org [mailto:xml-request@rufus.w3.org]On
Behalf Of Daniel Veillard
Sent: October 27, 2000 18:49 PM
To: xml@rpmfind.net
Subject: Re: [xml] Bug in parser (HTML)

On Fri, Oct 27, 2000 at 03:18:16PM -0700, Wayne Davison wrote:
> I don't see how that follows. Any whitespace inside a paragraph-like
> container is significant,

  That's not how it's done now :-)

> with the possible exception of leading and
> trailing whitespace (which occurs at paragraph boundaries). So,
> whitespace inside of <p>, <h1>, <td>, etc. is all significant, but
> whitespace directly inside something like <table> or <body> is not.

  Hum, currently no distinction is made between
    + mixed content
    + element child only
in the HTML parser. This could be added.
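
The distinction between element-only and mixed content could be sketched roughly as below. This is only an illustration in Python, not libxml code: the function name `whitespace_is_significant` and both element sets are assumptions made for the example, and the sets are limited to the elements named in this thread.

```python
# Hypothetical, deliberately incomplete sets; only elements named in the
# thread are listed here.
ELEMENT_ONLY = {"html", "body", "table"}   # children are expected to be elements
MIXED_CONTENT = {"p", "h1", "td"}          # children may be text and elements


def whitespace_is_significant(parent: str) -> bool:
    """Return True if whitespace-only text directly inside `parent` is kept."""
    name = parent.lower()
    if name in ELEMENT_ONLY:
        # Whitespace between child elements is treated as layout only.
        return False
    # Mixed-content and unknown elements: keep the whitespace to be safe.
    return True
```

Defaulting to "significant" for unknown elements is a design choice of this sketch: dropping real text is usually worse than keeping a stray blank.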

> So, your "that ain't true for <b>" example confused me (but maybe I'm
> missing something). Are you saying that these are somehow different?
>
> <html><body><p>
> <b>bbbbbbbbbbbb</b>
> <b>cccccccccccc</b>
> </p></body></html>
>
> and
>
> <html><body><p>
> <b>bbbbbbbbbbbb</b> <b>cccccccccccc</b>
> </p></body></html>

  I think they are different. At the rendering level, b is a text node
that just happens to be rendered, possibly, in a different font. That's
why I suggest treating spaces encountered after it as significant.
  I may not be right, but the enclosed patch does this. If you prefer
something better, see areBlanks(), modify it and send me the patch :-)
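
The rule described above (whitespace that follows an inline element such as b is significant) might look something like the following sketch. It is written in Python for brevity and is not the actual areBlanks() from libxml; the function name, its signature, and the set of inline elements are all hypothetical.

```python
from typing import Optional

# Hypothetical, incomplete set of inline (text-level) elements.
INLINE_ELEMENTS = {"b", "i", "em", "strong", "tt"}


def is_ignorable_blank(text: str, last_child: Optional[str]) -> bool:
    """Decide whether a text chunk may be reported as ignorable whitespace.

    `last_child` is the name of the element that closed just before the
    text, or None if the text is the first child of its parent.
    """
    if text.strip():
        # Contains non-whitespace characters: always real character data.
        return False
    if last_child is not None and last_child.lower() in INLINE_ELEMENTS:
        # Whitespace right after an inline element separates words.
        return False
    return True
```

Under this sketch, a single space between two adjacent b elements would be reported as character data, while a newline that is the first child of its parent would still be ignorable.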

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | libxml Gnome XML toolkit
Tel : +33 476 615 257  | 655, avenue de l'Europe | http://xmlsoft.org/
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Rpmfind search site
 http://www.w3.org/People/all#veillard%40w3.org  | http://rpmfind.net/
----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Mon Oct 30 2000 - 09:43:37 EST