From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Fri Oct 27 2000 - 14:45:10 EDT
On Fri, Oct 27, 2000 at 01:39:27PM -0400, Marc Sanfacon wrote:
> Hi there, the following document causes a bug in the parsing result:
>
> <html><body>
> <b>bbbbbbbbbb</b> <b>ccccccccccccccc</b>
> </body></html>
>
> The parsing loses the ' ' (space) between bbbbbbbb & cccccccc. Is this the
> normal behavior of libxml? One of our developers found this bug and I
> haven't looked at it yet. So if you tell me this is normal, I won't look.
Well, we wouldn't want to lose the space between the b's and the c's in the
rendered text:
bbbbbbbbbb ccccccccccccccc
It's a bit tricky; here is what's happening:
Start of element body: pushed body
SAX.startElement(body)
Start of element body, was html
SAX.ignorableWhitespace(
, 1)
Start of element b: pushed b
SAX.startElement(b)
Start of element b, was body
SAX.characters(bbbbbbbbbb, 10)
Close of b stack: 3 elements
0 : html
1 : body
2 : b
SAX.endElement(b)
End of tag b: popping out b
SAX.ignorableWhitespace( , 1)
The heuristic concludes that it's ignorable whitespace. It really
shouldn't be. The newline after the opening body tag should be treated
as ignorable, as should the one between the two p elements in the
following:
<p>bla</p>
<p>bla</p>
But b really marks up text plus style: it's text, not structure, while p
is just structure. Whitespace occurring between elements that carry
stylistic information should not be considered ignorable. <em> and
<strong> are two other examples that come to mind.
We should add detection of such elements and avoid treating spaces in
those contexts as ignorable. I will look into it.
Daniel
--
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | libxml Gnome XML toolkit
Tel : +33 476 615 257  | 655, avenue de l'Europe | http://xmlsoft.org/
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Rpmfind search site
http://www.w3.org/People/all#veillard%40w3.org   | http://rpmfind.net/
----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net
This archive was generated by hypermail 2b29 : Fri Oct 27 2000 - 15:43:36 EDT