Re: [xml] Loss of whitespace


From: Paul DuBois (paul@snake.net)
Date: Fri Mar 03 2000 - 12:02:28 EST


>>On Thu, Mar 02, 2000 at 04:15:30AM +0100, Daniel Veillard wrote:
>>> This proves to be incredibly hard to fix !!!
>>[...]
>>> The only ways I can think of to handle this are the following:
>>> 1/ provide a flag in the parser context to change the
>>> behaviour to pass all white spaces (if we are not validating)
>>> 2/ switch the parser to pass all white spaces to SAX,
>>> but in the DOM generation callback, remove all text
>>> nodes containing only empty spaces
>>
>> Ok, I have committed onto both CVS bases :
>> - a new heuristic with the same behavior as before except
>> it generates a text node when there is space and only spaces
>> between the opening tag and ending tag
>> - an extra flag in the parser context keepBlanks which if
>> set will disable any heuristic (though white space can
>> still be flagged as ignorable if there is a DTD and the
>> element is not mixed-content)
>>
>>I have generated a libxml-pre-1.8.7.tar.gz and put it at
>> ftp://rpmfind.net/pub/rpmfind/libxml-pre-1.8.7.tar.gz
>>
>>Gnumeric seems happy with it, please test if possible so that
>>I can trust it to release 1.8.7,
>>
>

I wrote:
>It does what *I* want (thanks!), but I'm the new guy here. Dunno what
>others think.

I may have spoken too soon. Or else I don't understand how the parser
works with respect to whitespace. It looks to me like some whitespace-only
text segments still get lost. Here's a short document to illustrate this:

<?xml version="1.0"?>
<root>
<y>
</y>
<y>
<x> </x>
</y>
</root>

The output of tester --debug is as follows (I've modified this to
allow whitespace to be seen more easily):

DOCUMENT
version=1.0
standalone=true
   ELEMENT root
     ELEMENT y
       TEXT
       content=(\n)
     ELEMENT y
       ELEMENT x
         TEXT
         content=( )
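
For comparison, here is a minimal sketch of what a parser that keeps
every blank would report for the same document. It uses Python's
xml.dom.minidom as a stand-in, not libxml or tester; the DOC constant
and the whitespace_text_nodes helper are names made up for this
illustration.

```python
from xml.dom import minidom

# The sample document from this message, one tag per line.
DOC = (
    '<?xml version="1.0"?>\n'
    "<root>\n"
    "<y>\n"
    "</y>\n"
    "<y>\n"
    "<x> </x>\n"
    "</y>\n"
    "</root>\n"
)


def whitespace_text_nodes(node, found=None):
    """Collect (parent tag, text) for every whitespace-only text node."""
    if found is None:
        found = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip() == "":
                found.append((node.nodeName, child.data))
        else:
            whitespace_text_nodes(child, found)
    return found


nodes = whitespace_text_nodes(minidom.parseString(DOC).documentElement)
for parent, data in nodes:
    print(parent, repr(data))
```

minidom does no whitespace stripping of its own, so this lists each
whitespace-only segment inside <root> together with the element that
contains it.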

There are no TEXT nodes for:

- newline between <root> and the first <y>
- newline between the first </y> and the second <y>
- newline between the second <y> and <x>
- newline between </x> and </y>
- newline between </y> and </root>
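
One reading of the heuristic quoted above would explain this output:
blank text is dropped unless it is the only thing between an opening
tag and its ending tag. The sketch below applies that rule to a tree
built with Python's xml.dom.minidom. It is only a guess at the rule,
not libxml's code, and is_blank, apply_heuristic and dump are
hypothetical helper names.

```python
from xml.dom import minidom

# The sample document from this message, one tag per line.
DOC = (
    '<?xml version="1.0"?>\n'
    "<root>\n<y>\n</y>\n<y>\n<x> </x>\n</y>\n</root>\n"
)


def is_blank(node):
    """True for a text node that contains only whitespace."""
    return node.nodeType == node.TEXT_NODE and node.data.strip() == ""


def apply_heuristic(elem):
    """Assumed rule: drop blank text unless it is the element's only child."""
    children = list(elem.childNodes)  # copy, since we remove while looping
    only_blank = len(children) == 1 and is_blank(children[0])
    for child in children:
        if is_blank(child):
            if not only_blank:
                elem.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            apply_heuristic(child)


def dump(node, depth=0, lines=None):
    """Render the tree roughly in the style of the debug output."""
    if lines is None:
        lines = []
    pad = "  " * depth
    if node.nodeType == node.ELEMENT_NODE:
        lines.append(pad + "ELEMENT " + node.tagName)
        for child in node.childNodes:
            dump(child, depth + 1, lines)
    elif node.nodeType == node.TEXT_NODE:
        lines.append(pad + "TEXT content=" + repr(node.data))
    return lines


root = minidom.parseString(DOC).documentElement
apply_heuristic(root)
lines = dump(root)
print("\n".join(lines))
```

Under that assumption the only surviving text nodes are the newline
inside the first <y> and the space inside <x>, which matches the
debug output shown earlier.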

-- 
Paul DuBois, paul@snake.net
----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org



This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:30:07 EDT