Re: [xml] Adjacent TEXT nodes aren't merged

From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Sat Oct 21 2000 - 05:22:21 EDT

Next message: Daniel Veillard: "Re: [xml] libxml2-2.2.5"
Previous message: Timothee Besset: "[xml] libxml2-2.2.5"
In reply to: TOM: "[xml] Adjacent TEXT nodes aren't merged"

On Fri, Oct 20, 2000 at 09:25:01PM +0200, TOM wrote:
> Hi,
> I used 'xmllint --debug --noent test/ent1' (from libxml2-2.2.5) and
> contrary to what was expected I had 3 TEXT nodes.

Hi TOM,

> The attached patch should fix this.

The issue is slightly more complex than the proposed solution assumes,
but the patch is nearly complete. The problem you faced is:
  - there is a reference to a parsed entity
  - libxml builds the DOM equivalent of the entity content
    the first time the entity is referenced:
    + this is needed to make sure that the entity content is well
      balanced (the check is needed only if the entity is referenced)
    + when not substituting entities, libxml currently keeps a
      single copy of the DOM subtree attached to an entity reference
      (which raises some problems w.r.t. namespace support, but reduces
       memory usage if the entity is frequently referenced and allows a
       change made to the entity content to be seen by all references)
  - when substituting entities, the xmlParseReference() function does
    an xmlCopyNodeList(ent->children); and then an xmlAddChildList()
    at the current node level.
That explains why you got 3 nodes: one for the text before, one for
the content of the entity reference, and one for the text after.
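
To make the three-node result concrete, here is a minimal sketch in C.
It is a toy model, not libxml's code: ToyNode and every toy_* function
are hypothetical names invented for this illustration. It only mimics
the "copy the entity's child list, then append it at the current level"
step, with no merging of adjacent text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified node type; the real xmlNode has many more fields. */
typedef enum { TOY_TEXT, TOY_ELEMENT } ToyType;

typedef struct ToyNode {
    ToyType type;
    char *content;            /* text content, NULL for elements */
    struct ToyNode *next;     /* next sibling */
    struct ToyNode *children; /* first child */
    struct ToyNode *last;     /* last child */
} ToyNode;

ToyNode *toy_new(ToyType type, const char *content) {
    ToyNode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->type = type;
    if (content != NULL) {
        size_t len = strlen(content) + 1;
        n->content = malloc(len);
        assert(n->content != NULL);
        memcpy(n->content, content, len);
    }
    return n;
}

/* Append one node as the last child, with no merging at all. */
void toy_append(ToyNode *parent, ToyNode *child) {
    child->next = NULL;
    if (parent->last != NULL) parent->last->next = child;
    else parent->children = child;
    parent->last = child;
}

/* Copy a flat sibling list (stand-in for copying the entity's children). */
ToyNode *toy_copy_list(const ToyNode *list) {
    ToyNode *head = NULL, *tail = NULL;
    for (; list != NULL; list = list->next) {
        ToyNode *c = toy_new(list->type, list->content);
        if (tail != NULL) tail->next = c; else head = c;
        tail = c;
    }
    return head;
}

/* Append every node of a list under parent, again without merging. */
void toy_append_list(ToyNode *parent, ToyNode *list) {
    while (list != NULL) {
        ToyNode *next = list->next;
        toy_append(parent, list);
        list = next;
    }
}

/* Free a node and all of its descendants (not its siblings). */
void toy_free_tree(ToyNode *node) {
    if (node == NULL) return;
    ToyNode *c = node->children;
    while (c != NULL) {
        ToyNode *next = c->next;
        toy_free_tree(c);
        c = next;
    }
    free(node->content);
    free(node);
}

/* Model "before &ent; after" with the entity substituted; returns the
 * number of children the element ends up with. */
int toy_demo_unmerged(void) {
    ToyNode *entity_content = toy_new(TOY_TEXT, "entity text");
    ToyNode *elem = toy_new(TOY_ELEMENT, NULL);
    int count = 0;

    toy_append(elem, toy_new(TOY_TEXT, "before "));
    toy_append_list(elem, toy_copy_list(entity_content));
    toy_append(elem, toy_new(TOY_TEXT, " after"));

    for (const ToyNode *c = elem->children; c != NULL; c = c->next) count++;
    toy_free_tree(elem);
    toy_free_tree(entity_content);
    return count;
}
```

In this model toy_demo_unmerged() yields three adjacent text children,
because nothing in the append path looks at the type of the previous
sibling.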

So your patch covers those well, but doesn't free the result of
xmlCopyNodeList(), resulting in a memory leak in this case.

Since I far prefer a segfault on the developer's workbench to
silent memory leaks, which are the poison of the current software
industry, I preferred to modify the semantics of the merge operation
so that it may free the merged nodes. I hope this won't lead to
too much trouble. This is a somewhat serious change of semantics, but
it doesn't seem to break anything in my test suite, and it completely
avoids memory leaks.
Programmers can tell that the node has been freed when the insertion
function doesn't return the new node ...
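
A minimal sketch of that "merge and free" semantic, in C. This is not
the enclosed patch and not libxml's implementation: ToyNode,
toy_new_text(), toy_new_element() and toy_add_child() are hypothetical
names, and the model handles only a flat list of children. The point is
the contract: when the new text node is merged into the previous one,
it is freed and the surviving node is returned instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified node type used only for this illustration. */
typedef enum { TOY_TEXT, TOY_ELEMENT } ToyType;

typedef struct ToyNode {
    ToyType type;
    char *content;            /* always non-NULL for text nodes */
    struct ToyNode *next;     /* next sibling */
    struct ToyNode *children; /* first child */
    struct ToyNode *last;     /* last child */
} ToyNode;

ToyNode *toy_new_element(void) {
    ToyNode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->type = TOY_ELEMENT;
    return n;
}

ToyNode *toy_new_text(const char *text) {
    ToyNode *n = calloc(1, sizeof *n);
    size_t len = strlen(text) + 1;
    assert(n != NULL);
    n->type = TOY_TEXT;
    n->content = malloc(len);
    assert(n->content != NULL);
    memcpy(n->content, text, len);
    return n;
}

/* Append child under parent. If child and the current last child are
 * both text, the text is concatenated onto the existing node, child is
 * FREED, and the existing node is returned. Otherwise child is linked
 * in and returned unchanged. */
ToyNode *toy_add_child(ToyNode *parent, ToyNode *child) {
    assert(parent != NULL && child != NULL);
    ToyNode *last = parent->last;

    if (last != NULL && last->type == TOY_TEXT && child->type == TOY_TEXT) {
        size_t a = strlen(last->content);
        size_t b = strlen(child->content);
        char *merged = realloc(last->content, a + b + 1);
        assert(merged != NULL);
        memcpy(merged + a, child->content, b + 1); /* copies the NUL too */
        last->content = merged;
        free(child->content);
        free(child);          /* the caller must not touch child again */
        return last;
    }

    child->next = NULL;
    if (last != NULL) last->next = child; else parent->children = child;
    parent->last = child;
    return child;
}

/* Returns 1 if three text insertions ended up as one merged node. */
int toy_demo_merged(void) {
    ToyNode *elem = toy_new_element();
    /* Always continue with the returned pointer, never the argument. */
    ToyNode *first = toy_add_child(elem, toy_new_text("before "));
    ToyNode *cur = toy_add_child(elem, toy_new_text("entity text"));
    cur = toy_add_child(elem, toy_new_text(" after"));

    int ok = cur == first
          && elem->children == first
          && first->next == NULL
          && strcmp(first->content, "before entity text after") == 0;

    free(first->content);
    free(first);
    free(elem);
    return ok;
}
```

With this contract the safe calling pattern is to keep only the pointer
the insertion function returns; a caller that goes on using the pointer
it passed in will crash quickly rather than leak silently.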

Patch against the 2.2.5 version enclosed.

> I went on this after a thread on comp.text.xml about Xerces, which
> doesn't merge adjacent TEXT nodes or substitute entities. A little bit
> annoying when processing XSLT !
> For curious, Message-Id: <G2Hs12.G8I@world.std.com>

Yeah, in general substituting entities makes a serious change to
the content model. We start hitting some of the divergences between
XPath (on which XSLT/XPointer are built) and the XML Infoset; I hope
this will be cleaned up in the future.
Another thing which should be changed in libxml is that I don't
generate the blank nodes used for formatting when validating and
libxml knows that it's not a mixed content model. They should be put
back, since most implementations seem to do so, and to be sure that
the DOM tree obtained is the same whether validation is done at parsing
time or later on (if done at all).

Daniel

text/plain attachment: tree.diff

----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net
