Re: [xml] SAX, compression, etc


From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Sat May 27 2000 - 07:50:49 EDT


On Fri, May 26, 2000 at 07:31:17PM +0000, mdf@angoss.com wrote:
>
> Daniel Veillard wrote:
>
> > > 2) How do I parse special characters like '<' and '>' in a string when
> > > I read content using characterSAXfunc? I understand that they are
> > > represented by expressions like '&lt;' and '&gt;' (just from looking at
> > > gnumeric files). Where can I read more to understand how to handle this?
> >
> > You don't need to handle them; the parser will. You will get the < and >
> > characters through the characters() SAX callback.
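A minimal sketch of a characters() handler, assuming plain `char` in place of libxml's `xmlChar` and a hypothetical `TextCollector` struct passed as the user data pointer. It shows that the handler simply receives the already-decoded '<', and that it must respect `len`, because the chunk is not NUL-terminated and a single text node may be delivered over several calls:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user data passed as the SAX ctx pointer: a fixed-size
 * buffer that collects every chunk delivered to characters(). */
typedef struct {
    char text[256];
    size_t used;
} TextCollector;

/* Same shape as a SAX characters() callback (plain char stands in for
 * xmlChar here). The chunk is NOT NUL-terminated, so only `len` bytes
 * may be read, and one text node can arrive over several calls. */
static void on_characters(void *ctx, const char *ch, int len) {
    TextCollector *tc = ctx;
    assert(tc != NULL && len >= 0);
    size_t room = sizeof tc->text - 1 - tc->used;        /* keep 1 byte for NUL */
    size_t n = (size_t)len < room ? (size_t)len : room;  /* truncate if full */
    memcpy(tc->text + tc->used, ch, n);
    tc->used += n;
    tc->text[tc->used] = '\0';
}
```

For example, feeding it the chunks "a ", "<" and " b" (as a parser might for `<p>a &lt; b</p>`) leaves "a < b" in the buffer.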
>
> ... which seems to be true for the case of element content. If, however,
> you have a document which has entities inside attributes, you will get
> the 'escaped' version of the string unless you call a 'special' function.

  Yes, this is a limitation of the SAX interface. If you want to be able to
build a tree from the SAX output and save entity references in attributes
back out, you have no other choice. SAX passes attribute values as strings
in one block. There is no way to get an entity reference callback, since
SAX considers the parsing of a full opening tag to be atomic.
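To make the point about attributes concrete, here is a sketch of reading one value from the array a startElement() callback receives: a NULL-terminated list of alternating name/value strings. Plain `char` stands in for `xmlChar`, and `find_attr` is a hypothetical helper, not a libxml function. For `<a b="&amp;whee"/>` with substitution off, the value comes back as the single string "&amp;whee"; there is nowhere for a separate entity-reference event to fit:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look up one attribute in the array a SAX
 * startElement() callback receives, a NULL-terminated list of
 * alternating name/value strings (plain char stands in for xmlChar).
 * Each value is one complete string; no separate event reports the
 * entity references inside it. */
static const char *find_attr(const char **atts, const char *name) {
    if (atts == NULL)                  /* element has no attributes */
        return NULL;
    for (size_t i = 0; atts[i] != NULL; i += 2) {
        assert(atts[i + 1] != NULL);   /* names and values come in pairs */
        if (strcmp(atts[i], name) == 0)
            return atts[i + 1];
    }
    return NULL;
}
```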

> Example:
>
> <a b="&amp;whee"/>
>
> with the SAX parser, the attribute is returned
>
> "&amp;whee"
>
> which is rather inconvenient. The fix for this problem is to
> insert a call:
>
> xmlSubstituteEntitiesDefault(true);
>
> just before the call to "xmlSAXUserParseFile" or similar. Then
> you get what you really want:
>
> "&whee"
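If you only consume SAX events and never build a tree, another option is to leave the global default alone and decode the predefined entities yourself in the startElement() handler. A sketch, assuming a hypothetical `decode_predefined` helper: it handles only the five entities XML predefines (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`), does not handle numeric character references such as `&#38;`, and copies any other reference such as `&status;` through unchanged:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The five entities predefined by XML 1.0. */
static const struct { const char *ref; char ch; } predefined[] = {
    { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' },
    { "&apos;", '\'' }, { "&quot;", '"' },
};

/* Hypothetical helper: copy `in` to `out`, replacing only the five
 * predefined entity references. Any other reference (for example a
 * user-defined &status;) is copied unchanged. The result is never
 * longer than the input, so `out` needs strlen(in) + 1 bytes. */
static void decode_predefined(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        int matched = 0;
        if (*in == '&') {
            for (size_t k = 0; k < sizeof predefined / sizeof predefined[0]; k++) {
                size_t n = strlen(predefined[k].ref);
                if (strncmp(in, predefined[k].ref, n) == 0) {
                    *out++ = predefined[k].ch;  /* emit the decoded char */
                    in += n;                    /* skip the whole reference */
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            *out++ = *in++;                     /* ordinary byte: copy as-is */
    }
    *out = '\0';
}
```

Because each match advances past the whole reference, `&amp;lt;` decodes once to `&lt;` rather than to `<`.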

  Yes, but calling xmlSubstituteEntitiesDefault(true) means that you cannot
save entity references back if you build a tree. The "macro" substitution
is lost!

> If I may also interject an editorial comment: I think a better fix for
> this is to tweak libXML to always dereference the standard entity
> references. Indeed, I was rather surprised to see this as an option: who
> on Earth but an XML parser/generator would need (or want) to know about
> the crazy entity reference schemes in XML?

  To be able to save them back, of course!!!

A typical example is using an entity for the status of the document or for
authors' names... If libxml were not able to save them back, 95% of document
authors would consider the tool irremediably broken! Remember, being able to
round-trip from the serialization to the tree and back to the serialization
is one of the key design principles of libxml. People using only SAX may not
see the point, but this is a very important one.
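A toy illustration of why substitution loses information, using a hypothetical `expand_status` function in place of the parser's real entity handling: once `&status;` has been replaced by its value, the result is byte-for-byte identical to a document that contained the literal text, so nothing is left for a serializer to write the reference back from:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-entity expander: replaces every "&status;" in `in`
 * with `value`, roughly what happens when entity substitution is on.
 * The caller must size `out` for the expanded text plus the NUL. */
static void expand_status(const char *in, const char *value, char *out) {
    assert(in != NULL && value != NULL && out != NULL);
    const char *ref = "&status;";
    size_t ref_len = strlen(ref), val_len = strlen(value);
    const char *hit;
    while ((hit = strstr(in, ref)) != NULL) {
        memcpy(out, in, (size_t)(hit - in));  /* text before the reference */
        out += hit - in;
        memcpy(out, value, val_len);          /* the replacement text */
        out += val_len;
        in = hit + ref_len;                   /* continue after the reference */
    }
    strcpy(out, in);  /* copy the tail and the terminating NUL */
}
```

Expanding `<p>Status: &status;</p>` and the literal `<p>Status: draft</p>` with the value "draft" gives the same string, which is exactly the "macro" information that is lost.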

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe
----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org



This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:30:13 EDT