Re: [xml] SAX, compression, etc


From: mdf@angoss.com
Date: Fri May 26 2000 - 15:31:17 EDT


Daniel Veillard wrote:

> > 2) How do I do to parse special characters like '<' '>' in a string when
> > I read a content using characterSAXfunc?. I understand that they are
> > represented by expressions like 'lt' and 'gt' (just viewing gnumeric
> > files). Where can I read more to understand how to handle this?
>
> You don't need to handle them, the parser will, you will get < and >
> characters through the characters() SAX callback

... which seems to be true for element content. If, however, your
document has entity references inside attribute values, you will get
the 'escaped' version of the string unless you call a 'special' function.

Example:

        <a b="&amp;whee"/>

With the SAX parser, the attribute value is returned as

        "&amp;whee"

which is rather inconvenient. The fix for this problem is to
insert a call:

        xmlSubstituteEntitiesDefault(1);

just before the call to "xmlSAXUserParseFile" or similar. Then
you get what you really want:

        "&whee"

If I may also interject an editorial comment: I think a better fix would
be to change libxml so that it always dereferences the standard entity
references. Indeed, I was rather surprised to see this as an option: who
on Earth but an XML parser/generator would need (or want) to know about
the crazy entity reference schemes in XML?
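
For anyone who cannot change the parser-wide default and is stuck with
the escaped attribute strings, the unescaping can also be done by hand.
What follows is only a rough sketch, not anything libxml provides:
decode_predefined_entities is a made-up helper name, and I am assuming
that "standard entity references" means the five predefined by XML
(&amp; &lt; &gt; &quot; &apos;). Character references such as &#38; and
any user-defined entities are passed through untouched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of libxml: decode the five predefined XML
 * entity references found in `src` into `dst`, whose capacity `cap`
 * includes room for the terminating NUL. Unknown references and a bare
 * '&' are copied through unchanged. The input is scanned once, so
 * "&amp;lt;" decodes to "&lt;" and not to "<".
 * Returns the decoded length, or (size_t)-1 if `dst` is too small. */
size_t decode_predefined_entities(const char *src, char *dst, size_t cap)
{
    static const struct { const char *ref; char ch; } table[] = {
        { "&amp;",  '&'  },
        { "&lt;",   '<'  },
        { "&gt;",   '>'  },
        { "&quot;", '"'  },
        { "&apos;", '\'' },
    };
    const size_t entries = sizeof table / sizeof table[0];
    size_t out = 0;
    size_t i;

    while (*src != '\0') {
        char ch = *src;        /* character to emit */
        size_t consumed = 1;   /* input bytes used up by this step */

        if (ch == '&') {
            for (i = 0; i < entries; i++) {
                size_t len = strlen(table[i].ref);
                if (strncmp(src, table[i].ref, len) == 0) {
                    ch = table[i].ch;
                    consumed = len;
                    break;
                }
            }
        }
        /* Always keep one byte free for the terminating NUL. */
        if (out + 1 >= cap)
            return (size_t)-1;
        dst[out++] = ch;
        src += consumed;
    }
    if (cap == 0)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

Called on the attribute string from the example above with a large
enough buffer, this turns "&amp;whee" into "&whee". The decoded text is
never longer than the input, so a buffer of strlen(src) + 1 bytes is
always sufficient.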

----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org


