From: mdf@angoss.com
Date: Fri May 26 2000 - 15:31:17 EDT
Daniel Veillard wrote:
> > 2) How do I parse special characters like '<' and '>' in a string when
> > I read content using charactersSAXFunc? I understand that they are
> > represented by expressions like 'lt' and 'gt' (just from viewing gnumeric
> > files). Where can I read more to understand how to handle this?
>
> You don't need to handle them; the parser will. You will get < and >
> characters through the characters() SAX callback.
... which seems to be true for the case of element content. If, however,
you have a document which has entities inside attributes, you will get
the 'escaped' version of the string unless you call a 'special' function.
Example:
<a b="&amp;whee"/>
with the SAX parser, the attribute is returned
"&amp;whee"
which is rather inconvenient. The fix for this problem is to
insert a call:
xmlSubstituteEntitiesDefault(1);
just before the call to "xmlSAXUserParseFile" or similar. Then
you get what you really want:
"&whee"
If I may also interject an editorial comment: I think a better fix for
this is to tweak libxml to always dereference the standard entity
references. Indeed, I was rather surprised to see this as an option: who
on Earth but an XML parser/generator would need (or want) to know about
the crazy entity reference schemes in XML?
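For what it's worth, the "standard" set is small: XML predefines only five entities (lt, gt, amp, quot, apos). Below is a rough sketch in plain C of what dereferencing them amounts to. It is not libxml code: the name decode_predefined_entities is made up for illustration, and it deliberately ignores numeric character references and user-defined entities, which a real parser also has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libxml: copies `in` to `out`, replacing
 * the five predefined XML entity references with the characters they name.
 * `out` must have room for strlen(in) + 1 bytes; decoding never grows the text. */
void decode_predefined_entities(const char *in, char *out)
{
    static const struct { const char *ref; char ch; } table[] = {
        { "&lt;",   '<'  },
        { "&gt;",   '>'  },
        { "&amp;",  '&'  },
        { "&quot;", '"'  },
        { "&apos;", '\'' },
    };
    const size_t n = sizeof(table) / sizeof(table[0]);

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        size_t i;
        for (i = 0; i < n; i++) {
            size_t len = strlen(table[i].ref);
            if (strncmp(in, table[i].ref, len) == 0) {
                *out++ = table[i].ch;   /* emit the referenced character */
                in += len;              /* skip the whole reference */
                break;
            }
        }
        if (i == n)                     /* no reference matched: copy the byte as-is */
            *out++ = *in++;
    }
    *out = '\0';
}
```

Given "&amp;whee" this produces "&whee". Because the input is scanned only once, "&amp;lt;" comes out as "&lt;" rather than "<", which is the behaviour you want from a single level of dereferencing.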
---- Message from the list xml@xmlsoft.org Archived at : http://xmlsoft.org/messages/