From: Wayne Davison (wayned@blorf.net)
Date: Sun Aug 27 2000 - 20:41:47 EDT
On Mon, 28 Aug 2000, Daniel Veillard wrote:
> And for a good reason, the escaping is done at another level
Oh, of course. Thanks, I missed that. It totally makes sense, now that
you mention it, though.
> I prefer having one extra byte left in case the user needs to
> add a zero; in that case there is no loss in output, and the fact that
> UTF8ToHtml() returns -2 is not handled as an error condition, it's the
> normal way to use the conversion filters. That one byte is for safety at
> no cost in reality.
Then the other check is wrong -- the one that appends single characters.
It should be:
    if (out + 1 >= outend)
Also, you'll want to change the entity version back to use >= again
(you've got my tweaked version in cvs at the moment).
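
To make the arithmetic concrete, here is a minimal sketch of that
check. The helper name append_char() and its return values are made up
for illustration; this is not the code in cvs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not libxml2 code): append one byte to the output
 * buffer while keeping one byte spare, so the caller can still add a
 * terminating zero afterwards.
 * Returns 0 on success, -1 when the byte does not fit. */
static int append_char(unsigned char **out, const unsigned char *outend,
                       unsigned char c)
{
    assert(out != NULL && *out != NULL && outend != NULL);

    /* After writing 1 byte there must still be 1 free byte left. */
    if (*out + 1 >= outend)
        return -1;

    *(*out)++ = c;
    return 0;
}
```

With a 4-byte buffer this accepts three characters and refuses the
fourth, so writing the final zero at *out is always in bounds.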
> This may be worth a separate function if you just want to output
> a string extracted from the internal representation. I would accept
> it without problem.
Yes, I was looking for something to use with the sax handler. I wasn't
quite sure what to name it, but I chose htmlEncodeEntities() and put it in
the HTMLparser.c file. My function doesn't return an error for characters
that have no entry in the entity table (it emits a numeric entity
instead). It only returns -2 for encoding errors. See if you like it.
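
The behaviour described above can be sketched roughly as follows. This
is not the attached patch: the name sketch_encode_entities(), its
signature, the three-entry table, and the -1 "buffer full" return are
all assumptions made for the example. Only the numeric-entity fallback
and the -2 for encoding errors come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Tiny illustrative table; the real entity table is much larger. */
struct sketch_ent { unsigned int code; const char *name; };
static const struct sketch_ent sketch_table[] = {
    { 0xA0, "nbsp" }, { 0xA9, "copy" }, { 0xE9, "eacute" },
};

/* Hypothetical sketch: convert zero-terminated UTF-8 to ASCII plus
 * entities. ASCII passes through untouched (escaping of markup
 * characters is assumed to happen at another level).
 * Returns the number of bytes written (not counting the final zero),
 * -1 if the output buffer is too small, -2 on malformed UTF-8. */
static int sketch_encode_entities(char *out, size_t outlen,
                                  const unsigned char *in)
{
    char *start = out;
    char *outend = out + outlen;

    assert(out != NULL && in != NULL);
    if (outlen == 0)
        return -1;

    while (*in != 0) {
        unsigned int c = *in++;
        int trailing = 0;
        const char *name = NULL;
        char ent[16];
        size_t len, i;

        /* Decode the lead byte and count continuation bytes. */
        if (c < 0x80)                { trailing = 0; }
        else if ((c & 0xE0) == 0xC0) { c &= 0x1F; trailing = 1; }
        else if ((c & 0xF0) == 0xE0) { c &= 0x0F; trailing = 2; }
        else if ((c & 0xF8) == 0xF0) { c &= 0x07; trailing = 3; }
        else return -2;              /* stray continuation or bad lead */

        for (; trailing > 0; trailing--) {
            if ((*in & 0xC0) != 0x80)
                return -2;           /* truncated or broken sequence */
            c = (c << 6) | (*in++ & 0x3F);
        }

        if (c < 0x80) {
            ent[0] = (char)c;
            ent[1] = 0;
        } else {
            for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++)
                if (sketch_table[i].code == c)
                    name = sketch_table[i].name;
            if (name != NULL)
                snprintf(ent, sizeof(ent), "&%s;", name);
            else                     /* not in the table: numeric entity */
                snprintf(ent, sizeof(ent), "&#%u;", c);
        }

        /* Same rule as for single characters: keep one byte spare. */
        len = strlen(ent);
        if (out + len >= outend)
            return -1;
        memcpy(out, ent, len);
        out += len;
    }
    *out = 0;
    return (int)(out - start);
}
```

In this sketch an e-acute (in the table) comes out as a named entity,
while a euro sign (not in the table) comes out as a numeric entity
rather than an error.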
I also changed testHTML.c to use it to output entities rather than raw
UTF-8 (which you might want to make into an option).
Patch attached (only for the new function, not for the off-by-one
changes you may wish to make).
..wayne..
----
Message from the list xml@xmlsoft.org
Archived at: http://xmlsoft.org/messages/
To unsubscribe: echo "unsubscribe xml" | mail majordomo@xmlsoft.org