Re: [xml] UTF8ToHtml() changes

From: Wayne Davison (wayned@blorf.net)
Date: Sun Aug 27 2000 - 20:41:47 EDT


On Mon, 28 Aug 2000, Daniel Veillard wrote:
> And for a good reason, the escaping is done at another level

Oh, of course. Thanks, I missed that. It makes sense now that you
mention it.

> I prefer having one extra byte left in the case the user needs to
> add a zero, in that case there is no loss in output, and the fact that
> UTF8ToHtml() returns -2 is not handled as an error condition, it's the
> normal way to use the conversion filters. That one byte is for safety at
> no cost in reality.

Then the other check is wrong -- the one that appends single characters.
It should be:

        if (out + 1 >= outend)

Also, you'll want to change the entity version back to use >= again
(you've got my tweaked version in cvs at the moment).
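
To illustrate the check above, here is a minimal sketch of a copy loop
that stops while one byte of the output buffer is still free, so the
caller can always append a zero. The helper name copy_with_reserve()
and its signature are hypothetical and are not taken from libxml2; only
the comparison itself comes from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libxml2): copy bytes from `in` to `out`
 * while always leaving one byte of `out` unused for a terminating zero.
 * Returns the number of bytes actually written. */
static size_t copy_with_reserve(unsigned char *out, size_t outlen,
                                const unsigned char *in, size_t inlen)
{
    unsigned char *start = out;
    unsigned char *outend = out + outlen;
    size_t i;

    /* Assumption of this sketch: the buffer holds at least one byte. */
    assert(out != NULL && in != NULL && outlen >= 1);

    for (i = 0; i < inlen; i++) {
        /* The single-character check from the mail: stop as soon as
         * writing one more byte would consume the reserved byte. */
        if (out + 1 >= outend)
            break;
        *out++ = in[i];
    }
    return (size_t)(out - start);
}
```

With a 4-byte buffer this writes at most 3 bytes, so buf[n] = 0 is
always in bounds afterwards.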

> This may be worth a separate function if you just want to output
> a string extracted from the internal representation. I would accept
> it without problem.

Yes, I was looking for something to use with the sax handler. I wasn't
quite sure what to name it, but I chose htmlEncodeEntities() and put it in
the HTMLparser.c file. My function doesn't return an error for characters
that have no entry in the entity table (it emits a numeric entity
instead). It returns -2 only for encoding errors. See if you like it.
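
As a rough illustration of that behaviour (not the code in the attached
patch), the sketch below decodes UTF-8, emits a named entity when the
character is in a small table, falls back to a numeric reference
otherwise, and returns -2 on a malformed sequence. The function name
sketch_encode_entities(), its signature, the three-entry table and the
-1 return for a full buffer are all assumptions made for the example.
It does not reject overlong forms or surrogates and leaves ASCII
untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative table only; a real entity table is much larger. */
static const struct { unsigned int code; const char *name; } sketch_entities[] = {
    { 0xA0, "nbsp" }, { 0xA9, "copy" }, { 0xE9, "eacute" }
};

/* Hypothetical sketch, not the htmlEncodeEntities() from the patch.
 * Converts zero-terminated UTF-8 `in` to ASCII in `out`.
 * Returns the output length, -1 if `out` is too small (assumed code),
 * or -2 on a UTF-8 encoding error. */
static int sketch_encode_entities(char *out, size_t outlen,
                                  const unsigned char *in)
{
    size_t used = 0;

    assert(out != NULL && in != NULL && outlen >= 1);

    while (*in != 0) {
        unsigned int c = *in++;
        int trailing, len;
        char buf[16];
        size_t k;
        const char *name = NULL;

        /* Work out how many continuation bytes follow the lead byte. */
        if (c < 0x80) trailing = 0;
        else if ((c & 0xE0) == 0xC0) { c &= 0x1F; trailing = 1; }
        else if ((c & 0xF0) == 0xE0) { c &= 0x0F; trailing = 2; }
        else if ((c & 0xF8) == 0xF0) { c &= 0x07; trailing = 3; }
        else return -2;                       /* stray continuation byte */

        while (trailing-- > 0) {
            if ((*in & 0xC0) != 0x80) return -2;  /* truncated sequence */
            c = (c << 6) | (*in++ & 0x3F);
        }

        if (c < 0x80) {
            len = snprintf(buf, sizeof buf, "%c", (int)c);
        } else {
            for (k = 0; k < sizeof sketch_entities / sizeof sketch_entities[0]; k++)
                if (sketch_entities[k].code == c)
                    name = sketch_entities[k].name;
            /* Not in the table: numeric reference instead of an error. */
            len = name ? snprintf(buf, sizeof buf, "&%s;", name)
                       : snprintf(buf, sizeof buf, "&#%u;", c);
        }

        /* Keep one byte free for the terminating zero. */
        if (used + (size_t)len + 1 > outlen) return -1;
        memcpy(out + used, buf, (size_t)len);
        used += (size_t)len;
    }
    out[used] = 0;
    return (int)used;
}
```

A caller would pass a buffer plus its size and check for a negative
return before using the result.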

I also changed testHTML.c to use it to output entities rather than raw
UTF-8 (which you might want to make into an option).

Patch attached (only for the new function, not for the off-by-one
changes you may wish to make).

..wayne..


----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org

