[xml] UTF8ToHtml() changes

From: Wayne Davison (wayned@blorf.net)
Date: Sun Aug 27 2000 - 17:18:35 EDT


I was trying to use UTF8ToHtml() to transform some internal-format
characters back into HTML, and it was failing to translate some curly
quotes and such. It turns out that the array html40EntitiesTable[] is
not sorted the way UTF8ToHtml() expects it to be. Also, the function does
not escape actual ampersands in the input, so a literal "&" in the text
cannot be told apart from the start of an entity in the output.
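
To illustrate the ampersand problem: if the input text literally contains
the characters "&eacute;" and they are copied through unchanged, the
output is indistinguishable from an e-acute that was converted to an
entity. Below is a minimal sketch of the escaping step, not the patch
itself. The function name escape_markup() and its signature are made up
for this example and are not part of libxml.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libxml function): copy `in` to `out`,
 * replacing '&', '<' and '>' with entity references.
 * Returns the number of bytes written, not counting the terminating
 * NUL, or -1 if `out` is too small.
 */
static int
escape_markup(const char *in, char *out, size_t outlen)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0)
        return -1;

    for (; *in != '\0'; in++) {
        char single[2];
        const char *rep;
        size_t len;

        switch (*in) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:
            /* Ordinary byte: copy it through unchanged. */
            single[0] = *in;
            single[1] = '\0';
            rep = single;
            break;
        }
        len = strlen(rep);

        /* Need room for the replacement plus the terminating NUL. */
        if (used + len + 1 > outlen)
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }
    out[used] = '\0';
    return (int) used;
}
```

With this, the literal text "&eacute;" comes out as "&amp;eacute;", so
it can no longer be confused with a converted character.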

The attached patch does the following:

 + Sorts all the entries in html40EntitiesTable[] by Unicode value.

 + Renames htmlEntityLookup() to htmlEntityNameLookup() and adds
   htmlEntityValueLookup() (since I wanted to look up entities by value
   in my own code). The value lookup code has a debug check that
   complains if it finds a value in the list that is out of order.

 + Modifies UTF8ToHtml() to turn &, <, and > into entities. It also uses
   the new htmlEntityValueLookup() function (which uses a slightly more
   efficient linear scan -- for a table of N entries it has a maximum of
   N+1 value comparisons rather than 2*N).

 + Fixes an off-by-one bug in UTF8ToHtml() when it was checking for enough
   room in the output buffer to fit an entity.

 + Tweaks the entity-copying code in UTF8ToHtml() a tad.

 + Removes a superfluous "i = 0" initialization that I happened to notice.
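
For readers without the patch handy, here is a rough sketch of one way a
value lookup over a sorted table can stay within N+1 comparisons: test
each entry once with ">=", and do a single equality test only on the
entry where the scan stops. This is an illustration under my own
assumptions, not the code in the patch. The entity_desc struct, the
sample table, the function name entity_value_lookup() and the
DEBUG_ENTITY_ORDER macro are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an entity descriptor; the real struct in
 * libxml has its own layout. */
typedef struct {
    unsigned int value;   /* Unicode code point */
    const char *name;     /* entity name without '&' and ';' */
} entity_desc;

/* A small sample of HTML 4.0 entities, sorted by value. */
static const entity_desc sample_table[] = {
    { 34,   "quot"   },
    { 38,   "amp"    },
    { 60,   "lt"     },
    { 62,   "gt"     },
    { 160,  "nbsp"   },
    { 233,  "eacute" },
    { 8220, "ldquo"  },
    { 8221, "rdquo"  },
};

/*
 * Linear scan over a table sorted by value. Each entry costs one
 * ">=" comparison; the entry where the scan stops costs one more.
 * Returns NULL if no entry has the requested value.
 */
static const entity_desc *
entity_value_lookup(const entity_desc *table, size_t n, unsigned int value)
{
    size_t i;

    assert(table != NULL);

    for (i = 0; i < n; i++) {
#ifdef DEBUG_ENTITY_ORDER
        /* Debug-only sanity check: complain about misordered entries. */
        if (i > 0 && table[i].value < table[i - 1].value)
            fprintf(stderr, "entity table out of order at index %lu (%s)\n",
                    (unsigned long) i, table[i].name);
#endif
        if (table[i].value >= value) {
            /* First entry not below the target: it matches or nothing does. */
            if (table[i].value > value)
                return NULL;
            return &table[i];
        }
    }
    return NULL;
}
```

Looking up 8220 in the sample table returns the "ldquo" entry, while
looking up 100 stops at the "nbsp" entry and returns NULL without
scanning the rest of the table.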

..wayne..


----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org

