From: Wayne Davison (wayned@blorf.net)
Date: Sun Sep 17 2000 - 16:37:27 EDT
On Sun, 17 Sep 2000, Daniel Veillard wrote:
> If there is fixes which seems to be missing, send them again
I've been expecting you to comment on (or perhaps accept) my recent patch
to fix the parsing of UTF8 characters in HTML tag-attribute values.
Here's the patch:
Index: HTMLparser.c
@@ -1970,7 +1970,7 @@
             }
         } else {
             unsigned int c;
-            int bits;
+            int bits, l;
 
             if (out - buffer > buffer_size - 100) {
                 int index = out - buffer;
@@ -1978,7 +1978,7 @@
                 growBuffer(buffer);
                 out = &buffer[index];
             }
-            c = CUR;
+            c = CUR_CHAR(l);
             if (c < 0x80)
                 { *out++ = c; bits= -6; }
             else if (c < 0x800)
Attached is a test file that demonstrates the problem when it is run like
this:
./testHTML -sax test.html
Both HREF tags have the same high-bit character in them (‘), but the
second instance outputs an "Â" instead.
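A stray "Â" is what you would expect if a two-byte UTF-8 sequence is read one byte at a time and each byte is then re-encoded as if it were a whole character. The following is only a minimal sketch of that failure mode, not the libxml code: decode_utf8, encode_utf8, and copy_value are hypothetical helpers, the input is assumed to be well-formed UTF-8 below U+10000, and U+00A9 (bytes C2 A9) is just an example character, not the one in the test file.

```c
#include <assert.h>

/* Hypothetical helper: decode one well-formed UTF-8 sequence (1 to 3 bytes)
 * starting at s, store its byte length in *len, return the code point. */
static unsigned int decode_utf8(const unsigned char *s, int *len)
{
    if (s[0] < 0x80) { *len = 1; return s[0]; }
    if ((s[0] & 0xE0) == 0xC0) {
        *len = 2;
        return ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    }
    *len = 3;
    return ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
}

/* Hypothetical helper: write code point c (< 0x10000) to out as UTF-8,
 * return the number of bytes written. */
static int encode_utf8(unsigned int c, unsigned char *out)
{
    assert(c < 0x10000);
    if (c < 0x80) { out[0] = (unsigned char)c; return 1; }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}

/* Copy the NUL-terminated string in to out, re-encoding every unit read.
 * per_char != 0: read a whole character at a time (the patched behaviour).
 * per_char == 0: treat every input byte as a code point (the bug).
 * Returns the number of bytes written, not counting the trailing NUL. */
static int copy_value(const unsigned char *in, unsigned char *out, int per_char)
{
    int n = 0;
    while (*in) {
        int len = 1;
        unsigned int c = per_char ? decode_utf8(in, &len) : *in;
        n += encode_utf8(c, out + n);
        in += len;
    }
    out[n] = 0;
    return n;
}
```

With the input bytes C2 A9, copy_value(..., 1) writes C2 A9 back unchanged, while copy_value(..., 0) writes C3 82 C2 A9: the lead byte C2 has been turned into U+00C2, which displays as "Â" in front of the original character.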
..wayne..
----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net