RE: [xml] Memory parser bug

Date view Thread view Subject view Author view

From: Hugh Greene (hughg@owl.co.uk)
Date: Tue Apr 18 2000 - 13:12:32 EDT


Well spotted :-) This doesn't seem quite right, though (I haven't looked at
this code much, however, so I could be completely wrong!). Not that the
things I see would make any functional difference, but ...

> -----Original Message-----
> From: xml-request@rufus.w3.org [mailto:xml-request@rufus.w3.org]On
> Behalf Of kavoos
> Sent: 17 April 2000 15:33
> ...
> I have found a little bug when using xmlParseMemory() whith a
> UTF-16 document, this patch correct the probleme ...
>
> --- ../old/libxml2-2.0.0/parser.c Wed Apr 12 15:50:22 2000
> +++ parser.c Mon Apr 17 16:17:25 2000
> @@ -2317,7 +2317,7 @@
> if ((enc == XML_CHAR_ENCODING_UTF16BE) &&
> (ctxt->input->cur[0] == 0xFE) &&
> (ctxt->input->cur[1] == 0xFF)) {
> - SKIP(2);
> + SKIP(2);
> }
>
> /*

This looks like you've accidentally inserted an extra space -- I can't see
it making any difference!

> @@ -2400,8 +2400,13 @@
> "xmlSwitchEncoding : out of memory\n");
> return;
> }
> + if ( (ctxt->input->cur[0]==0xFE &&
> ctxt->input->cur[1]==0xFF) || (ctxt->input->cur[0]==0xFF &&
> ctxt->input->cur[1]==0xFE))
> + res = handler->input(buf, ctxt->input->length * 4,
> + &ctxt->input->cur[2], &len);
> + else
> res = handler->input(buf, ctxt->input->length * 4,
> ctxt->input->cur, &len);
> +
> if ((res < 0) ||
> (len != ctxt->input->length - processed)) {
> if ((ctxt->sax != NULL) && (ctxt->sax->error !=
> NULL))

I think, in order to be consistent with the first branch of the "if
(ctxt->input->buf != NULL)", the test-and-fix for the endian-mark should be
done the same way as there and should happen before the assignment to
"processed", so that it's consistent with ctxt->input->cur. (Come to think
of it, why don't both cases of the test use the "NXT" macro?)

I get the feeling that it should be possible to "lift" sizeable chunks of
this code out of the "if (ctxt->input->buf != NULL)" (e.g., the assignments
to processed and len, the endian test-and-fix, the buffer alloc-with-retry)
but I haven't worked out the details yet :-)

> @@ -2416,8 +2421,10 @@
> if ((ctxt->input->free != NULL) &&
> (ctxt->input->base != NULL))
> ctxt->input->free((xmlChar *)
> ctxt->input->base);
> - ctxt->input->base = ctxt->input->cur = buf;
> - ctxt->input->length = res;
> +
> + ctxt->input->base = ctxt->input->cur = buf;
> + ctxt->input->length = res;
> +
> }
> }
> } else {

Again, it looks like you've just changed the indentation here -- and this
time it makes a (bad) difference, in that it incorrectly suggests two
assignments are part of the "if" when they're not.

> @@ -2442,17 +2449,9 @@
> /* default encoding, no conversion should be needed */
> return;
> case XML_CHAR_ENCODING_UTF16LE:
> - ctxt->errNo = XML_ERR_UNSUPPORTED_ENCODING;
> - if ((ctxt->sax != NULL) && (ctxt->sax->error != NULL))
> - ctxt->sax->error(ctxt->userData,
> - "char encoding UTF16 little endian not supported\n");
> - break;
> + return;
> case XML_CHAR_ENCODING_UTF16BE:
> - ctxt->errNo = XML_ERR_UNSUPPORTED_ENCODING;
> - if ((ctxt->sax != NULL) && (ctxt->sax->error != NULL))
> - ctxt->sax->error(ctxt->userData,
> - "char encoding UTF16 big endian not supported\n");
> - break;
> + return;
> case XML_CHAR_ENCODING_UCS4LE:
> ctxt->errNo = XML_ERR_UNSUPPORTED_ENCODING;
> if ((ctxt->sax != NULL) && (ctxt->sax->error != NULL))

These changes look pointless: the "return" statements will never be reached
and in any case, nothing happens after the switch except exiting the
function. Switch statements are confusing enough without "return"s in the
middle ;-)

> @@ -10406,7 +10405,7 @@
> input->base = BAD_CAST buffer;
> input->cur = BAD_CAST buffer;
> input->free = NULL;
> -
> + input->length=size;
> inputPush(ctxt, input);
> return(ctxt);
> }
> ----

This looks reasonable, though I don't know if it's right or not.

Hope this helps,
Hugh

----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org


Date view Thread view Subject view Author view

This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:30:11 EDT