[xml] Another encoder truncation bug


From: Wayne Davison (wayned@blorf.net)
Date: Tue Aug 22 2000 - 16:32:08 EDT


I found another case where the push code can truncate the HTML input.
If the input file contains a high-bit character (e.g. 0xA0 = nbsp)
but no encoding has been set yet, the input is assumed to be
ISO-8859-1 and only the first line (about 40 characters) is decoded.
However, after these characters are parsed, the htmlParseChunk() call
returns without processing the rest of the raw buffer. If the very
next call is a flush, all the remaining raw data is lost. I've
attached a simple HTML file that will cause "testHTML --sax --push" to
fail.
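
For reference, assuming ISO-8859-1 means each high-bit byte is converted to a two-byte UTF-8 sequence (UTF-8 being libxml's internal encoding), so 0xA0 becomes 0xC2 0xA0. A minimal sketch of that conversion in C; latin1_to_utf8() is a hypothetical helper, not libxml's own converter:

```c
#include <stddef.h>

/* Hypothetical helper, not libxml code: convert ISO-8859-1 bytes to UTF-8.
 * Every Latin-1 byte is the Unicode code point of the same value, so bytes
 * below 0x80 are copied and high-bit bytes become two UTF-8 bytes.
 * Returns the number of bytes written; `out` must hold 2 * len bytes. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                                  /* ASCII: unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    return o;
}
```

With this, the three input bytes 'a', 0xA0, 'b' become the four bytes 'a', 0xC2, 0xA0, 'b'.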

I whipped up a solution that works for me: when the user flushes the
buffer, make sure that we've encoded all of "raw" into "buffer" before
the call to htmlParseTryOrFinish(). A better solution might be to
ensure that the characters get processed before returning from the
htmlParseChunk() call, so that handling of pushed data is never
delayed like this.
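
The delayed decoding and the proposed drain can be sketched with a toy model. Everything here (toy_input, decode_some(), toy_push(), toy_finish()) is hypothetical and far simpler than libxml: "decoding" is a plain copy, and each push decodes only one line, standing in for the partial conversion described above. Passing drain = 1 to toy_finish() decodes all of "raw" into "buffer" before the final parse, which is the idea behind the fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a push parser's input, NOT libxml code.
 * Pushed bytes land in `raw`; the encoder moves them into the decoded
 * `buffer`; the parser only ever consumes `buffer`. No bounds checks:
 * all input must fit in the fixed-size arrays. */
typedef struct {
    char raw[256];    size_t raw_len;
    char buffer[256]; size_t buffer_len;
    char parsed[256]; size_t parsed_len;  /* everything the parser saw */
} toy_input;

/* Decode one line (up to and including '\n') from raw into buffer,
 * mimicking an encoder that converts only part of the pending input. */
static void decode_some(toy_input *in)
{
    size_t n = 0;
    while (n < in->raw_len && in->raw[n] != '\n')
        n++;
    if (n < in->raw_len)
        n++;                                  /* include the newline */
    memcpy(in->buffer + in->buffer_len, in->raw, n);
    in->buffer_len += n;
    memmove(in->raw, in->raw + n, in->raw_len - n);
    in->raw_len -= n;
}

/* Decode everything still pending in raw. */
static void decode_all(toy_input *in)
{
    while (in->raw_len > 0)
        decode_some(in);
}

/* "Parse" whatever has been decoded, then empty the buffer. */
static void parse_buffer(toy_input *in)
{
    memcpy(in->parsed + in->parsed_len, in->buffer, in->buffer_len);
    in->parsed_len += in->buffer_len;
    in->buffer_len = 0;
}

/* Push a chunk: queue it as raw data, decode part of it, parse. */
static void toy_push(toy_input *in, const char *chunk, size_t size)
{
    memcpy(in->raw + in->raw_len, chunk, size);
    in->raw_len += size;
    decode_some(in);
    parse_buffer(in);
}

/* Terminate the parse. With drain == 0 this mirrors the buggy flush:
 * only already-decoded data is parsed and leftover raw bytes are lost.
 * With drain != 0 all raw data is decoded first, as the fix proposes. */
static void toy_finish(toy_input *in, int drain)
{
    if (drain)
        decode_all(in);
    parse_buffer(in);
    in->raw_len = 0;                          /* parse is over; drop leftovers */
}
```

Pushing "<p>line one</p>\n<p>line two</p>\n" as one chunk and then calling toy_finish(&in, 0) leaves only the first line in `parsed`; toy_finish(&in, 1) parses both lines.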

My quick fix is as follows:

Index: HTMLparser.c
@@ -4220,8 +4220,10 @@
 
         if ((terminate) || (ctxt->input->buf->buffer->use > 80))
             htmlParseTryOrFinish(ctxt, terminate);
-    } else if (ctxt->instate != XML_PARSER_EOF)
+    } else if (ctxt->instate != XML_PARSER_EOF) {
+        xmlParserInputBufferPush(ctxt->input->buf, 0, "");
         htmlParseTryOrFinish(ctxt, terminate);
+    }
     if (terminate) {
         if ((ctxt->instate != XML_PARSER_EOF) &&
             (ctxt->instate != XML_PARSER_EPILOG) &&

Also, I'm curious why the htmlParserInputRead() function goes to the
trouble of shifting a buffer of pushed data when it can't read any
new data into that buffer. Adding the following check makes the
function return without doing anything if no readcallback is
defined:

Index: parser.c
@@ -443,6 +443,7 @@
     if (in->base == NULL) return(-1);
     if (in->cur == NULL) return(-1);
     if (in->buf->buffer == NULL) return(-1);
+    if (in->buf->readcallback == NULL) return(-1);
 
     CHECK_BUFFER(in);
 
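The effect of the proposed guard can be illustrated with a toy input buffer. The toy_buf type, toy_input_read() and toy_string_read() are hypothetical stand-ins, not the parser.c code; the point is only that a buffer with no read callback (as with pushed data) returns -1 before the memmove() that shifts unconsumed bytes, instead of shifting them for nothing:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types, NOT libxml's: an input buffer that may or may
 * not have a callback for fetching more bytes (push input has none). */
typedef int (*toy_read_cb)(void *ctx, char *dst, int len);

typedef struct {
    char data[64];
    int start;                 /* offset of the first unconsumed byte */
    int end;                   /* offset one past the last valid byte */
    toy_read_cb readcallback;  /* NULL for pushed data */
    void *ctx;
} toy_buf;

/* Example callback: copy bytes from a string; ctx points to a
 * `const char *` that is advanced past the bytes handed out. */
static int toy_string_read(void *ctx, char *dst, int len)
{
    const char **src = ctx;
    int n = 0;
    while (n < len && (*src)[n] != '\0') {
        dst[n] = (*src)[n];
        n++;
    }
    *src += n;
    return n;
}

/* Try to read up to `len` more bytes. Without a read callback nothing new
 * can arrive, so return -1 before shifting any data around. */
static int toy_input_read(toy_buf *b, int len)
{
    if (b->readcallback == NULL)
        return -1;                            /* the proposed early exit */

    /* Shift unconsumed data to the front to make room for new bytes. */
    memmove(b->data, b->data + b->start, (size_t)(b->end - b->start));
    b->end -= b->start;
    b->start = 0;

    int room = (int)sizeof b->data - b->end;
    if (len > room)
        len = room;
    int got = b->readcallback(b->ctx, b->data + b->end, len);
    if (got > 0)
        b->end += got;
    return got;
}
```

A buffer holding "abcdef" with two bytes consumed is left untouched when it has no callback; with toy_string_read() supplying "xyz", the same buffer becomes "cdefxyz" with start reset to 0.
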
..wayne..


----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/



This archive was generated by hypermail 2b29 : Tue Aug 22 2000 - 13:43:11 EDT