Re: [xml] A truncation bug and some testHTML.c enhancements


From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Sun Aug 13 2000 - 10:33:22 EDT


On Sun, Aug 13, 2000 at 12:41:40AM -0700, Wayne Davison wrote:
> There's a bug in the HTML parser when we're using the push interface
> and we encounter a meta tag that changes the charset. After the code
> shrinks the input buffer to remove all the already-parsed characters,
> it then calls xmlCharEncFirstLine(), which only converts 45 characters
> or less from the new raw buffer. Any characters in the raw buffer
> after this are never parsed. Depending on the buffer size, this can
> truncate the entire HTML file in the middle of the HEAD section.
[...]
> All changes are based on the CVS source I just grabbed from gnome.org.
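
The truncation described in the quoted report can be illustrated with a toy model. This is only a sketch under stated assumptions: toy_buffer, toy_convert and toy_bytes_reaching_parser are hypothetical names invented for illustration and are not the libxml2 API. The only detail taken from the report is the limit of 45 characters. The "conversion" here is a plain byte copy, which is enough to show how bytes left in the raw buffer never reach the parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Limit quoted in the bug report for the first-line conversion. */
#define FIRST_LINE_LIMIT 45

/* Hypothetical fixed-size buffer; not a libxml2 type. */
typedef struct {
    char data[256];
    size_t len;
} toy_buffer;

/* Append n bytes to the end of b. */
static void toy_append(toy_buffer *b, const char *src, size_t n)
{
    assert(b->len + n <= sizeof b->data);
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

/* Move at most `limit` bytes from raw to out; return the number moved.
 * A real converter would transcode; this toy version only copies. */
static size_t toy_convert(toy_buffer *out, toy_buffer *raw, size_t limit)
{
    size_t n = raw->len < limit ? raw->len : limit;
    toy_append(out, raw->data, n);
    memmove(raw->data, raw->data + n, raw->len - n);
    raw->len -= n;
    return n;
}

/* Return how many bytes of doc end up in the parser-side buffer when the
 * conversion is called once, either capped at FIRST_LINE_LIMIT or not. */
size_t toy_bytes_reaching_parser(const char *doc, int convert_everything)
{
    toy_buffer raw = { {0}, 0 };
    toy_buffer parsed = { {0}, 0 };

    toy_append(&raw, doc, strlen(doc));
    toy_convert(&parsed, &raw,
                convert_everything ? raw.len : FIRST_LINE_LIMIT);
    /* Whatever is still in raw here is stranded unless something
     * converts it later. */
    return parsed.len;
}
```

With a document longer than 45 bytes, the capped call leaves the remainder in the raw buffer, while the uncapped call moves everything across. That matches the symptom in the report only in outline; the real code path involves encoders and buffer management that this sketch leaves out.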

  Quick question: was that before or after the 2.2.2 release (i.e. yesterday
afternoon)? I think I fixed this bug a few days ago, but in the W3C CVS,
and it was copied over to the gnome CVS only yesterday afternoon.
  BTW I also added SAX/push testing to testHTML.c and added those tests to
make test yesterday.

Content-Description: My 1-line change to parser.c
> Index: parser.c
> ===================================================================
> RCS file: /cvs/gnome/gnome-xml/parser.c,v
> retrieving revision 1.81
> diff -u -r1.81 parser.c
> --- parser.c 2000/07/21 20:32:03 1.81
> +++ parser.c 2000/08/13 07:16:07
> @@ -2543,7 +2543,7 @@
> * parsed with the autodetected encoding
> * into the parser reading buffer.
> */
> - nbchars = xmlCharEncFirstLine(ctxt->input->buf->encoder,
> + nbchars = xmlCharEncInFunc(ctxt->input->buf->encoder,
> ctxt->input->buf->buffer,
> ctxt->input->buf->raw);

  I'm afraid it is not completely clean.

> if (nbchars < 0) {

Content-Description: My improvements for testHTML.c
> Index: testHTML.c
> ===================================================================
> RCS file: /cvs/gnome/gnome-xml/testHTML.c,v
> retrieving revision 1.12
> diff -u -r1.12 testHTML.c
> --- testHTML.c 2000/07/14 14:49:22 1.12
> +++ testHTML.c 2000/08/13 07:15:35
> @@ -49,6 +49,7 @@
> static int repeat = 0;
> static int noout = 0;
> static int push = 0;
> +static int bigpush = 0;
[...]
> + printf("\t--bigpush : like --push, but use a big buffer\n");

  OK, maybe bigpush should be the default, and adding a --smallpush
option is the right approach.
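
The difference between pushing small and big buffers can be sketched as a chunked feeding loop. This is a hedged illustration, not code from testHTML.c: feed_in_chunks, chunk_sink and count_bytes are hypothetical names, and the chunk sizes used in the usage note are arbitrary assumptions. The sink stands in for whatever push-parser entry point receives each chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback receiving one chunk; `terminate` is nonzero on
 * the last chunk. Stands in for a push-parser entry point. */
typedef void (*chunk_sink)(const char *chunk, size_t size,
                           int terminate, void *user);

/* Feed data to sink in pieces of at most chunk_size bytes.
 * Returns the number of sink calls made. */
size_t feed_in_chunks(const char *data, size_t len, size_t chunk_size,
                      chunk_sink sink, void *user)
{
    size_t calls = 0;
    size_t off = 0;

    assert(chunk_size > 0);
    while (off < len) {
        size_t n = len - off < chunk_size ? len - off : chunk_size;
        /* The last chunk is the one that reaches the end of data. */
        sink(data + off, n, off + n == len, user);
        off += n;
        calls++;
    }
    if (len == 0) {
        /* Still signal termination for an empty input. */
        sink(data, 0, 1, user);
        calls++;
    }
    return calls;
}

/* Example sink: adds up the bytes it is given. */
void count_bytes(const char *chunk, size_t size, int terminate, void *user)
{
    (void)chunk;
    (void)terminate;
    *(size_t *)user += size;
}
```

Feeding a 10-byte input with a chunk size of 4 results in three calls, while a chunk size of 4096 results in one. A small chunk size exercises buffer-boundary handling far more often, which is presumably why both modes are worth testing.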

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe


