From: Marc Sanfacon (sanm@copernic.com)
Date: Wed Aug 02 2000 - 09:46:32 EDT
Hi Daniel,
I have applied the fix to HTMLparser.c in my copy of libxml2-2.2.1.
I am not using the latest version from the CVS archive, so maybe I
should.
But after applying your patch, I now have another problem that
appears with both interfaces. With the push interface, I get an
infinite loop, and with the 'file' interface, I get invalid results.
I have traced the problem to the lines that were added at line 2480 in
HTMLparser.c:
    if ((ctxt->input->buf != NULL) &&
        (ctxt->input->buf->encoder != NULL) &&
        (ctxt->input->buf->raw != NULL) &&
        (ctxt->input->buf->buffer != NULL)) {
        int nbchars;

        /*
         * convert as much as possible to the parser reading buffer.
         */
        nbchars = xmlCharEncInFunc(ctxt->input->buf->encoder,
                                   ctxt->input->buf->buffer,
                                   ctxt->input->buf->raw);
        if (nbchars < 0) {
            if ((ctxt->sax != NULL) && (ctxt->sax->error != NULL))
                ctxt->sax->error(ctxt->userData,
                                 "htmlCheckEncoding: encoder error\n");
            ctxt->errNo = XML_ERR_INVALID_ENCODING;
        }
    }
Again, I have tried my test files on it and the problem occurs on one file,
which is attached to this email.
I will continue to dig and learn the toolkit, and may try to repair the
problem. I will post my results on the list.
Regards,
Marc.
-----Original Message-----
From: xml-request@rufus.w3.org [mailto:xml-request@rufus.w3.org]On
Behalf Of Daniel Veillard
Sent: August 1, 2000 19:45
To: xml@rpmfind.net
Subject: Re: [xml] HTML push interface
On Tue, Aug 01, 2000 at 06:03:02PM -0400, Daniel Veillard wrote:
> > For example, the document is 2001 bytes long. When reading using fread, it
> > strips the '\r' so this gives a total of 1971 bytes. When I put 1967 (1971
> > - 4 bytes for the header) or more, I get the error, a big chunk from my
> > document is skipped, but if I put 1966 or less, the document is parsed OK.
> >
> > I even modified 'testHTML.c' to use buffer of 1967 bytes to ensure I was OK,
> > and I had the same error using: testHTML -debug -repeat -push doc2.htm
>
> However your document raises the same problem on my environment
> so I will have a look at it and try to pinpoint and fix the problem.
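The byte counts quoted above can be checked with a small sketch of the
feeding loop. This is only an illustration under stated assumptions, not
the code from testHTML.c: feed_fn is a hypothetical callback standing in
for a push-parser call such as htmlParseChunk(), and push_in_chunks and
count_bytes are made-up names. It sends a 4-byte header first, as the
quoted message describes, then the rest in chunks of at most chunk_size
bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback standing in for a push-parser call such as
 * htmlParseChunk(): receives one chunk and a "last call" flag. */
typedef void (*feed_fn)(void *user, const char *chunk, size_t len,
                        int terminate);

/* Feed doc[0..len) to `feed`: a header of up to 4 bytes first, then
 * chunks of at most chunk_size bytes, then an empty terminating call.
 * Returns the number of data chunks sent after the header. */
size_t push_in_chunks(const char *doc, size_t len, size_t chunk_size,
                      feed_fn feed, void *user)
{
    size_t header = len < 4 ? len : 4;
    size_t off = header;
    size_t chunks = 0;

    assert(chunk_size > 0);
    feed(user, doc, header, 0);
    while (off < len) {
        size_t n = len - off;
        if (n > chunk_size)
            n = chunk_size;
        feed(user, doc + off, n, 0);
        off += n;
        chunks++;
    }
    feed(user, doc + off, 0, 1);    /* signal end of input */
    return chunks;
}

/* Example callback (also hypothetical): only tallies the bytes seen. */
void count_bytes(void *user, const char *chunk, size_t len, int terminate)
{
    (void)chunk;
    (void)terminate;
    *(size_t *)user += len;
}
```

For a 1971-byte document, a chunk size of 1967 delivers everything after
the header in a single chunk, while 1966 splits it into two. That matches
the threshold reported above, which suggests the failure depends on where
the chunk boundary falls; the sketch does not reproduce the bug itself.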
Okay, I found the problem, fixed it (wasn't really trivial :-\)
and added your HTML file to the test suite. It's committed to the W3C CVS base:
http://dev.w3.org/cvsweb/XML/HTMLparser.c.diff?r1=1.53&r2=1.54
Daniel
-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe
----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail majordomo@xmlsoft.org