RE: [xml] HTML push interface

From: Marc Sanfacon (sanm@copernic.com)
Date: Wed Aug 02 2000 - 09:46:32 EDT


Hi Daniel,
        I have applied the fix to HTMLparser.c in my copy of libxml2-2.2.1.
I am not using the latest version from the CVS archive, so maybe I
should.

        But after applying your patch, I now have another problem that
appears with both interfaces. With the push interface I get an infinite
loop, and with the 'file' interface I get invalid results. I have traced
the problem to the lines that were added at line 2480 of HTMLparser.c:

        if ((ctxt->input->buf != NULL) &&
           (ctxt->input->buf->encoder != NULL) &&
           (ctxt->input->buf->raw != NULL) &&
           (ctxt->input->buf->buffer != NULL)) {
           int nbchars;

           /*
            * convert as much as possible to the parser reading buffer.
            */
           nbchars = xmlCharEncInFunc(ctxt->input->buf->encoder,
                                      ctxt->input->buf->buffer,
                                      ctxt->input->buf->raw);
           if (nbchars < 0) {
               if ((ctxt->sax != NULL) && (ctxt->sax->error != NULL))
                   ctxt->sax->error(ctxt->userData,
                    "htmlCheckEncoding: encoder error\n");
               ctxt->errNo = XML_ERR_INVALID_ENCODING;
           }
        }

Again, I have tried my test files on it and the problem occurs on one file,
which is attached to this email.

I will continue to dig and learn the toolkit, and may try to repair the
problem. I will post my results on the list.

Regards,
        Marc.

-----Original Message-----
From: xml-request@rufus.w3.org [mailto:xml-request@rufus.w3.org]
On Behalf Of Daniel Veillard
Sent: August 1, 2000 19:45
To: xml@rpmfind.net
Subject: Re: [xml] HTML push interface

On Tue, Aug 01, 2000 at 06:03:02PM -0400, Daniel Veillard wrote:
> > For example, the document is 2001 bytes long. When reading using fread,
> > it strips the '\r' so this gives a total of 1971 bytes. When I put 1967
> > (1971 - 4 bytes for the header) or more, I get the error, a big chunk
> > from my document is skipped, but if I put 1966 or less, the document is
> > parsed OK.
> >
> > I even modified 'testHTML.c' to use buffer of 1967 bytes to ensure I was
> > OK, and I had the same error using: testHTML -debug -repeat -push doc2.htm
>
> However your document raises the same problem on my environment
> so I will have a look at it and try to pinpoint and fix the problem.
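
The byte counts quoted above can be checked with a small stand-alone
sketch. It does not use libxml2 at all: strip_cr and count_pushes are
hypothetical helpers introduced only for illustration, and the 4-byte
header is taken from the quoted description. The sketch assumes that the
text-mode read simply drops every '\r', which would account for the
30-byte difference between 2001 and 1971.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src to dst while dropping every '\r',
 * imitating a text-mode read that translates CRLF to LF.
 * dst must have room for len bytes. Returns the bytes written. */
static size_t strip_cr(const char *src, size_t len, char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] != '\r')
            dst[out++] = src[i];
    }
    return out;
}

/* Hypothetical helper: the first `header` bytes are consumed when the
 * parser is created; the remainder is pushed in pieces of at most
 * `chunk` bytes. Returns how many push calls the remainder needs. */
static size_t count_pushes(size_t total, size_t header, size_t chunk)
{
    if (chunk == 0 || total <= header)
        return 0;
    size_t rest = total - header;
    return (rest + chunk - 1) / chunk;   /* ceiling division */
}
```

With these assumptions, count_pushes(1971, 4, 1967) needs one push and
count_pushes(1971, 4, 1966) needs two, so the reported failure threshold
coincides with the point where the whole remainder fits in a single
chunk. That is only an observation about the numbers, not a diagnosis.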

  Okay, I found the problem, fixed it (wasn't really trivial :-\)
and added your HTML file to the test suite. It's committed to the W3C
CVS base:

   http://dev.w3.org/cvsweb/XML/HTMLparser.c.diff?r1=1.53&r2=1.54

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe
----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org



This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:30:25 EDT