[xml] HTML push interface


From: Marc Sanfacon (sanm@copernic.com)
Date: Tue Aug 01 2000 - 14:36:30 EDT


Hi there,
        I am new to libxml (I've been using it for less than a week). I
have written a C++ interface on top of it. It is not finished yet, but it
includes most of the features I need for now. BTW, I am working under
Windows 2000 using MSVC 6.0 SP3.

        I have tried to parse a file using the HTML push interface and I am
getting strange results.

Here is the code:

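// "r" opens the file in text mode, so on Windows each "\r\n" is read as "\n".
FILE *f = fopen(CGL::ConvertString(p_FileName).c_str(), "r");
if (f != NULL) {
    int res, size = 4096;
    char chars[4096];
    htmlParserCtxtPtr ctxt;

    // Read the first 4 bytes and use them to create the push parser context.
    res = fread(chars, 1, 4, f);
    if (res > 0) {
        ctxt = htmlCreatePushParserCtxt(NULL, NULL,
                        chars, res, 0, static_cast<xmlCharEncoding>(0));
        InitContext(ctxt);
        // Feed the rest of the file, one buffer at a time.
        while ((res = fread(chars, 1, size, f)) > 0) {
            htmlParseChunk(ctxt, chars, res, 0);
        }
        // Zero-length chunk with terminate = 1 signals the end of the input.
        htmlParseChunk(ctxt, chars, 0, 1);
        // Keep the document before freeing the parser context.
        pDoc = ctxt->myDoc;
        htmlFreeParserCtxt(ctxt);
    }
    fclose(f);
}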

This is mainly the code presented in 'testHTML.c' from the package, except
that I use a bigger buffer. In my tests, one strange thing happened: when
the buffer is large enough to hold one of my documents completely, the
result of the parsing is incomplete. For now, I have only one document that
triggers this problem, and I have attached it to this email.

For example, the document is 2001 bytes long on disk. Because the file is
opened in text mode, fread strips the '\r' characters, which leaves a total
of 1971 bytes. When I set the buffer size to 1967 (1971 minus the 4 bytes
read for the header) or more, I get the error: a big chunk of my document is
skipped. If I set it to 1966 or less, the document is parsed OK.
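
To make the arithmetic above concrete, here is a minimal sketch that needs
no libxml at all. The helpers chunk_sizes and text_mode_size are
hypothetical names made up for this illustration, and the count of 30 '\r'
characters is only inferred from 2001 - 1971. It suggests that the failing
case is the one where everything after the 4-byte header reaches the parser
in a single htmlParseChunk call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of libxml): sizes of the pieces the read
// loop would hand to the parser, i.e. one header chunk followed by
// buffer-sized chunks until the data runs out.
std::vector<std::size_t> chunk_sizes(std::size_t total,
                                     std::size_t header,
                                     std::size_t buffer) {
    assert(buffer > 0);  // a zero-sized buffer would never make progress
    std::vector<std::size_t> sizes;
    std::size_t first = total < header ? total : header;
    sizes.push_back(first);
    std::size_t left = total - first;
    while (left > 0) {
        std::size_t n = left < buffer ? left : buffer;
        sizes.push_back(n);
        left -= n;
    }
    return sizes;
}

// Hypothetical helper: bytes seen by fread in text mode, assuming exactly
// one '\r' is dropped per CRLF line ending.
std::size_t text_mode_size(std::size_t on_disk, std::size_t crlf_count) {
    assert(crlf_count <= on_disk);
    return on_disk - crlf_count;
}
```

With these assumptions, text_mode_size(2001, 30) gives 1971. A buffer of
1967 yields the chunks {4, 1967}, while a buffer of 1966 yields
{4, 1966, 1}, so the two cases differ only in whether the body arrives in
one piece or two.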

I even modified 'testHTML.c' to use a buffer of 1967 bytes to make sure the
problem was not in my own code, and I got the same error using:
testHTML -debug -repeat -push doc2.htm

Can anyone help me?

Regards,

Marc.

 <<doc2.htm>>

---------------------------------------------------------------------
 "If you choose not to decide, you still have made a choice."
                                Neil Peart
---------------------------------------------------------------------
Marc Sanfacon, Software developer Copernic.com
e-mail: sanm@copernic.com R&D Group
Tel : (418) 527-0528 ext 1212 ICQ #7355101





This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:30:24 EDT