RE: [xml] HTML push interface

From: Jordan Henderson (jhenderson@daynt1.daas.dla.mil)
Date: Tue Aug 01 2000 - 17:23:52 EDT


I can't help you, but I would like to point out that someone else is
also working on a C++ wrapper for libxml.

See http://lusis.org/~ari/xml++/ for more information.

> -----Original Message-----
> From: Marc Sanfacon [mailto:sanm@copernic.com]
> Sent: Tuesday, August 01, 2000 2:37 PM
> To: 'xml@xmlsoft.org'
> Subject: [xml] HTML push interface
>
>
> Hi there,
> I am new to libxml (I've been using it for less than 1 week). I have
> written a C++ interface on top of it. It is not yet finished, but it
> includes most features I need for now. BTW, I am working under
> Windows 2000 using MSVC 6.0 SP3.
>
> I have tried to parse a file using the HTML push interface and got
> strange results.
>
> Here is the code:
>
> FILE *f = fopen(CGL::ConvertString(p_FileName).c_str(), "r");
> if (f != NULL) {
>     int res, size = 4096;
>     char chars[4096];
>     htmlParserCtxtPtr ctxt;
>
>     res = fread(chars, 1, 4, f);
>     if (res > 0) {
>         ctxt = htmlCreatePushParserCtxt(NULL, NULL,
>             chars, res, 0, static_cast<xmlCharEncoding>(0));
>         InitContext(ctxt);
>         while ((res = fread(chars, 1, size, f)) > 0) {
>             htmlParseChunk(ctxt, chars, res, 0);
>         }
>         htmlParseChunk(ctxt, chars, 0, 1);
>         pDoc = ctxt->myDoc;
>         htmlFreeParserCtxt(ctxt);
>     }
>     fclose(f);
> }
>
> This is mainly the code presented in 'testHTML.c' from the package,
> except that I use a bigger buffer. In my tests, one strange thing
> happened. When using a buffer large enough to fit one of my
> documents, the result of the parsing is not complete. For now, I
> have only one document that shows this effect and I have attached it
> to this email.
>
> For example, the document is 2001 bytes long. When reading using
> fread, it strips the '\r' so this gives a total of 1971 bytes. When
> I set the buffer size to 1967 (1971 - 4 bytes for the header) or
> more, I get the error: a big chunk of my document is skipped. If I
> set it to 1966 or less, the document is parsed OK.
>
> I even modified 'testHTML.c' to use a buffer of 1967 bytes as a
> check, and I got the same error using:
> testHTML -debug -repeat -push doc2.htm
>
> Can anyone help me?
>
> Regards,
>
> Marc.
>
> <<doc2.htm>>
>
> ---------------------------------------------------------------------
> "If you choose not to decide, you still have made a choice."
> Neil Peart
> ---------------------------------------------------------------------
> Marc Sanfacon, Software developer Copernic.com
> e-mail: sanm@copernic.com R&D Group
> Tel : (418) 527-0528 ext 1212 ICQ #7355101
>
>
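
The buffer sizes quoted above can be checked with a small sketch. It
only reproduces the arithmetic of the read loop (one 4-byte header
read, then reads of at most the buffer size) and does not call libxml
at all. The helper name pushChunkSizes is made up for illustration
and is not part of any library. Assuming the 1971-byte count from the
report, it suggests that the failing sizes are exactly those where
the whole rest of the document arrives in a single htmlParseChunk()
call. That is an observation from the numbers, not a diagnosis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sizes of the pieces that a push loop like the quoted one hands to the
// parser: one header read of `headerSize` bytes, then reads of at most
// `chunkSize` bytes until the input is exhausted.
// Hypothetical helper for illustration only; not part of libxml.
std::vector<std::size_t> pushChunkSizes(std::size_t totalBytes,
                                        std::size_t headerSize,
                                        std::size_t chunkSize)
{
    assert(chunkSize > 0);  // a zero-sized buffer would never make progress
    std::vector<std::size_t> sizes;
    std::size_t remaining = totalBytes;

    // First read: the bytes passed to htmlCreatePushParserCtxt().
    std::size_t first = remaining < headerSize ? remaining : headerSize;
    if (first > 0) {
        sizes.push_back(first);
        remaining -= first;
    }

    // Later reads: each one corresponds to one htmlParseChunk() call.
    while (remaining > 0) {
        std::size_t n = remaining < chunkSize ? remaining : chunkSize;
        sizes.push_back(n);
        remaining -= n;
    }
    return sizes;
}
```

With the numbers from the report, pushChunkSizes(1971, 4, 1967) gives
{4, 1967}, while pushChunkSizes(1971, 4, 1966) gives {4, 1966, 1}.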

----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org


This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:30:25 EDT