Re: Re[2]: [xml] HTML & XML Parser


From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Sun Sep 24 2000 - 17:35:38 EDT


On Sun, Sep 24, 2000 at 10:54:26PM +0200, Manuel Guesdon wrote:
> >| Latest and future versions
> >| of HTML are XML based, hence new extensions will be made using the
> >| XHTML version and in an XML framework. Forget about auto-closed tags
> >| and other SGML minimization nastiness; this does not fit into this
> >| framework anymore.
>
> OK. Good news :-)

  Well, it's the expected evolution of the Web infrastructure, at least as
planned by W3C. I can assert this because 1) I'm a W3C employee and 2) it's
public knowledge, though this information should be more widespread.

> My goal was to have a general wrapper around libxml with the minimum
> possible differences between handling html or xml (just a flag, for
> example).
> To explain my need, now this wrapper is used in GNUstep for xml parsing
> and should be used in GNUstepWeb for HTML parsing.

  Look at the end of the SAX.c file: you will see that both the HTML and XML
default SAX back-ends share the same functions. It's just that a number of
XML entity/validity related callbacks are not activated for HTML.
  Just diff htmlDefaultSAXHandler and xmlDefaultSAXHandler ... all the HTML
callbacks are shared with the XML ones. The differences in processing
within those SAX callbacks are determined by the ctxt->html flag indicating
the kind of document.
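
  A rough sketch of that arrangement, not the actual SAX.c code: one
callback serves both parsers and branches on a flag stored in the context.
DemoCtxt and demo_start_element() are hypothetical stand-ins so the example
compiles without libxml; in libxml the flag is ctxt->html. The lower-casing
is only an illustrative HTML/XML difference.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser context; only the flag matters here. */
typedef struct {
    int  html;      /* non-zero for an HTML document, like ctxt->html */
    char last[32];  /* last element name recorded by the callback */
} DemoCtxt;

/* One startElement-style callback shared by the HTML and XML handlers. */
void demo_start_element(void *ctx, const char *name)
{
    DemoCtxt *ctxt = (DemoCtxt *) ctx;
    size_t i;

    assert(ctxt != NULL && name != NULL);
    for (i = 0; name[i] != '\0' && i < sizeof(ctxt->last) - 1; i++) {
        /* HTML element names are case-insensitive, XML names are not. */
        ctxt->last[i] = ctxt->html
            ? (char) tolower((unsigned char) name[i])
            : name[i];
    }
    ctxt->last[i] = '\0';
}
```

  The same function pointer can then sit in both handler tables; only the
flag decides which behaviour applies.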

> >| > My SAX functions use parser context _private member) ?
> >|
> >| I don't understand. _private are fields located in the DOM generated
> >| tree structures. And you say you use SAX ... So I'm lost what are you doing
> >| there, what API do you use exactly ?
>
> I use SAX because I want to handle some things (error messages, warning
> and Entity Loading) and haven't found a better way. So I create a SAX
> handler to put my own functions for these things. Other functions are
> the 'standard' ones (HTML or XML parser functions).

  Okay, this is a "normal" supported use of libxml!
For entity loading I would rather suggest using
xmlSetExternalEntityLoader() than modifying the SAX callback.
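
  To illustrate the idea of a globally registered loader (as opposed to
patching a SAX callback), here is a minimal sketch of the pattern. All the
demo_* names and the simplified loader signature are assumptions made so the
example stands alone; the real loader type used by
xmlSetExternalEntityLoader() takes more arguments and returns a parser
input, so check parser.h before adapting it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified loader type: URL in, entity text out. */
typedef const char *(*DemoEntityLoader)(const char *url);

/* Stand-in for the library default loader: knows nothing. */
static const char *demo_default_loader(const char *url)
{
    (void) url;
    return NULL;
}

static DemoEntityLoader current_loader = demo_default_loader;
static DemoEntityLoader previous_loader = NULL;

/* Stand-ins for the set/get registration calls. */
void demo_set_entity_loader(DemoEntityLoader f)
{
    current_loader = (f != NULL) ? f : demo_default_loader;
}

DemoEntityLoader demo_get_entity_loader(void)
{
    return current_loader;
}

/* Application loader: serve one known URL, defer the rest. */
const char *my_loader(const char *url)
{
    assert(url != NULL);
    if (strcmp(url, "local:greeting.ent") == 0)
        return "<!ENTITY hello \"Hello\">";
    return (previous_loader != NULL) ? previous_loader(url) : NULL;
}

/* Install once, remembering the loader that was active before. */
void install_my_loader(void)
{
    if (demo_get_entity_loader() != my_loader) {
        previous_loader = demo_get_entity_loader();
        demo_set_entity_loader(my_loader);
    }
}

/* What the parser would call whenever it needs an external entity. */
const char *demo_load_entity(const char *url)
{
    return current_loader(url);
}
```

  Keeping the previous loader around lets the application handle only the
URLs it cares about and fall back to the default for everything else.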

> My SAX handling functions use _private member of parser context,
> so if xml parser context and html parser context are not the same
> structures, I'll get into trouble because the structure offset of
> this member may not be the same (or I have to write two sets of functions).

  They are the same structures and I intend to keep them this way.

> If you tell me that the html and xml parser structures will be the
> same for future versions of libxml and if I can call htmlParseDocument
> with a parser context created with xmlCreateMemoryParserCtxt(),
> I'll be happy :-)
> In this case I just have to put a flag in the wrapper and call
> htmlParseDocument() or xmlParseDocument() depending on this flag.

  1/ I intend to keep them the same.
  2/ I should add htmlCreateMemoryParserCtxt(): the HTML set of callbacks
     is currently a subset of the XML one, but I cannot guarantee this
     will not change in the future, so I would prefer a separate routine.
  3/ Use ctxt->html to know which parser is in use.
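
  The wrapper discussed above could then look roughly like this. It is only
a sketch: DemoParserCtxt, demo_parse_xml() and demo_parse_html() are
hypothetical stand-ins for the parser context, xmlParseDocument() and
htmlParseDocument(), so that the example compiles without libxml. It assumes
one context structure for both parsers, with an html flag and a _private
slot for user data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the single context type shared by both parsers. */
typedef struct {
    int   html;      /* set by the parser, like ctxt->html */
    void *_private;  /* free slot for application data */
} DemoParserCtxt;

/* Stand-ins for xmlParseDocument() and htmlParseDocument(). */
static int demo_parse_xml(DemoParserCtxt *ctxt)  { ctxt->html = 0; return 0; }
static int demo_parse_html(DemoParserCtxt *ctxt) { ctxt->html = 1; return 0; }

/* Wrapper state: one flag selects the parser. */
typedef struct {
    int is_html;  /* the "just a flag" from the wrapper's point of view */
    int errors;   /* number of errors reported so far */
} Wrapper;

/* Attach the wrapper to the context, then dispatch on the flag. */
int wrapper_parse(Wrapper *w, DemoParserCtxt *ctxt)
{
    assert(w != NULL && ctxt != NULL);
    ctxt->_private = w;
    return w->is_html ? demo_parse_html(ctxt) : demo_parse_xml(ctxt);
}

/* A single error callback works for both document kinds, because the
 * user data sits at the same place in the same structure. */
void wrapper_on_error(void *ctx)
{
    DemoParserCtxt *ctxt = (DemoParserCtxt *) ctx;
    Wrapper *w = (Wrapper *) ctxt->_private;

    assert(w != NULL);
    w->errors++;
}
```

  Inside a callback, testing ctxt->html rather than the wrapper's own flag
keeps the callback correct even if it is reused outside the wrapper.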

  Hope this fulfills your needs.

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe
----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Sun Sep 24 2000 - 17:43:14 EDT