Re[2]: [xml] HTML & XML Parser


From: Manuel Guesdon (mguesdon@oxymium.net)
Date: Sun Sep 24 2000 - 16:54:26 EDT


Thanks for the help, Daniel.

On Sun, 24 Sep 2000 21:53:16 +0200 Daniel Veillard wrote:
>| On Thu, Sep 21, 2000 at 03:40:24AM -0400, Manuel Guesdon wrote:
>| > I'd like to parse HTML file with xml parser so people can change
>| > DTDs (adding tags,...) without having to re-compile libxml.
>| > The main problem is "Auto Closed" tag.
>|
>| Have you looked at the HTML parser in libxml? Did you notice
>| it produces a similar tree as, say, an equivalent XHTML document would have
>| produced when parsed with the XML parser? The HTML parser of course handles
>| autoclosed (and auto-opened in some measure) tags.

I've seen this. It is one of the points that made me think I could select which parser I want at run time.

>| Latest and future versions
>| of HTML are XML based, hence new extensions will be made using the
>| XHTML version and in an XML framework. Forget about auto-closed tags
>| and other SGML minimization nastiness; this does not fit into this
>| framework anymore.

OK. Good news :-)

>| > So I'd like to know some things:
>| > Can I safely mix html and xml parser functions (i.e. construct
>| > a context with xmlCreateMemoryParserCtxt() and parse the doc with
>| > htmlParseDocument.
>|
>| Can you tell me what you're aiming at this way ? I don't understand
>| your approach. If it's an HTML document use the HTML parser, otherwise
>| use the XML parser.

My goal was to have a general wrapper around libxml with the minimum possible differences between handling HTML and XML (just a flag, for example).
To explain my need: this wrapper is currently used in GNUstep for XML parsing and should also be used in GNUstepWeb for HTML parsing.

>| > My SAX functions use parser context _private member) ?
>|
>| I don't understand. _private are fields located in the DOM generated
>| tree structures. And you say you use SAX ... So I'm lost what are you doing
>| there, what API do you use exactly ?

I use SAX because I want to handle a few things myself (error messages, warnings and entity loading) and haven't found a better way. So I create a SAX handler and install my own functions for these things. The other functions are the 'standard' ones (the HTML or XML parser functions).
My SAX handling functions use the _private member of the parser context, so if the XML parser context and the HTML parser context are not the same structure, I'll get into trouble because the offset of this member may not be the same (or I'll have to write two sets of functions).
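
To make the offset concern concrete, here is a minimal sketch with two made-up context layouts. The struct names, their members, and both helper functions are hypothetical and are not libxml's real definitions; the only point is that a callback compiled against one layout reads the wrong bytes if it is handed the other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context layouts for illustration only,
 * NOT libxml's real parser context definitions. */
struct demo_xml_ctxt {
    void *sax;
    void *user_data;
    void *_private;          /* application data */
};

struct demo_html_ctxt {
    void *sax;
    void *user_data;
    int   html_only_field;   /* an extra member shifts what follows */
    void *_private;          /* same name, different offset */
};

/* A callback written against the XML layout: it is only correct
 * when ctx really points to a struct demo_xml_ctxt. */
void *private_via_xml_layout(void *ctx)
{
    assert(ctx != NULL);
    return ((struct demo_xml_ctxt *)ctx)->_private;
}

/* Returns 1 when both layouts place _private at the same offset,
 * i.e. when one set of callbacks could serve both contexts. */
int private_offsets_match(void)
{
    return offsetof(struct demo_xml_ctxt, _private)
        == offsetof(struct demo_html_ctxt, _private);
}
```

With these two made-up layouts private_offsets_match() returns 0, which is the situation that would force two sets of callbacks.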

If you tell me that the html and xml parser structures will be the same for future versions of libxml and if I can call htmlParseDocument with a parser context created with xmlCreateMemoryParserCtxt(), I'll be happy :-)
In this case I just have to put a flag in the wrapper and call htmlParseDocument() or xmlParseDocument() depending on this flag.
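
As a rough sketch of the flag idea, assuming nothing about libxml itself: doc_kind, parse_fn, wrapper_parse() and the two stub parsers below are hypothetical names. The stubs only stand in for xmlParseDocument() and htmlParseDocument() so that the example compiles without the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper types; none of these names come from libxml. */
typedef enum { DOC_XML = 0, DOC_HTML = 1 } doc_kind;

typedef int (*parse_fn)(const char *buf, size_t len);

/* Stand-in for xmlParseDocument(): strict, input must start with '<'.
 * Returns 0 on success, -1 on failure. */
int parse_xml_stub(const char *buf, size_t len)
{
    return (len > 0 && buf[0] == '<') ? 0 : -1;
}

/* Stand-in for htmlParseDocument(): forgiving, any non-empty input. */
int parse_html_stub(const char *buf, size_t len)
{
    (void)buf;
    return len > 0 ? 0 : -1;
}

/* The flag is the only place where the HTML/XML difference shows. */
int wrapper_parse(doc_kind kind, const char *buf)
{
    parse_fn fn;

    assert(kind == DOC_XML || kind == DOC_HTML);
    if (buf == NULL)
        return -1;
    fn = (kind == DOC_HTML) ? parse_html_stub : parse_xml_stub;
    return fn(buf, strlen(buf));
}
```

A caller would then write wrapper_parse(DOC_HTML, text) or wrapper_parse(DOC_XML, text), and everything else in the wrapper stays identical.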

Manuel

--

---- Message from the list xml@rpmfind.net Archived at : http://xmlsoft.org/messages/ to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Sun Sep 24 2000 - 17:43:14 EDT