Re: [xml] Still validating while using SAX Interface?


From: rolf@pointsman.de
Date: Sun Oct 15 2000 - 18:39:09 EDT


On 15 Oct, Daniel Veillard wrote:
> On Sun, Oct 15, 2000 at 09:29:54PM +0200, rolf@pointsman.de wrote:
>> Maybe I've overlooked something obvious, but as far as I can see there
>> are two ways to use libxml:
>>
>> o Let libxml build a DOM tree in memory. This allows validation to be
>> enabled, if needed.
>>
>> o Use the SAX interface (with one's own xmlSAXHandlerPtr). With this API
>> there is no way to use the validating facilities built into
>> libxml.
>
> right
>
>> As pointed out in the documentation, the first way also uses the SAX
>> interface internally, with a special set of SAX handler
>> functions (xmlDefaultSAXHandler). These special SAX handlers are located
>> in the file SAXvalidation.c
>
> no in SAX.c
>
>> If validation is enabled, these special handler functions do two things
>> at once: they validate the document and they build the DOM tree.
>
> or rather build the DOM tree and then use it for validation
>
>> Well, I want to use the SAX Interface while still validating. In
>
> I don't know how to do this:
> - SAX basically operates with constant memory use
> - the amount of memory needed for validating is not constant
> So you have to keep information ... I use DOM.
>
>> theory, this should not be too difficult. There should be another set
>
> It is, think about entity references!

I'm sorry to insist, but I don't see SAX versus DOM as a matter of
constant versus variable memory use. SAX lets me do whatever I want
with the XML data without needing a memory-hungry DOM tree.

Don't get me wrong. DOM is an easy-to-use way to represent XML data in
memory, but it has its limitations. I have to handle XML files of
100 MByte and more (XML product catalogs). It isn't an option for me to
dedicate 1 GByte of memory just to be able to read the data (libxml DOM
trees are big...).

It's true, a validating SAX parser may need some variable memory, not
only to store entities but of course also to store the whole structure
information from the DTD. But that typically requires much less memory
than a whole DOM tree.
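
To make the memory argument concrete, here is a minimal sketch in C. It
does not use libxml at all: the ElemDecl table, the StreamValidator type
and the sv_* functions are invented for illustration, and the "DTD" is
reduced to a list of permitted child names per element (real content
models are regular expressions). Under those assumptions, the state kept
while validating is one slot per currently open element, so it grows with
the nesting depth and not with the size of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified "DTD": each element lists the children it allows. */
typedef struct {
    const char *name;
    const char *allowed[4];          /* NULL-terminated list of child names */
} ElemDecl;

static const ElemDecl decls[] = {
    { "catalog", { "product", NULL } },
    { "product", { "name", "price", NULL } },
    { "name",    { NULL } },
    { "price",   { NULL } },
};

#define MAX_DEPTH 64

typedef struct {
    const ElemDecl *open[MAX_DEPTH]; /* one slot per currently open element */
    size_t depth;
    size_t max_depth;                /* high-water mark, kept for illustration */
    int valid;                       /* becomes 0 on the first error and stays 0 */
} StreamValidator;

static const ElemDecl *find_decl(const char *name) {
    for (size_t i = 0; i < sizeof decls / sizeof decls[0]; i++)
        if (strcmp(decls[i].name, name) == 0) return &decls[i];
    return NULL;
}

void sv_init(StreamValidator *v) {
    v->depth = 0; v->max_depth = 0; v->valid = 1;
}

/* Called for every start tag: check the child against its parent's list. */
void sv_start(StreamValidator *v, const char *name) {
    if (!v->valid) return;
    const ElemDecl *d = find_decl(name);
    if (d == NULL || v->depth == MAX_DEPTH) { v->valid = 0; return; }
    if (v->depth > 0) {
        const ElemDecl *parent = v->open[v->depth - 1];
        int ok = 0;
        for (size_t i = 0; i < 4 && parent->allowed[i] != NULL; i++)
            if (strcmp(parent->allowed[i], name) == 0) ok = 1;
        if (!ok) { v->valid = 0; return; }
    }
    v->open[v->depth++] = d;
    if (v->depth > v->max_depth) v->max_depth = v->depth;
}

/* Called for every end tag: pop the matching open element. */
void sv_end(StreamValidator *v, const char *name) {
    if (!v->valid) return;
    if (v->depth == 0 || strcmp(v->open[v->depth - 1]->name, name) != 0) {
        v->valid = 0;
        return;
    }
    v->depth--;
}

/* Streams n_products records through the validator and returns the deepest
   stack seen; *valid_out is 1 if every event was accepted. */
size_t sv_demo_max_depth(size_t n_products, int *valid_out) {
    assert(valid_out != NULL);
    StreamValidator v;
    sv_init(&v);
    sv_start(&v, "catalog");
    for (size_t i = 0; i < n_products; i++) {
        sv_start(&v, "product");
        sv_start(&v, "name");  sv_end(&v, "name");
        sv_start(&v, "price"); sv_end(&v, "price");
        sv_end(&v, "product");
    }
    sv_end(&v, "catalog");
    *valid_out = v.valid && v.depth == 0;
    return v.max_depth;
}
```

Calling sv_demo_max_depth() with ten products or with a hundred thousand
reports the same maximum depth. The sketch leaves out attributes, entities
and ID/IDREF checks; the latter would need a table that grows with the
number of IDs in the document.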

>> of default handlers that does only the validation and then,
>> instead of building the libxml DOM tree, calls my registered handler
>> functions.
>
> No, the validation needs state stored in the DOM

Could you please elaborate a bit on what information stored in the DOM
tree is needed for validation?
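
For reference, the handler chaining proposed in the quoted text could look
roughly like the following C sketch. It is not libxml's xmlSAXHandler and
does not touch valid.c: EventHandler, ValidatingFilter and the vf_*
functions are hypothetical names, and the "validation" is a deliberately
tiny stand-in (is the element declared at all?). The point is only the
shape: the filter checks each event, records errors, and then forwards
the event to the application's own handler without building a tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback set, loosely modelled on the idea of a SAX handler. */
typedef struct {
    void (*start_element)(void *user, const char *name);
    void (*end_element)(void *user, const char *name);
} EventHandler;

/* State of the validating layer: where to forward, and what went wrong. */
typedef struct {
    const EventHandler *next;   /* the application's registered handler */
    void *next_user;            /* the application's own context pointer */
    int errors;                 /* number of validity errors seen so far */
} ValidatingFilter;

/* Stand-in for the element declarations of a DTD. */
static const char *const declared[] = { "catalog", "product", "name", "price" };

static int is_declared(const char *name) {
    for (size_t i = 0; i < sizeof declared / sizeof declared[0]; i++)
        if (strcmp(declared[i], name) == 0) return 1;
    return 0;
}

/* Validate first, then pass the event on unchanged. */
static void vf_start(void *ctx, const char *name) {
    ValidatingFilter *f = ctx;
    if (!is_declared(name)) f->errors++;
    if (f->next->start_element != NULL)
        f->next->start_element(f->next_user, name);
}

static void vf_end(void *ctx, const char *name) {
    ValidatingFilter *f = ctx;
    if (f->next->end_element != NULL)
        f->next->end_element(f->next_user, name);
}

/* This is what would be handed to the parser in place of the user's handler. */
const EventHandler validating_handler = { vf_start, vf_end };

/* Example application handler: it only counts events. */
typedef struct { int starts; int ends; } Counter;

static void count_start(void *user, const char *name) {
    (void)name;
    ((Counter *)user)->starts++;
}

static void count_end(void *user, const char *name) {
    (void)name;
    ((Counter *)user)->ends++;
}

static const EventHandler counting_handler = { count_start, count_end };

/* Drives a fixed event sequence through filter -> application handler.
   Returns the number of validity errors; *starts_out receives the number
   of start events that reached the application. */
int vf_demo(int *starts_out) {
    assert(starts_out != NULL);
    Counter c = { 0, 0 };
    ValidatingFilter f = { &counting_handler, &c, 0 };

    validating_handler.start_element(&f, "catalog");
    validating_handler.start_element(&f, "product");
    validating_handler.start_element(&f, "colour");   /* not declared */
    validating_handler.end_element(&f, "colour");
    validating_handler.end_element(&f, "product");
    validating_handler.end_element(&f, "catalog");

    *starts_out = c.starts;
    return f.errors;
}
```

In this sketch the application still sees every event, including the
undeclared one; whether to stop, skip or merely report on an error is a
design choice left open here.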

>> I've scanned through the code in SAXvalidation.c and it seems, the
>
> SAXvalidation.c does not exist in my tree ... are you sure you're
> talking about libxml ???
> The validation basically uses the file valid.c

You're right, sorry about that. I've been messing around with a copy of
libxml2-2.2.4 in a first quick and hacky attempt to remove the DOM-building
parts from the default handler functions (they are indeed in
valid.c). I had to realize that understanding all the internal bells and
whistles isn't a task of only one or two hours. Therefore I
decided to ask the "gurus" whether it's worth a more serious attempt.

rolf

----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net


