Re: [xml] Saving without tree


From: mdf@angoss.com
Date: Mon May 29 2000 - 19:43:29 EDT


> Of course libxml is both a generator and a parser ... But to be able
> to generate it *needs* to understand the in-memory storage format.
> If you decide to not use the libxml tree format, for whatever reason,
> I don't see how libxml could "magically" understand your encoding of
> the data. Try to think 2 minutes about this there is no obvious
> solution for this !

I've thought about this so much that I actually *have a working
generator* that needs no "in-memory storage format" beyond whatever
tree/list structure is implicit in the data itself.

Rather than defining rigid, stifling data structures, you just play
around with some function and object pointers instead. {i.e., rigid,
stifling interfaces.} Naturally, this is a lot cleaner in so-called
"civilized" languages like C++, but it can be done easily [though
tediously] enough in C as well.

Add in some helpers for conversion to and from core data types like
strings [which deal with the standard entity stuff] and numbers, and you
have something which can deal with about 90% of the "work" one is likely
to do with XML. [Well, the work *I've* done at least..]

The result is effectively an inverse of a SAX parser.
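
A minimal sketch of what such an "inverse SAX" generator might look like
in C. Every name here (xml_writer, xw_start_element and so on) is made up
for illustration and is not part of libxml. The application calls one
function per event, and the bytes go straight to a sink callback, so no
document tree is ever built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink: receives each chunk of XML as soon as it is generated. */
typedef void (*xw_sink_fn)(void *ctx, const char *data, size_t len);

typedef struct {
    xw_sink_fn sink;  /* where the bytes go */
    void *ctx;        /* opaque pointer handed back to the sink */
    int tag_open;     /* nonzero while a start tag still lacks its '>' */
} xml_writer;

static void xw_put(xml_writer *w, const char *s)
{
    w->sink(w->ctx, s, strlen(s));
}

/* Emit text, replacing the characters XML reserves with entity references. */
static void xw_put_escaped(xml_writer *w, const char *s)
{
    for (; *s; s++) {
        switch (*s) {
        case '<': xw_put(w, "&lt;");   break;
        case '>': xw_put(w, "&gt;");   break;
        case '&': xw_put(w, "&amp;");  break;
        case '"': xw_put(w, "&quot;"); break;
        default:  w->sink(w->ctx, s, 1); break;
        }
    }
}

/* Finish a pending start tag before any content or sibling is written. */
static void xw_close_tag(xml_writer *w)
{
    if (w->tag_open) { xw_put(w, ">"); w->tag_open = 0; }
}

void xw_init(xml_writer *w, xw_sink_fn sink, void *ctx)
{
    assert(w && sink);
    w->sink = sink; w->ctx = ctx; w->tag_open = 0;
}

void xw_start_element(xml_writer *w, const char *name)
{
    xw_close_tag(w);
    xw_put(w, "<"); xw_put(w, name);
    w->tag_open = 1;
}

/* Only valid directly after xw_start_element or another xw_attribute. */
void xw_attribute(xml_writer *w, const char *name, const char *value)
{
    assert(w->tag_open);
    xw_put(w, " "); xw_put(w, name); xw_put(w, "=\"");
    xw_put_escaped(w, value);
    xw_put(w, "\"");
}

void xw_characters(xml_writer *w, const char *text)
{
    xw_close_tag(w);
    xw_put_escaped(w, text);
}

void xw_end_element(xml_writer *w, const char *name)
{
    xw_close_tag(w);
    xw_put(w, "</"); xw_put(w, name); xw_put(w, ">");
}

/* Example sink: ctx is a FILE*, so output goes wherever stdio sends it. */
void xw_file_sink(void *ctx, const char *data, size_t len)
{
    fwrite(data, 1, len, (FILE *)ctx);
}
```

The caller walks its own objects and fires start/attribute/characters/end
calls in document order, exactly mirroring the callbacks a SAX parser
would deliver when reading the same document back.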

> > Failing that, allowing one to use the streams libXML reads from in an
> > output mode would be nice [thus one gets compression 'for free'].
>
> Parse error. I cannot understand this sentence...

Translation: expose a "stream" thingy which has nice, easy-to-use functions
a la fread(), fwrite(), but can do the compression dance if necessary.
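
To make the idea concrete, here is a rough sketch of such a stream in C.
The xstream type and its functions are hypothetical names, not anything
libxml exposes. Only a plain stdio backend is shown; the assumption is
that a compressing backend (one built on zlib's gzread()/gzwrite(), say)
would fill in the same three function pointers and callers would never
know the difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stream interface; none of these names come from libxml. */
typedef struct xstream {
    size_t (*read)(struct xstream *s, void *buf, size_t len);
    size_t (*write)(struct xstream *s, const void *buf, size_t len);
    int    (*close)(struct xstream *s);
    void  *handle;   /* backend state: a FILE* here, a compressor elsewhere */
} xstream;

/* Plain stdio backend: each slot forwards to the matching stdio call. */
static size_t stdio_read(xstream *s, void *buf, size_t len)
{
    return fread(buf, 1, len, (FILE *)s->handle);
}

static size_t stdio_write(xstream *s, const void *buf, size_t len)
{
    return fwrite(buf, 1, len, (FILE *)s->handle);
}

static int stdio_close(xstream *s)
{
    return fclose((FILE *)s->handle);
}

/* Wrap an already opened FILE* in the stream interface. */
void xstream_from_file(xstream *s, FILE *fp)
{
    assert(s && fp);
    s->read = stdio_read;
    s->write = stdio_write;
    s->close = stdio_close;
    s->handle = fp;
}

/* Callers use only these three, whatever the backend happens to be. */
size_t xstream_read(xstream *s, void *buf, size_t len)
{
    return s->read(s, buf, len);
}

size_t xstream_write(xstream *s, const void *buf, size_t len)
{
    return s->write(s, buf, len);
}

int xstream_close(xstream *s)
{
    return s->close(s);
}
```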

I am aware this is probably well beyond the purview of an XML parser,
but I suspect a parser could make excellent use of such a thing. Example:

> May I suggest people actually *look* at what is available before starting
> suggesting modifying/extending the library ? All the output routines are
> available in tree.c.

Inside tree.c is a bunch of "buffer" stuff. Now I haven't used it myself,
but reading around in the code, it looks like you hand someone a complete
document [in the "in-memory storage format"] and it proceeds to
generate the XML [optionally compressed?] into another in-memory buffer,
and this buffer is finally slopped to a file.

If this understanding is correct, the primary issue is the usual one:
memory consumption. Namely, one will have the source data in-memory
[unavoidable [*]], an in-memory document tree, and the in-memory XML
equivalent of this tree.

For piddly small documents like web-pages and the like, this is probably
bearable. But for multi-megabyte monsters, it would be better if the
memory footprint for a simple "save" operation were nominal to non-existent.

This is probably easiest done when one has a nifty stream gizmo into
which one just dribbles the XML straight out of the application's
objects, and this eventually makes it to the disk (or socket connection
or wherever). Net memory hit is independent of document size, and might
be a mighty 4k if one is feeling decadent the day the code is written... ;-)
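
A sketch of the "4k" version, under the assumption that the destination
is an ordinary FILE*. The out_stream type, its functions and the
<samples> document layout are all invented for the example. The only
storage is one fixed buffer, so the memory hit stays the same whether
three records or three million are written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define OUT_BUF_SIZE 4096   /* the "mighty 4k"; any fixed size works */

typedef struct {
    FILE  *fp;                /* final destination */
    size_t used;              /* bytes currently waiting in buf */
    size_t total;             /* bytes handed to fp so far */
    char   buf[OUT_BUF_SIZE];
} out_stream;

void out_init(out_stream *o, FILE *fp)
{
    assert(o && fp);
    o->fp = fp; o->used = 0; o->total = 0;
}

/* Push buffered bytes to the destination; 0 on success, -1 on a short write. */
int out_flush(out_stream *o)
{
    size_t n = fwrite(o->buf, 1, o->used, o->fp);
    o->total += n;
    if (n != o->used) return -1;
    o->used = 0;
    return 0;
}

/* Append len bytes, flushing whenever the fixed buffer fills up. */
int out_write(out_stream *o, const char *data, size_t len)
{
    while (len > 0) {
        size_t room = OUT_BUF_SIZE - o->used;
        size_t n = len < room ? len : room;
        memcpy(o->buf + o->used, data, n);
        o->used += n; data += n; len -= n;
        if (o->used == OUT_BUF_SIZE && out_flush(o) != 0) return -1;
    }
    return 0;
}

/* Dribble records out one at a time; no tree, no full-document buffer. */
int save_samples(out_stream *o, const int *values, size_t count)
{
    char rec[64];
    size_t i;

    if (out_write(o, "<samples>", 9) != 0) return -1;
    for (i = 0; i < count; i++) {
        int n = snprintf(rec, sizeof rec, "<s v=\"%d\"/>", values[i]);
        if (n < 0 || out_write(o, rec, (size_t)n) != 0) return -1;
    }
    if (out_write(o, "</samples>", 10) != 0) return -1;
    return out_flush(o);
}
```

In a real application the values would come from the live data source
rather than an array, which is what keeps the working set down to the
current record plus the buffer.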

[*] In the application area I am interested in, there is really a continuous
stream of source data. So even though the total amount of data is, in
principle at least, unbounded, the actual data being dealt with at any
time is O(100 bytes).

> I also strongly suggest that if you're interested in going
> that deep into the technical details of the library, then you should
> use the CVS tree to see what is really there and not an ancient version.

I am looking at, and using, libxml2.0.0.

----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org



This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:30:13 EDT