Re: [xml] Incremental output (kinda followup to: streaming output)

From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Fri Feb 18 2000 - 11:25:42 EST


On Fri, Feb 18, 2000 at 04:56:14PM +0100, Lutz Behnke wrote:
> Hi there,
>
> I want to use libxml in an application that sends data over a http connection.
>
> David, I have read your answer to Havoc's question, stating that you would not
> use such large trees, but rather either do it all in one streaming go, or split
> the messages.
>
> My problem is a little bit different: while each individual document is small
> enough to write to memory w/o using all VM, there are some issues:
> a) I have to watch the memory footprint of each message and its environment as
> this is a server application and I need to be able to process more than one
> message in parallel.
> b) As I have to write the outgoing data to the network, it costs a lot of overhead
> to write data to a file first and then pump the file back into the socket.
> And as this is somewhat security sensitive I also have to watch temp files
> in general (let's say that I was brainwashed to simply dislike temp files).
> c) If I dump the tree to memory I almost double the memory footprint instantly.
> I would really like to avoid this.
> d) Generating the tree for outgoing messages from the tree of incoming messages
> makes some sense as there are rather complex operations on both before I write
> all of it back to the net.
>
> I would assume that there is some kind of iterator to go over the tree in order
> to dump it to memory or to a file. Would it not be possible to adapt it into a
> producer-consumer style dumper that stops when a given number of bytes has been written?
> I would like to write it, but unfortunately I am only stalking around the lib trying
> to understand how to use it, let alone how to extend it.

 Honestly, if you have memory constraints I suggest using the SAX interface.
You don't get a tree; instead a set of callbacks is called as the input is
processed. I also suggest that you use the push mechanism, where data is
provided chunk by chunk: it would allow you to check that you're not near the
memory limit before processing more input:

    http://xmlsoft.org/#Invoking2
    "Invoking the parser: the push way" and following section
    "Invoking the parser: the SAX interface"
  
  If you really need the tree and are concerned about memory consumption,
I have bad news: the in-memory tree structure will grow somewhat to be able
to handle DOM in a better way. The attribute structure will be larger in 2.0.

  cf. http://bugs.gnome.org/db/51/5190.html

  On the other hand, concerning serialization of the result, it does not seem
too difficult to tweak the buffer code so that it flushes while the output is
being generated (and possibly adds encoding conversion, as the parser input does).
Would an interface accepting a file descriptor or a FILE * be suitable?
Or would a callback invoked when the buffer is full be better?
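
  To make the two options concrete, here is a rough sketch of what a
callback-on-buffer-full interface could look like. Every name in it (out_buf,
flush_fn, out_buf_write, flush_to_file, count_bytes) is hypothetical and made
up for illustration; none of it is libxml API, and the 16-byte buffer is
deliberately tiny so that flushing is easy to observe. The FILE * variant falls
out as one particular callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: none of these names exist in libxml. */

/* Called when the buffer is full, and once more on the final flush.
 * Returns 0 on success, non-zero to abort the serialization. */
typedef int (*flush_fn)(void *user_data, const char *data, size_t len);

#define OUT_BUF_SIZE 16 /* tiny on purpose, to force several flushes */

typedef struct {
    char     data[OUT_BUF_SIZE];
    size_t   used;      /* bytes currently waiting in data[] */
    flush_fn flush;     /* consumer of the serialized bytes */
    void    *user_data; /* passed through to the callback untouched */
} out_buf;

static void out_buf_init(out_buf *b, flush_fn flush, void *user_data)
{
    assert(flush != NULL);
    b->used = 0;
    b->flush = flush;
    b->user_data = user_data;
}

/* Hand any pending bytes to the callback and empty the buffer. */
static int out_buf_flush(out_buf *b)
{
    int rc = 0;
    if (b->used > 0) {
        rc = b->flush(b->user_data, b->data, b->used);
        b->used = 0;
    }
    return rc;
}

/* Append len bytes, flushing every time the fixed buffer fills up,
 * so memory use stays bounded by OUT_BUF_SIZE. */
static int out_buf_write(out_buf *b, const char *s, size_t len)
{
    while (len > 0) {
        size_t room = OUT_BUF_SIZE - b->used;
        size_t n = len < room ? len : room;
        memcpy(b->data + b->used, s, n);
        b->used += n;
        s += n;
        len -= n;
        if (b->used == OUT_BUF_SIZE) {
            int rc = out_buf_flush(b);
            if (rc != 0)
                return rc;
        }
    }
    return 0;
}

/* The FILE * option expressed as a callback: user_data is the FILE *. */
static int flush_to_file(void *user_data, const char *data, size_t len)
{
    FILE *fp = user_data;
    return fwrite(data, 1, len, fp) == len ? 0 : -1;
}

/* Stand-in for a socket writer: only counts calls and bytes. */
typedef struct {
    int    calls;
    size_t bytes;
} count_sink;

static int count_bytes(void *user_data, const char *data, size_t len)
{
    count_sink *sink = user_data;
    (void)data;
    sink->calls++;
    sink->bytes += len;
    return 0;
}
```

  A tree dumper would call out_buf_write() for each piece of markup it emits
and out_buf_flush() once at the end. With a callback that writes to the socket,
the serialized document never has to exist in memory as a whole, and no temp
file is involved.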

  I can make binary-incompatible changes now since I'm preparing version
2.0, so if there are such changes to be made I would rather get the list now.

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe
----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org



This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:30:04 EDT