Re: [xml] Lower speed with greater xmlParseChunk() chunks?


From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Sun Oct 15 2000 - 04:40:56 EDT


  Hi Rolf,

On Sun, Oct 15, 2000 at 03:23:05AM +0200, rolf@pointsman.de wrote:
>
>
> I'm using the libxml SAX Interface (without validation). I'm doing
> things very close to the way shown in SAXtest.c.
>
> In more detail, I'm using xmlCreatePushParserCtxt() to create the parser
> (and feed in the first 4 bytes of the input, as mentioned in the
> documentation and shown in SAXtest.c).
>
> Then I use xmlParseChunk to feed in the rest of my XML data chunk by
> chunk. Everything seems to work very well.
>
> Playing around, I found one very strange behavior: parsing speed slows
> down dramatically if the chunks of data are big.
>
> Parsing the same medium-sized XML data (around 11 MB) each time, I got:
>
> Chunk size    Time
> 1 MB          360 s (!)
> 100 kB        28 s
> 1 kB          12.8 s
>
> From test to test, I changed nothing but the buffer size within the
> parsing loop. The optimum seems to be around 1 kB: chunks of 512 bytes
> are as fast as 1 kB chunks, and 128-byte chunks are slightly slower.
> Memory consumption seems to be the same regardless of chunk size. This
> is all on Linux 2.2.13 with libc 2.1.2-24, egcs-2.91 and libxml 2-2.2.4.

   Thanks for the detailed bug report !

> Of course, nobody reads a file in 1 MB chunks. I discovered this
> behavior while parsing XML data that was already in memory, coming from
> elsewhere. In this situation it seems the easiest way is to
> feed the whole XML string into the parser with one xmlParseChunk()
> call.

   Hum .... interesting, this sounds like a funny bug especially
when using only the SAX interface ! I fixed a similar serious
slowdown around 2.1.x when using DOM and very large pieces of
content data.

> Is somebody able to reproduce this behavior?

  Well it would help me a lot if you could do the following:
    - compile with -pg -g
    - do the same 3 runs and use gprof to extract profiling
      information for each run
    - send me the results (privately, unless someone else is
      interested in debugging it).
If needed I will do this too ...

> It's easy to use a small chunk size even for in-memory XML data, of
> course. But at least a short hint in the documentation would be
> helpful, if this is all true (and not a fault of mine). Maybe it would
> be best if the parsing engine split the input into the
> most suitable chunks automatically. (xmlSAXUserParseMemory() doesn't
> seem to do this.)

  I think the bug should rather be located and fixed !
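
  Until it is found, the workaround you describe (feeding in-memory
data in small pieces) could look roughly like the sketch below. This
is only an illustration under assumptions: feed_fn, feed_in_chunks,
feed_stats and count_feed are made-up names, not part of libxml. In
real code the callback would wrap xmlParseChunk(ctxt, chunk, size,
terminate); here it only counts what it receives, so the splitting
logic can be checked on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: stands in for a call such as
 * xmlParseChunk(ctxt, chunk, size, terminate). Returns 0 on success. */
typedef int (*feed_fn)(void *user, const char *chunk, int size, int terminate);

/* Feed `len` bytes of `data` to `feed` in pieces of at most `chunk_size`
 * bytes, then make one final zero-length call with terminate = 1.
 * `chunk_size` must be > 0 and small enough to fit in an int.
 * Returns 0 on success, or the first non-zero value returned by `feed`. */
static int feed_in_chunks(const char *data, size_t len, size_t chunk_size,
                          feed_fn feed, void *user)
{
    size_t off = 0;

    assert(chunk_size > 0);
    while (off < len) {
        size_t n = len - off;
        int rc;

        if (n > chunk_size)
            n = chunk_size;
        rc = feed(user, data + off, (int)n, 0);
        if (rc != 0)
            return rc;          /* stop at the first error */
        off += n;
    }
    /* Signal the end of the input. */
    return feed(user, NULL, 0, 1);
}

/* Demo callback (an assumption for illustration): it only counts calls
 * and bytes. A real one would forward its arguments to the parser. */
struct feed_stats {
    int calls;
    size_t bytes;
    int terminated;
};

static int count_feed(void *user, const char *chunk, int size, int terminate)
{
    struct feed_stats *st = user;

    (void)chunk;                /* content is ignored in this demo */
    st->calls++;
    st->bytes += (size_t)size;
    if (terminate)
        st->terminated = 1;
    return 0;
}
```

With the 1 kB chunk size from your table, a call would be something
like feed_in_chunks(xml, len, 1024, my_feed, ctxt), where my_feed is a
hypothetical wrapper around xmlParseChunk().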

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe
----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Sun Oct 15 2000 - 04:45:37 EDT