Re: [xml] Lower speed with greater xmlParseChunk() chunks?

From: rolf@pointsman.de
Date: Sun Oct 15 2000 - 16:04:47 EDT


On 15 Oct, Wayne Davison wrote:
> On Sun, 15 Oct 2000 rolf@pointsman.de wrote:
>> Playing around I found one very strange behavior. Parsing speed slows
>> (dramatically) down, if the chunks of data are big.
>
> I noticed something similar, but I never tried to quantify it. I believe
> that the problem is that the code likes to keep shifting the buffer as it
> parses it. A while back I removed one such "SHRINK" call (only when the
> data was being pushed), but I seem to recall that it didn't eliminate the
> shifting. I decided to deal with this by just calling the push function
> in a loop with a guaranteed small chunk size and then forgot to bring up
> the subject.
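
A minimal sketch of that workaround, to make the idea concrete. It is
not Wayne's code: push_fn, push_stats, count_push and
push_in_small_chunks are hypothetical names, and the callback only
counts what it receives. With libxml the callback could wrap
xmlParseChunk(ctxt, chunk, size, terminate); that wiring is an
assumption and is not shown here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical callback type: hands one piece of input to a parser.
 * Returns 0 on success, non-zero on error. */
typedef int (*push_fn)(void *ctx, const char *chunk, int size, int terminate);

/* Stand-in for a parser: records how the data arrived. */
struct push_stats { int calls; size_t bytes; int max_size; int terminated; };

static int count_push(void *ctx, const char *chunk, int size, int terminate)
{
    struct push_stats *st = ctx;
    (void)chunk;                       /* content is ignored in this sketch */
    st->calls++;
    st->bytes += (size_t)size;
    if (size > st->max_size) st->max_size = size;
    if (terminate) st->terminated++;
    return 0;
}

/* Feed len bytes to push in pieces of at most max_chunk bytes.
 * The final piece is pushed with terminate = 1 (for len == 0 a single
 * empty terminating piece is pushed). Returns 0 on success, or the
 * first non-zero value returned by push. */
static int push_in_small_chunks(const char *data, size_t len,
                                size_t max_chunk, push_fn push, void *ctx)
{
    size_t off = 0;

    assert(push != NULL && max_chunk > 0 && max_chunk <= INT_MAX);
    do {
        size_t n = len - off;
        if (n > max_chunk)
            n = max_chunk;
        off += n;
        int rc = push(ctx, data + (off - n), (int)n, off == len);
        if (rc != 0)
            return rc;
    } while (off < len);
    return 0;
}
```

Called with 10 bytes and a max_chunk of 4, the callback sees pieces of
4, 4 and 2 bytes, and only the last one carries the terminate flag.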

The appended demo program shows the problem (I already mailed it privately
to Daniel). Choose a CHUNKSIZE at the beginning of the file, compile
it with something like gcc -o demo demo.c -lxml, and run it with ./demo
<XML file>. Then choose another CHUNKSIZE and do it again. If you use,
say, 1024 for the first try and 102400 for the second, you will
see a great difference in execution time with the same input data (at
least for me, the latter is around 3 times slower than the former).
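
For readers without the attachment, here is a rough sketch of the kind
of measurement described above. It is not the demo program itself:
feed_fn, count_bytes and time_chunked_feed are hypothetical names, and
the callback just counts bytes. To reproduce the comparison, the
callback would presumably wrap xmlParseChunk() on a context obtained
from xmlCreatePushParserCtxt(); that part is an assumption and is left
out so the sketch builds without libxml.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical callback type: consumes one piece of input.
 * Returns 0 on success, non-zero on error. */
typedef int (*feed_fn)(void *ctx, const char *chunk, int size, int terminate);

/* Stand-in for a parser: adds up the bytes it is given. */
static int count_bytes(void *ctx, const char *chunk, int size, int terminate)
{
    (void)chunk;
    (void)terminate;
    *(size_t *)ctx += (size_t)size;
    return 0;
}

/* Read fp to end of file in reads of chunk_size bytes and pass each
 * piece to feed, followed by one empty terminating call.
 * Returns the CPU time spent in seconds (as reported by clock()),
 * or -1.0 on allocation, read, or callback failure. */
static double time_chunked_feed(FILE *fp, size_t chunk_size,
                                feed_fn feed, void *ctx)
{
    assert(fp != NULL && feed != NULL && chunk_size > 0);

    char *buf = malloc(chunk_size);
    if (buf == NULL)
        return -1.0;

    clock_t start = clock();
    size_t n;
    int rc = 0;
    while (rc == 0 && (n = fread(buf, 1, chunk_size, fp)) > 0)
        rc = feed(ctx, buf, (int)n, 0);
    if (rc == 0 && ferror(fp))
        rc = -1;
    if (rc == 0)
        rc = feed(ctx, buf, 0, 1);     /* signal end of input */
    clock_t end = clock();

    free(buf);
    if (rc != 0)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running it twice over the same file, once with a chunk_size of 1024
and once with 102400, and comparing the two return values gives the
kind of comparison described above.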

> Another problem with the code is that it copies all the data that you push
> into an input buffer, and then translates all that data into another UTF8
> buffer, so you can end up consuming another 2x the size of your push
> buffer in memory.

Or more, in the worst case ;-)
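
One way "more" can happen, as an illustration only: assume the pushed
data is ISO-8859-1 (the thread does not say which encoding is meant).
Every byte at or above 0x80 then needs two bytes in UTF-8, so the
translated buffer alone can be up to twice the size of the input copy.
latin1_utf8_size is a hypothetical helper, not a libxml function.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes needed to hold len ISO-8859-1 bytes once they are
 * re-encoded as UTF-8: bytes below 0x80 stay one byte, all others
 * become two bytes. */
static size_t latin1_utf8_size(const unsigned char *in, size_t len)
{
    size_t out = 0;

    assert(in != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        out += (in[i] < 0x80) ? 1 : 2;
    return out;
}
```

Under that assumption, a push buffer of n bytes made up entirely of
accented characters would need n bytes for the input copy plus 2n for
the UTF-8 buffer, that is 3x on top of the caller's own buffer.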

> I've thought that the push routine should be modified to only put a small
> chunk of memory at a time into the input buffer, effectively moving my
> buffer-dividing loop into the xml internals.

As I also suggested in my first mail... Please note that
xmlSAXUserParseMemory(), and maybe some other functions, also slow down
dramatically with a big piece of input. Anyone who evaluates libxml and
feeds in bigger chunks of input at once (I personally have to handle XML
files of up to 100 MByte each) may get a very bad impression of the
libxml parser speed.

rolf


----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net


This archive was generated by hypermail 2b29 : Sun Oct 15 2000 - 16:43:15 EDT