Re: [xml] htmlParserInputRead()


From: Wayne Davison (wayned@blorf.net)
Date: Tue Aug 22 2000 - 19:28:50 EDT


On Tue, 22 Aug 2000, Daniel Veillard wrote:
> Hum, this really changes the semantics of the function. The goal
> is not only to read data but also to shrink buffer usage by
> discarding scanned characters.
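The shrinking described here amounts to discarding bytes the parser has already scanned. One way to keep that behaviour without a memmove() on every read is to defer the shift until the scanned prefix is large. This is a minimal sketch under stated assumptions: `scan_buf`, `buf_append()`, `buf_consume()` and the 4096-byte `COMPACT_MIN` threshold are hypothetical names chosen for illustration, and this is not libxml2's actual input-buffer code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer (NOT libxml2's xmlParserInputBuffer). */
typedef struct {
    char  *data;
    size_t len;   /* bytes currently stored */
    size_t pos;   /* bytes already scanned by the parser */
    size_t cap;   /* allocated size */
} scan_buf;

/* Assumed policy: only shift when at least COMPACT_MIN bytes have been
   scanned AND the scanned prefix is at least half of the stored data. */
#define COMPACT_MIN 4096

static size_t bytes_moved = 0; /* instrumentation: total bytes memmove'd */

/* Unconditionally drop the scanned prefix by shifting the rest down. */
static void buf_compact(scan_buf *b) {
    size_t remaining = b->len - b->pos;
    if (b->pos == 0)
        return;
    memmove(b->data, b->data + b->pos, remaining);
    bytes_moved += remaining;
    b->len = remaining;
    b->pos = 0;
}

/* Cheap checks first; only pay for a memmove when it is worth it. */
static void buf_maybe_compact(scan_buf *b) {
    if (b->pos == b->len) {          /* everything scanned: free reset */
        b->pos = b->len = 0;
        return;
    }
    if (b->pos >= COMPACT_MIN && b->pos >= b->len / 2)
        buf_compact(b);
}

/* Append n bytes, shifting before growing so the buffer does not
   expand to hold data that has already been parsed. */
static int buf_append(scan_buf *b, const char *src, size_t n) {
    buf_maybe_compact(b);
    if (b->len + n > b->cap && b->pos > 0)
        buf_compact(b);
    if (b->len + n > b->cap) {
        size_t ncap = b->cap ? b->cap : 1024;
        while (ncap < b->len + n)
            ncap *= 2;
        char *p = realloc(b->data, ncap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Mark n bytes as scanned; the actual shift is deferred. */
static void buf_consume(scan_buf *b, size_t n) {
    b->pos += n;
    if (b->pos > b->len)
        b->pos = b->len;
    buf_maybe_compact(b);
}
```

Fed a 32K push and consumed 150 bytes at a time, this policy moves about 28K bytes in total, less than one copy of the push, whereas shifting on every 150-byte step would move roughly 3.5 MB. The free reset when everything has been scanned also lets the next 32K push reuse the same allocation instead of growing it.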

Right, but we don't want to shift the data too often, or we end up
wasting a lot of time moving characters around. For instance, I used
to have code that passed whatever buffer size I got from libwww
straight on to htmlParseChunk(), which meant I was sending up to about
32K at a time. Rewriting the entire buffer every time we parse ~150
bytes just seems wasteful to me. (Note that I now have a loop that
breaks the htmlParseChunk() calls into smaller chunks, since that
appears to be more efficient. Maybe the internal code should do
something similar: ration out a large push of data in smaller chunks
so we don't end up with a bunch of large internal buffers.)
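A loop that rations a large push into smaller parser calls might look like the sketch below. To keep it self-contained it takes a callback shaped like libxml2's `htmlParseChunk(ctxt, chunk, size, terminate)` instead of linking libxml2; `feed_in_chunks()` and the 4096-byte chunk size in the usage note are illustrative assumptions, not values taken from this thread.

```c
#include <limits.h>
#include <stddef.h>

/* Same argument shape as htmlParseChunk(), but with a void* context so
   the sketch compiles without libxml2. Returns 0 on success. */
typedef int (*chunk_feeder)(void *ctx, const char *chunk, int size,
                            int terminate);

/* Hypothetical helper: hand `len` bytes to `feed` in pieces of at most
   `max_chunk` bytes, stopping at the first nonzero return value.
   `terminate` is passed only on the final piece, and only if `is_last`
   is set (so a zero-length call with is_last=1 just terminates). */
static int feed_in_chunks(void *ctx, chunk_feeder feed,
                          const char *data, size_t len,
                          size_t max_chunk, int is_last) {
    size_t off = 0;
    if (max_chunk == 0 || max_chunk > (size_t)INT_MAX)
        return -1;                     /* chunk size must fit in an int */
    do {
        size_t n = len - off;
        if (n > max_chunk)
            n = max_chunk;
        int last = is_last && (off + n == len);
        int rc = feed(ctx, data + off, (int)n, last);
        if (rc != 0)
            return rc;                 /* propagate the parser's error */
        off += n;
    } while (off < len);
    return 0;
}
```

With libxml2, a small wrapper whose body is `return htmlParseChunk((htmlParserCtxtPtr)ctx, chunk, size, terminate);` could serve as the callback, and each buffer received from libwww could go through `feed_in_chunks(ctxt, wrapper, buf, len, 4096, 0)`, with `is_last` set only for the document's final block.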

One thing I did make sure of was that, even with my change to
htmlParserInputRead(), the "buffer" and "raw" objects don't grow to
hold the entire document when pushing smaller chunks of HTML data.

..wayne..

----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org



This archive was generated by hypermail 2b29 : Tue Aug 22 2000 - 16:43:12 EDT