Re: [xml] HTMLParser from libxml concerns.


From: TOM (ptittom@free.fr)
Date: Fri Oct 20 2000 - 17:24:54 EDT


On 20/10/2000 14:57:57 Marc Sanfacon wrote:
> I am willing to modify the HTMLParser so that it uses the same rules
> than 'HTMLTidy' (http://www.w3.org/People/Raggett/tidy/) to fix a HTML
> page.
> I do not want to create a second HTMLTidy, so I won't put all features
> in it, but the rules used to fix the HTML.
[...]
> * Is there anyone else already doing this job ?

I wanted to rewrite HTML Tidy as a library (so that it could serve as a
base for an HTML browser), then eventually modify it to use the libxml
DOM tree (as JTidy, the Java port of HTML Tidy, does with org.w3c.dom).
I don't have enough time... :o(

I think you could use the SAX callbacks to perform the cleaning while
parsing. Tidying a DOM tree would also be necessary if you want to clean an
invalid (but well-formed) HTML document.
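
To make the "clean while parsing" idea concrete, here is a rough sketch.
It is not libxml code: it uses Python's standard html.parser module as a
stand-in for SAX-style callbacks, and the names CleaningHandler, VOID and
clean are made up for this illustration. It only repairs nesting (closing
unclosed elements, dropping stray end tags) and quotes attribute values.
Comments and the doctype are dropped, and none of HTML Tidy's actual
rules are reproduced.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take an end tag (a small subset, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class CleaningHandler(HTMLParser):
    """Hypothetical event-driven cleaner: repairs nesting as events arrive."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []  # stack of currently open elements
        self.out = []        # cleaned output fragments

    def handle_starttag(self, tag, attrs):
        # Quote every attribute value; a valueless attribute repeats its name.
        rendered = "".join(
            ' %s="%s"' % (name, escape(value if value is not None else name))
            for name, value in attrs
        )
        self.out.append("<%s%s>" % (tag, rendered))
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: drop it
        # Close any elements still open inside the one being closed.
        while True:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on output.
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        # Close whatever is still open at end of input.
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())


def clean(html_text):
    handler = CleaningHandler()
    handler.feed(html_text)
    handler.close()
    return "".join(handler.out)
```

For example, clean("<p><b>bold<i>both</b></p></i>") returns
"<p><b>bold<i>both</i></b></p>": the unclosed <i> is closed before </b>,
and the stray </i> at the end is discarded. Fixes that need to look at the
whole document (moving misplaced elements, for instance) do not fit this
streaming model, which is where a pass over the DOM tree comes in.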

----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Fri Oct 20 2000 - 17:43:29 EDT