RE: [xml] HTMLParser bug...

From: Marc Sanfacon (sanm@copernic.com)
Date: Mon Nov 20 2000 - 13:51:49 EST


Another problem I forgot to mention is that the HTMLParser from libxml
may discard some tags, after which we may be unable to repair the document.
For example:

Original document:

<center>
<html><head>
<TITLE>Classifieds</TITLE>
</head><body>
<center>
<html>
<center>
</center><a name=rsearch"></form></BODY></HTML><!-- END PAGE FOOTER
--></center>

Parsed document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><center>
<title>Classifieds</title>
<center>
<center></center>
<a name="rsearch&quot;"></a>
</center>
</center></body></html>
<!-- END PAGE FOOTER -->

This is a modified extract of a page from the Netscape web site.
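
To make it easier to see which tags went missing, here is a rough sketch
that counts start and end tags in the two snippets above and reports those
that occur more often in the original than in the parsed output. It uses
the tokenizer from Python's standard library (html.parser), not libxml, and
the names TagCounter, count_tags and dropped_tags are made up for this
illustration. Comparing counts is only a heuristic: it ignores nesting and
says nothing about tags the parser added.

```python
from collections import Counter
from html.parser import HTMLParser

# The two snippets from this message, copied verbatim.
ORIGINAL = """<center>
<html><head>
<TITLE>Classifieds</TITLE>
</head><body>
<center>
<html>
<center>
</center><a name=rsearch"></form></BODY></HTML><!-- END PAGE FOOTER
--></center>
"""

PARSED = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><center>
<title>Classifieds</title>
<center>
<center></center>
<a name="rsearch&quot;"></a>
</center>
</center></body></html>
<!-- END PAGE FOOTER -->
"""


class TagCounter(HTMLParser):
    """Hypothetical helper: tallies every start and end tag it sees."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling the handlers.
        self.counts["<%s>" % tag] += 1

    def handle_endtag(self, tag):
        self.counts["</%s>" % tag] += 1


def count_tags(markup):
    """Return a Counter of the start/end tags found in markup."""
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return counter.counts


def dropped_tags(before, after):
    """Tags that occur more often in `before` than in `after`."""
    # Counter subtraction keeps only the positive differences.
    return count_tags(before) - count_tags(after)


if __name__ == "__main__":
    for tag, n in dropped_tags(ORIGINAL, PARSED).items():
        print(tag, n)
```

For the two snippets in this message, the tags reported as missing are the
second <html>, the <head> and </head> pair, and the stray </form>.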

Regards,
        Marc.

----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net


