From: Marc Sanfacon (sanm@copernic.com)
Date: Mon Nov 20 2000 - 13:51:49 EST
Another problem I forgot to mention is that the HTMLParser from libxml
may discard some tags, and then we may be unable to fix the document.
For example:
Original document:
<center>
<html><head>
<TITLE>Classifieds</TITLE>
</head><body>
<center>
<html>
<center>
</center><a name=rsearch"></form></BODY></HTML><!-- END PAGE FOOTER
--></center>
Parsed document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><center>
<title>Classifieds</title>
<center>
<center></center>
<a name="rsearch""></a>
</center>
</center></body></html>
<!-- END PAGE FOOTER -->
This is a modified extract from a page on the Netscape web site.
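To check which tags went missing, the two listings above can be compared
mechanically. The following is only a minimal sketch and is not part of
libxml: it assumes the tokenizer in Python's standard html.parser module,
chosen because it reports tags as it sees them without repairing the
document. The helper names tag_counts and dropped_tags are hypothetical.
The DOCTYPE line is left out of PARSED because only element tags are
counted.

```python
from collections import Counter
from html.parser import HTMLParser

# The original document, copied from the listing above.
ORIGINAL = """<center>
<html><head>
<TITLE>Classifieds</TITLE>
</head><body>
<center>
<html>
<center>
</center><a name=rsearch"></form></BODY></HTML><!-- END PAGE FOOTER
--></center>"""

# The libxml output from the listing above, without the DOCTYPE line.
PARSED = """<html><body><center>
<title>Classifieds</title>
<center>
<center></center>
<a name="rsearch""></a>
</center>
</center></body></html>
<!-- END PAGE FOOTER -->"""


class TagCounter(HTMLParser):
    """Counts start and end tags; tag names arrive lower-cased."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts["<%s>" % tag] += 1

    def handle_endtag(self, tag):
        self.counts["</%s>" % tag] += 1


def tag_counts(markup):
    """Return a Counter of the tags found in markup (hypothetical helper)."""
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return counter.counts


def dropped_tags(before, after):
    """Tags that occur more often in `before` than in `after`."""
    # Counter subtraction keeps only the positive differences.
    return tag_counts(before) - tag_counts(after)


if __name__ == "__main__":
    for tag, n in sorted(dropped_tags(ORIGINAL, PARSED).items()):
        print(tag, n)
```

Calling dropped_tags(ORIGINAL, PARSED) lists the tags present in the
original but absent from the parser output; swapping the arguments lists
the tags the parser added while repairing the document.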
Regards,
Marc.
---- Message from the list xml@rpmfind.net Archived at : http://xmlsoft.org/messages/ to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net