From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Fri Nov 17 2000 - 10:56:16 EST
On Fri, Nov 17, 2000 at 09:57:13AM -0500, Marc Sanfacon wrote:
> Hi again,
> I found another problem with the HTML parser. I know that this is
> invalid HTML, but... Here is the case:
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <HTML><HEAD><TITLE>Title</TITLE>
> <META http-equiv=Content-Type content="text/html; charset=windows-1252">
> </HEAD>
> <BODY>
> So <A href="http://www.ebay.com/">eBay&#174 Company</A>
> </BODY></HTML>
>
> As you can see, the line that contains the href also contains the
> character reference '&#174', which doesn't end with a ';'. So the
> result from libxml is:
I have made a slightly different fix generating the following:
SAX.startElement(a, href='http://www.ebay.com/')
SAX.characters(eBay, 4)
SAX.error: htmlParseCharRef: invalid decimal value
SAX.characters( Company, 8)
SAX.endElement(a)
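
To illustrate the recovery shown in that trace, here is a minimal Python sketch. It is not the libxml code. The function name `sax_events` and the tuple format are made up for the example, and dropping the malformed reference entirely is an assumption read off the trace above. It scans character data for decimal character references such as `&#174`, and when the closing `;` is missing it reports an error and carries on with the rest of the text.

```python
def sax_events(text):
    """Return SAX-like (event, payload) tuples for a run of character data.

    Hypothetical helper for illustration only; it handles decimal
    character references (&#NNN;) and nothing else.
    """
    events = []
    buf = []  # literal characters not yet reported
    i = 0

    def flush():
        # Report pending characters as one "characters" event.
        if buf:
            events.append(("characters", "".join(buf)))
            buf.clear()

    while i < len(text):
        if text.startswith("&#", i):
            j = i + 2
            while j < len(text) and text[j] in "0123456789":
                j += 1
            digits = text[i + 2:j]
            terminated = j < len(text) and text[j] == ";"
            if digits and terminated and int(digits) <= 0x10FFFF:
                # Well-formed reference: substitute the character.
                buf.append(chr(int(digits)))
                i = j + 1
            else:
                # Malformed reference (assumed behaviour): report an
                # error, drop the reference, resume right after it.
                flush()
                events.append(("error", "invalid decimal value"))
                i = j
            continue
        buf.append(text[i])
        i += 1

    flush()
    return events
```

With this sketch, `sax_events("eBay&#174 Company")` yields a characters event for `eBay`, an error event, and a characters event for ` Company` (8 characters, counting the leading space), which has the same shape as the trace above.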
thanks for the report,
Daniel
--
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | libxml Gnome XML toolkit
Tel : +33 476 615 257  | 655, avenue de l'Europe | http://xmlsoft.org/
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Rpmfind search site
http://www.w3.org/People/all#veillard%40w3.org  | http://rpmfind.net/
----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net