Re: [xml] Another HTML parser issue...


From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Fri Nov 17 2000 - 10:56:16 EST


On Fri, Nov 17, 2000 at 09:57:13AM -0500, Marc Sanfacon wrote:
> Hi again,
> I found another problem with the HTML parser. I know that this is
> not valid HTML, but here is the case:
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <HTML><HEAD><TITLE>Title</TITLE>
> <META http-equiv=Content-Type content="text/html; charset=windows-1252">
> </HEAD>
> <BODY>
> So <A href="http://www.ebay.com/">eBay&#174 Company</A>
> </BODY></HTML>
>
> As you can see, the line with the href contains the character reference
> '&#174', which doesn't end with a ';'. So the result from libxml is:

  I have made a slightly different fix, which generates the following:

SAX.startElement(a, href='http://www.ebay.com/')
SAX.characters(eBay, 4)
SAX.error: htmlParseCharRef: invalid decimal value
SAX.characters( Company, 8)
SAX.endElement(a)
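
  For readers who want to see the shape of that event sequence, here is a
minimal Python sketch. It is not libxml's code and says nothing about how
htmlParseCharRef is actually implemented; the names CHARREF and parse_text
are hypothetical. It assumes a decimal reference without the closing ';' is
reported as an error and dropped, with the surrounding text delivered as
separate character runs, which is what the trace above appears to show.

```python
import re

# Hypothetical illustration only, not libxml's implementation.
# Matches a decimal character reference, with the ';' captured as optional.
CHARREF = re.compile(r"&#(\d+)(;?)")

MAX_CODE_POINT = 0x10FFFF  # largest value chr() accepts


def parse_text(text):
    """Split a run of character data into (event, payload) tuples.

    A well-formed reference such as '&#174;' becomes a 'characters' event
    holding the decoded character. A reference missing its ';' (or out of
    range) becomes an 'error' event and is dropped from the output.
    """
    events = []
    pos = 0
    for match in CHARREF.finditer(text):
        # Emit any plain text that precedes this reference.
        if match.start() > pos:
            events.append(("characters", text[pos:match.start()]))
        code = int(match.group(1))
        if match.group(2) == ";" and code <= MAX_CODE_POINT:
            events.append(("characters", chr(code)))
        else:
            events.append(("error", "invalid decimal value"))
        pos = match.end()
    # Emit whatever text follows the last reference.
    if pos < len(text):
        events.append(("characters", text[pos:]))
    return events


# Print the events for the text inside the <A> element of the report,
# in a format resembling the SAX trace.
for event, payload in parse_text("eBay&#174 Company"):
    if event == "characters":
        print(f"SAX.characters({payload}, {len(payload)})")
    else:
        print(f"SAX.error: htmlParseCharRef: {payload}")
```

  Dropping the malformed reference is only one possible recovery strategy;
a more lenient parser could decode '&#174' anyway and still report the
error.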

  thanks for the report,

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | libxml Gnome XML toolkit
Tel : +33 476 615 257  | 655, avenue de l'Europe | http://xmlsoft.org/
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Rpmfind search site
 http://www.w3.org/People/all#veillard%40w3.org  | http://rpmfind.net/


----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Fri Nov 17 2000 - 11:47:19 EST