Re: [xml] Auto-close in HTML parser.

From: TOM (ptittom@free.fr)
Date: Mon Oct 23 2000 - 12:29:25 EDT


On 23/10/2000 16:59:36 Marc Sanfacon wrote:
> <SELECT NAME=frmOS class="BodyText" style="width:250px;"><OPTION
> VALUE="98" Selected>Windows 98 &nbsp; <OPTION VALUE="95" >Windows 95 &nbsp;
> <OPTION VALUE="NT" >Windows NT &nbsp;
> </SELECT>
[...]
> when run through libxml, the nodes look like this: [...]
> <td><select name="frmOS" class="BodyText" style="width:250px;"><option
> value="98" selected>Windows 98 &nbsp; <option value="95">Windows 95 &nbsp;
> <option value="NT">Windows NT &nbsp; </option>
> </option>
> </option></select></td>
[...]
> As you can see, the /option is given only when the /select is
> found. I would have liked to get the /option when the next
> <option> was found. The problem is when I parse files that have a
> lot of <option> without the closing </option>....
>
> I fixed the problem by adding:
> "option", "option", NULL,
> in the htmlStartClose table. I don't know if it is OK.

No, since <option> isn't an empty element but an element whose closing
tag is optional.
By doing this (considering <option> an empty element) you break parsing
when the closing tag is present: <option>My option</option>

The problem is rather in the SAX callbacks, which consider that an <option>
can have an <option> child (I don't know the HTML implementation well enough
to assert anything here, but libxml considers your <option>s to be children of
the preceding ones; xmllint --debug --html will prove it to you).
I believe you should convert to well-formed XML (XHTML), i.e. don't
omit any close tags, until the libxml HTML parser is fixed
(Daniel, there must be something to borrow from HTML Tidy here ;o).

Perhaps if you add a DOCTYPE declaration and validate while parsing, it will
do the trick...

----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net

This archive was generated by hypermail 2b29 : Mon Oct 23 2000 - 14:43:56 EDT