Re: [xml] Whitespace problem with external DTD validation

From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Tue Oct 10 2000 - 17:27:12 EDT


On Mon, Oct 09, 2000 at 04:47:08PM -0700, Bill Kendrick wrote:
>
>
> On Mon, 09 Oct 2000 16:35:23 Joe McAlerney wrote:
> >
> > Actually, this is the way it is supposed to work according to the W3C XML
> > 1.0 spec. See http://www.w3.org/TR/REC-xml#sec-white-space
> >
> > My guess is that if whitespace is significant in your documents, then
> > you need to state so in your DTD.
>
> They aren't significant... We were assuming they would be ignored
> by the validator. Realize I'm talking about the whitespace BETWEEN
> tags, not within tags.
>
> In other words, the difference between:
>
> <tag1>foo</tag1>
> <tag2>bar</tag2>
>
> and:
>
> <tag1>foo</tag1><tag2>bar</tag2>
>
>
> The validator accepts the latter (no whitespace between tags),
> but complains about the former (whitespace between tags).
>
>
> I have an example XML document (Shakespeare's "The Tragedy of
> Antony and Cleopatra") which has a DTD at the top.
>
> I've moved the DTD into its own file (cleo.dtd) and tried
> running the XML through libxml's validator using that
> external DTD and it just barfs...
>
> Element PLAY content doesn't follow the Dtd
> Expecting (TITLE , PERSONAE , SCNDESCR , PLAYSUBT , INDUCT? , PROLOGUE? ,
> ACT+ , EPILOGUE?), got (CDATA TITLE CDATA PERSONAE CDATA SCNDESCR CDATA
> PLAYSUBT CDATA ACT CDATA ACT CDATA ACT CDATA ACT CDATA ACT CDATA)
> ...
>
>
> I assume if I were to write something that had libxml parse the
> original XML document (which has the DTD internally at the top),
> that it wouldn't complain.
>
> Having a 9044 line Shakespearean play all on one line seems kind of lame. ;)
>
>
>
> > I have never had to do this, so I
> > can't offer any advice on how to do so. I'm sure the spec will help you
> > out. Here are some relevant threads that may help you out too:
> >
> > http://www.xmlsoft.org/messages/0830.html
>
> This one has to do with saving XML. ;)
>
>
> > http://www.xmlsoft.org/messages/0716.html
>
> ... as does this one.
>
>
>
> At this point (I just started working with XML and DTDs last
> Friday ;) ), I don't know much, but I ASSUME either the problem
> is a bug, or there's some "ignore whitespace between tags" flag
> to set somewhere, or something...

  Okay, here is the answer.
What you are doing is
   1/ parsing without a DTD
   2/ trying to validate the result using a DTD
instead of
   3/ parsing and validating at the same time

 What's happening?
 Simply that during 1/, when the XML parser sees </p> <p>,
it can't say for sure whether those blanks are significant or not,
and by default it keeps them as text nodes in the tree. Then in 2/
the validation code sees the extra text nodes and barfs because it
doesn't expect PCDATA there.
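
 To make that concrete, here is a minimal sketch in plain C of how the
child list of PLAY ends up looking after step 1/. It does not use libxml
at all: child_node and describe_children() are hypothetical names made up
for this illustration, and the only thing modelled is that every run of
blanks between tags has become a text node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node model. This is NOT libxml's xmlNode. */
typedef enum { NODE_ELEMENT, NODE_TEXT } node_kind;

typedef struct {
    node_kind kind;
    const char *value;  /* element name, or the text for NODE_TEXT */
} child_node;

/*
 * Describe a child list the way the validity error reports it:
 * element names as-is, every text node as "CDATA".
 * Returns the number of bytes written (not counting the NUL),
 * or -1 if the output buffer is too small.
 */
int describe_children(const child_node *kids, size_t n, char *out, size_t cap)
{
    size_t used = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        const char *label =
            (kids[i].kind == NODE_TEXT) ? "CDATA" : kids[i].value;
        size_t len = strlen(label);
        size_t need = len + (i > 0 ? 1 : 0);  /* label plus separator */

        if (used + need + 1 > cap)  /* keep room for the NUL */
            return -1;
        if (i > 0)
            out[used++] = ' ';
        memcpy(out + used, label, len);
        used += len;
        out[used] = '\0';
    }
    return (int)used;
}
```

 Feeding it a newline text node, TITLE, a newline text node, PERSONAE and
a trailing newline text node gives "CDATA TITLE CDATA PERSONAE CDATA",
which has the same shape as the "got (...)" part of the error quoted above.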

 Solutions:
   Quick: use 3/, add
          xmlDoValidityCheckingDefaultValue = 1;
          before calling the parser and the XML parser will automatically
          validate while parsing
 Serious: the code doing the validity checking in valid.c should attempt to
          detect those insignificant white spaces in the tree and avoid
          generating validity errors for them
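
 The idea behind the "serious" fix can be sketched as follows. This is not
the code in valid.c, only an illustration under stated assumptions:
child_node, is_blank_text() and children_match() are hypothetical names,
the parent is assumed to be declared with element content (no #PCDATA),
and the content model is reduced to a fixed list of element names instead
of a real model with ?, + and *.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node model. This is NOT libxml's xmlNode. */
typedef enum { NODE_ELEMENT, NODE_TEXT } node_kind;

typedef struct {
    node_kind kind;
    const char *value;  /* element name, or the text for NODE_TEXT */
} child_node;

/* True if s holds only XML white space: space, tab, CR or LF. */
bool is_blank_text(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s != ' ' && *s != '\t' && *s != '\r' && *s != '\n')
            return false;
    }
    return true;
}

/*
 * Check a child list against a fixed sequence of expected element names,
 * skipping text nodes that contain nothing but white space.
 * Assumption: the parent has element content, so blank text is ignorable
 * and any other text is a validity error.
 */
bool children_match(const child_node *kids, size_t n,
                    const char *const *expected, size_t n_expected)
{
    size_t next = 0;  /* index of the next expected element name */

    for (size_t i = 0; i < n; i++) {
        if (kids[i].kind == NODE_TEXT) {
            if (is_blank_text(kids[i].value))
                continue;   /* insignificant white space: ignore it */
            return false;   /* real character data is not allowed here */
        }
        if (next >= n_expected ||
            strcmp(kids[i].value, expected[next]) != 0)
            return false;   /* unexpected or out-of-order element */
        next++;
    }
    return next == n_expected;  /* every expected element was seen */
}
```

 With this check, a child list with newlines between TITLE and PERSONAE
matches the expected sequence, while a list containing actual text between
them is still rejected.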

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe