Re: [xml] Glitch with external entity files; patch enclosed

From: Kenneth Pronovici (pronovic@skyjammer.com)
Date: Wed Nov 08 2000 - 19:47:36 EST


> There will be a call to the entity loader to fetch DTD with the
> URI computed from:
> - the URL of DOC
> - the URI-Reference for DTD stored in DOC

Your explanation matches what I expected.

Just to clarify, so there isn't any confusion on my part or anyone
else's: in my case the document is just a string in memory, and
the reference to the DTD in the document is:

   <!DOCTYPE test_doc SYSTEM "complex.dtd">

> When the URI-Reference to ENT is found in DTD, there will be a call
> to the entity loader to fetch ENT with the URI computed from:
> - the URL of DTD as found in the result returned by the previous
> invocation to get DTD
> - the URI-Reference for ENT stored in DTD
>
> So yes there are 2 calls to the entity loader, and yes it's normal,
> I assume you agree.

Yes, I agree that there will be a call to the entity loader for each
external entity which must be used by libxml. If my DTD includes
two entity files, then there will be three calls. This matches
my expectations, and matches my observations prior to submitting the
patch.

> Now you seem surprised because the URI computed for ENT by libxml
> is actually the right one (i.e. with the extra path already added).
> Is that still right ?

Yes, that surprised me. I was expecting libxml to use the entity loader
to resolve the "right" URI for the entities. When I submitted my patch,
I did not understand why libxml would be calling my entity loader if it
already knew how to find the entity file in question. For example, if
libxml already knew to find the file complex.ent as /dvl/fds/dtd/complex.ent,
why would it call the entity loader at all? What additional information
is the entity loader supposed to provide, in that case?
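
For reference, the computation described in the quoted text can be
sketched in a few lines of C. This is only an illustration under
assumptions: resolve_reference is a hypothetical helper, not a libxml
function, and it handles only plain paths (no URI schemes, no "../"
segments). It also assumes the DTD came back from the entity loader
with the URL /dvl/fds/dtd/complex.dtd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not libxml's implementation: resolve a relative
 * reference such as "complex.ent" against the URL of the file that
 * contains it. References starting with '/' are copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int resolve_reference(const char *base, const char *ref,
                      char *out, size_t out_size)
{
    assert(base != NULL && ref != NULL && out != NULL);

    const char *slash = strrchr(base, '/');

    /* Keep the directory part of base, including the trailing '/'. */
    size_t dir_len = (ref[0] == '/' || slash == NULL)
        ? 0
        : (size_t)(slash - base) + 1;

    int n = snprintf(out, out_size, "%.*s%s", (int)dir_len, base, ref);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With base "/dvl/fds/dtd/complex.dtd" and reference "complex.ent", out
receives "/dvl/fds/dtd/complex.ent", which would be the already
complete path that is then handed to the entity loader.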

> If yes, this is simply because libxml stores more information
> than you expected. The simple hack in your entity loader is to try
> to call the existing default one and only add the prefix if this
> fails.

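In C, roughly (a sketch; default_loader is assumed to have been saved
with xmlGetExternalEntityLoader() before this loader is installed):

   #include <stdio.h>
   #include <libxml/parser.h>
   #include <libxml/parserInternals.h>

   static xmlExternalEntityLoader default_loader;

   static xmlParserInputPtr
   new_entity_loader(const char *URL, const char *ID, xmlParserCtxtPtr ctxt)
   {
      char path[1024];

      /* Try the default loader first; it succeeds for full paths. */
      xmlParserInputPtr result = default_loader(URL, ID, ctxt);
      if (result != NULL)
         return result;

      /* Otherwise tack "/dvl/fds/dtd" on the front and try again. */
      if (URL == NULL)
         return NULL;
      snprintf(path, sizeof(path), "/dvl/fds/dtd/%s", URL);
      return xmlNewInputFromFile(ctxt, path);
   }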

I have tested this, and it works fine other than the error message
I will get each time my entity loader is given my DTDs (since the
default entity loader will never be able to resolve my DTDs).

My other option is to make sure that I only add the path to the
front if it's not already there. This isn't any more difficult
than the solution described above, and the net result is the
same. The code will work.
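
A minimal sketch of this second option in C, for illustration only.
The name add_dtd_dir is hypothetical, and DTD_DIR is hard-coded from
the example paths in this message even though the real directory
varies from machine to machine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed DTD directory, taken from the example paths in this message. */
#define DTD_DIR "/dvl/fds/dtd"

/*
 * Hypothetical helper: write the path the loader should open into out.
 * If url already starts with DTD_DIR followed by '/', copy it unchanged;
 * otherwise put DTD_DIR in front of it.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int add_dtd_dir(const char *url, char *out, size_t out_size)
{
    assert(url != NULL && out != NULL);

    size_t dir_len = strlen(DTD_DIR);
    int has_prefix = strncmp(url, DTD_DIR, dir_len) == 0
                     && url[dir_len] == '/';

    int n = has_prefix
        ? snprintf(out, out_size, "%s", url)
        : snprintf(out, out_size, "%s/%s", DTD_DIR, url);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Called with "complex.dtd" this yields "/dvl/fds/dtd/complex.dtd";
called with "/dvl/fds/dtd/complex.ent" it leaves the path alone, so
the entity loader can pass the result straight to xmlNewInputFromFile()
in both cases.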

> But you will have to resort to a hack unless you keep a
> catalog, i.e. an association between not the URL but the PUBLIC
> identifier of the DTD and its actual path.

I'm not sure I understand this, though. If I'm mapping "complex.dtd"
to "/dvl/fds/dtd/complex.dtd", isn't that what you're asking for?
In any case, the actual path to the DTD varies depending on the
machine my TIBCO listener is running on. I am also unsure which
PUBLIC identifier you are referring to in my example.

I can see that I do not quite have an understanding of what is going
on here, but I guess I can get my code to work, and that was what I
was aiming for, anyway.

Thank you for the detailed reply. If you have time, I would really
appreciate an on-list or off-list explanation of my main question here:
that is, why does libxml have to call the entity loader if it already
knows how to find the entity in question? That is the one piece which
still really makes no sense to me.

Thanks again...

KEN

----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net


This archive was generated by hypermail 2b29 : Wed Nov 08 2000 - 20:43:32 EST