Re: [xml] Using the encoders


From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Wed Feb 09 2000 - 10:08:23 EST


  Hi Lutz,

On Wed, Feb 09, 2000 at 03:34:57PM +0100, Lutz Behnke wrote:
> Hi there,
>
> I am currently trying to write a DOMHASH signing and verification implementation
> for libxml. Now I have some questions:

   Good! BTW, doesn't this require an implementation of canonical XML?
See http://www.w3.org/TR/xml-c14n, which is in Last Call at the moment.
If so, work on that part could start now; see below.

> a) Is there such an implementation for/in libxml already done? I could only find that
> sneak-ware Java thingy from IBM.

  I never heard of any.

> b) If not, how can I use the encoding functions from libxml? I need to get the
> names and contents of the elements in UTF-16BE. I currently understand the code
> to be something like (for the name of the element):
>
> xmlCharEncodingHandlerPtr utf16_handler = xmlGetCharEncodingHandler(XML_CHAR_ENCODING_UTF16BE);
>
> utf16_handler->xmlCharEncodingOutputFunc(node->name, strlen(node->name),
> utf16_name, some_guessed_value);

  Yes, that is what it is designed for. But this is not thoroughly tested
(a euphemism!!) and is subject to change.
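
  On the output buffer size that the quoted call guesses at: a minimal sketch
of how the size could be computed up front instead. This is not libxml code;
`utf16_bytes_needed` is a hypothetical helper, and it assumes the input is
already well-formed UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml: number of bytes needed to hold
 * the UTF-16 form of a UTF-8 buffer (no byte order mark, no terminator).
 * Assumes the input is well-formed UTF-8.
 */
size_t utf16_bytes_needed(const unsigned char *utf8, size_t len)
{
    size_t i, needed = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = utf8[i];

        if ((c & 0xC0) == 0x80)
            continue;        /* continuation byte: counted with its lead byte */
        if (c >= 0xF0)
            needed += 4;     /* 4-byte sequence: becomes a surrogate pair */
        else
            needed += 2;     /* 1- to 3-byte sequence: one 16-bit unit */
    }
    return needed;
}
```

  Under the same assumption the result never exceeds 2 * len, so a buffer of
twice the UTF-8 length is always large enough if the extra counting pass is
not wanted.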

> This is under the assumption that the name and CDATA are stored in almost UTF-8
> (plain ASCII to be exact).
> Is that correct? Is the fact that the data is in UTF-8/ASCII ensured by the library?
> What happens when I present a full-blown Unicode document to libxml?

  Well, this is being worked on right now. Basically, libxml as released doesn't
guarantee anything; it just uses xmlChar (one-byte unsigned elements) arrays to
keep document content in memory.
  I'm working on enforcing UTF-8 support, with one exception: if the document
specifies an encoding (like the ISO-Latin series) where chars use a fixed length
of 1 byte.
  What is expected once 2.0 is released is that for UTF-16 (and other encodings
not fitting in 1-byte chars) the content gets converted on the fly before
being passed to the parser input buffer.
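
  To illustrate what such an on-the-fly conversion step could look like, here
is a rough sketch that turns a block of big-endian UTF-16 into UTF-8. It is
not the code from encoding.c; the name `utf16be_to_utf8` and its signature
are made up for this example, and the target encoding is assumed to be UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not libxml code: convert big-endian UTF-16 to UTF-8.
 * Returns the number of UTF-8 bytes written, or -1 on malformed input
 * (odd length, unpaired surrogate) or when the output buffer is too small.
 */
long utf16be_to_utf8(const unsigned char *in, size_t inlen,
                     unsigned char *out, size_t outlen)
{
    size_t i = 0, o = 0;

    if (inlen % 2 != 0)
        return -1;
    while (i < inlen) {
        unsigned long c = ((unsigned long)in[i] << 8) | in[i + 1];

        i += 2;
        if (c >= 0xD800 && c <= 0xDBFF) {           /* high surrogate */
            unsigned long lo;

            if (i + 2 > inlen)
                return -1;
            lo = ((unsigned long)in[i] << 8) | in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            i += 2;
            c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return -1;                              /* stray low surrogate */
        }

        if (c < 0x80) {
            if (o + 1 > outlen) return -1;
            out[o++] = (unsigned char)c;
        } else if (c < 0x800) {
            if (o + 2 > outlen) return -1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            if (o + 3 > outlen) return -1;
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            if (o + 4 > outlen) return -1;
            out[o++] = (unsigned char)(0xF0 | (c >> 18));
            out[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return (long)o;
}
```

  Each 16-bit unit yields at most 3 bytes of UTF-8, and a surrogate pair
yields 4, so an output buffer of 3 bytes per input unit is always enough.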
  So basically yes, we will work on providing conversion to/from UTF-8 for those.
The code in encoding.c is subject to change; see the following message and
thread:
  http://xmlsoft.org/messages/0340.html

  This part of the code probably won't be released before 2.0 and sits for the
moment in the W3C CVS base at
  http://dev.w3.org/ in the XML module.

> c) How can I compute the necessary length of the output buffer? Wouldn't it be a good thing
> for the function to return the _needed_ bytes if I call it with an output buffer of NULL?

  I think the best approach (CPU-wise) is to also return the number of bytes read
from the input buffer. If it differs from the length given, this may mean:
   - that a larger output buffer is needed
   - that the input didn't end on a character boundary
 Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe
----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org


