Re: [xml] Re: I18N Issues.


From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Thu Feb 10 2000 - 06:23:43 EST


On Thu, Feb 10, 2000 at 10:07:07AM +0800, Y. Cheng wrote:
>
> On Wed, Feb 09, 2000 at 09:09:10AM +0800, Y. Cheng wrote:
> > > > isolat1ToUTF8(xxx *out, int outlen, xxx *in, int inlen)
> > > > to
> > > > isolat1ToUTF8(xxx *out, int outlen, xxx *in, int *inlen)
>
> One question here:
>
> Should we generate &#___; or &#x___; in UTF8Toisolat1?
> (since this UTF-8 character has no corresponding isolat1 character)

  Well, I guess &#__; and &#x__; are strictly equivalent. That's a general
problem of parsing/editing/saving when there are multiple representations.
In that case we don't have enough information to select "the best", so
the two really are equivalent. I would opt for &#x__; since it makes it
easier to see which range the char is in. On the other hand, this form may
take a few more bytes on average.
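
  To make the size trade-off concrete, here is a minimal sketch of
formatting a code point in either form. It is not libxml code: the helper
name format_char_ref and its return convention are assumptions made purely
for illustration.

```c
#include <stdio.h>

/* Hypothetical helper, not part of libxml: write the character
 * reference for code point `cp` into `buf`, either in hexadecimal
 * form (&#x__;) when `hex` is non-zero, or in decimal form (&#__;).
 * Returns the number of bytes written, excluding the trailing NUL,
 * or -1 if `buf` is too small to hold the whole reference. */
int format_char_ref(char *buf, size_t size, unsigned long cp, int hex)
{
    int n = snprintf(buf, size, hex ? "&#x%lX;" : "&#%lu;", cp);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

  For U+20AC, for instance, the hexadecimal form is &#x20AC; (8 bytes) and
the decimal form is &#8364; (7 bytes), so the hex form costs one extra byte
here while keeping the code point range readable.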

> Should we handle &#___; or &#x___; in isolat1ToUTF8?

  I would say no. In that case I would generate entity reference nodes
with the corresponding name in the parser output. The advantage is
that the saved document will then use the same char ref encoding as
the input.

  One thing to note about isolat1ToUTF8 and UTF8Toisolat1 is that
I expect to handle ISO Latin 1 (and other ISO Latin charsets) natively, so
those import/export functions will only be used when a charset encoding
conversion is explicitly asked for. Those interfaces are not yet
available.
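
  As an illustration of the signature change quoted at the top of this
message (passing inlen by pointer), a converter along these lines could
report how much input it consumed. This is only a sketch under stated
assumptions, not the library's implementation: the name
latin1_to_utf8_sketch, the unsigned char types, and the return convention
are all hypothetical, and the thread does not state the actual motivation
for the change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the libxml implementation.
 * Converts ISO Latin 1 bytes to UTF-8.
 *   out, outlen : output buffer and its size in bytes
 *   in, *inlen  : input buffer; on entry *inlen is the number of
 *                 bytes available, on return the number consumed
 * Returns the number of bytes written to `out`. Conversion stops
 * early, without writing a partial character, if `out` fills up. */
int latin1_to_utf8_sketch(unsigned char *out, int outlen,
                          const unsigned char *in, int *inlen)
{
    int i = 0;  /* bytes consumed from `in` */
    int o = 0;  /* bytes written to `out`  */

    assert(out != NULL && in != NULL && inlen != NULL);

    while (i < *inlen) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII range: copied unchanged as one byte */
            if (o + 1 > outlen)
                break;
            out[o++] = c;
        } else {
            /* 0x80..0xFF: encoded as a two-byte UTF-8 sequence */
            if (o + 2 > outlen)
                break;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        i++;
    }

    *inlen = i;
    return o;
}
```

  With this convention, a caller whose output buffer fills up can read
*inlen to find where to resume in the input, which a by-value inlen could
not communicate.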

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe
----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org



This archive was generated by hypermail 2b29 : Wed Aug 02 2000 - 12:30:02 EDT