Re: [xml] Re: I18N Issues.


From: Y. Cheng (ycheng@phi.sinica.edu.tw)
Date: Tue Feb 08 2000 - 20:09:10 EST


On Tue, Feb 08, 2000 at 05:02:23PM +0100, Daniel Veillard wrote:
> On Tue, Feb 08, 2000 at 09:11:13PM +0800, Y. Cheng wrote:
> > On Tue, Feb 08, 2000 at 09:39:08AM +0100, Daniel Veillard wrote:
> > > > With this, when the encoding changes from XML_CHAR_ENCODING_NONE to
> > > > some specific encoding (say, EUC-JP), we need a function to
> > > > transform the existing buffer from the original data to UTF-8
> > > > (maybe xmlSwitchEncoding, or something called by xmlSwitchEncoding).
> > > > But this transformation only accepts a change from
> > > > XML_CHAR_ENCODING_NONE to some other encoding. Once the
> > > > transformation is done, no further transformation is necessary
> > > > (there is no way to switch encoding in the middle of an XML
> > > > document, right?), so all the other mechanisms will work.
> > > yep something like that is needed.
> >
> > Do you think we should change from
> >
> > isolat1ToUTF8(xxx *out, int outlen, xxx *in, int inlen)
> > to
> > isolat1ToUTF8(xxx *out, int outlen, xxx *in, int *inlen)
> >
> > where, on output, *inlen is how many bytes are left in "in".
> isolat1ToUTF8 (and this class of functions) returns the number of bytes
> written, or -1 for lack of space. So there is already an error condition,
> but I agree that the upper layer pushing a buffer which doesn't end
> on a character boundary should not be considered an error.

> For compatibility with other similar APIs, I would rather have *inlen
> return the actual number of bytes read. In that case we could also not
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> generate an error if outlen is too small either. It's better to unify the
> handling of the two "error" cases.

Agree.
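
To make the agreed convention concrete, here is a minimal sketch. The
name latin1_to_utf8_sketch is hypothetical and this is not the code in
encoding.c; it only assumes the behaviour discussed above: *inlen comes
back as the number of bytes read, and a too-small output buffer is not
an error.

```c
#include <assert.h>

/*
 * Hypothetical sketch, not the libxml function: ISO-8859-1 to UTF-8.
 * On entry *inlen is the number of input bytes available.
 * On exit  *inlen is the number of input bytes actually read.
 * Returns the number of bytes written to out. Running out of room in
 * out is not an error: conversion stops on a character boundary and
 * the caller sees that *inlen is smaller than what it passed in.
 */
int latin1_to_utf8_sketch(unsigned char *out, int outlen,
                          const unsigned char *in, int *inlen)
{
    int i = 0; /* input bytes consumed */
    int o = 0; /* output bytes produced */

    assert(out != 0 && in != 0 && inlen != 0);
    while (i < *inlen) {
        unsigned char c = in[i];
        int need = (c < 0x80) ? 1 : 2; /* UTF-8 bytes for this char */

        if (o + need > outlen)
            break; /* no room for a whole character: stop here */
        if (need == 1) {
            out[o++] = c;
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        i++;
    }
    *inlen = i;
    return o;
}
```

The caller can then move the unread tail (original length minus *inlen)
to the front of its buffer and call again once more output space is
available.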

> > > > And the last thing, the encoding transformation function in
> > > > encoding.c (say UTF8ToUTF16) can't cope when not enough bytes are
> > > > given (say only the first byte of a three-byte character is read in).
> > > > For a thread-safe library, I suggest we add a return value to let
> > > > UTF8ToUTF16 say that some bytes are not processed, and store these
> > > > bytes in a new field on "struct _xmlParserInput".
> > > I'm not entirely convinced it's the place where this should be
> > > stored. I would rather add it to the input buffer. After all, at any
> > > one time multiple entities may be open and we may have residual bytes
> > > from each of them.
> >
> > You mean "struct _xmlParserInputBuffer" ?
> > If yes, I can't agree anymore and sorry not to notice this.
> I assume I should read "I can't agree more", right ?

Yes.
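
For the residual bytes, a sketch of how the input buffer could decide
what to hold back. The function names are hypothetical and nothing here
is existing libxml code; it only checks whether the tail of a UTF-8
chunk is a truncated character and leaves full validation to the real
decoder.

```c
#include <assert.h>

/* Length of the UTF-8 sequence introduced by lead byte c, or 0 if c
 * is a continuation byte or not a valid lead byte. */
static int utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/*
 * Hypothetical helper: return how many leading bytes of buf[0..len)
 * can be converted now. If the chunk ends in the middle of a
 * character, the truncated tail (at most 3 bytes) is excluded so the
 * caller can keep it as residual bytes for the next read.
 */
int utf8_complete_prefix(const unsigned char *buf, int len)
{
    int i = len;  /* index just past the candidate lead byte */
    int back = 0; /* continuation bytes stepped over so far */
    int need, have;

    assert(buf != 0 && len >= 0);
    /* Step back over at most 3 trailing continuation bytes. */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len; /* no lead byte in sight: let the decoder complain */
    need = utf8_seq_len(buf[i - 1]); /* bytes the last char requires */
    have = len - (i - 1);            /* bytes of it we actually have */
    if (need > have)
        return i - 1; /* tail is a truncated character: hold it back */
    return len;
}
```

The bytes from the returned offset up to len would be stored with the
input buffer and prepended to the next chunk. If EOF arrives while that
remainder is still non-empty, the input ended in the middle of a
character.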

> > I think it will be more modular this way.
> > But with this, we need a function to say that
> > there are some bytes left (which can't be transformed
> > into a single UTF-8 character) but we have already got EOF.
> >
> > I plan to use iconv to convert many encodings to UTF-8.
> > As far as I know, there are two stand-alone implementations of
> > libiconv (glibc also has one, in case you don't want to rely on glibc).
> Yes, but it has to be optional. Libxml is fairly standalone right
            ^^^^^^^^^^^^^^^^^^^^^^
> now, and I would rather keep it that way, at least for the major encodings
> required by the spec (ISO-Latin-x, UTF-8 and UTF-16 at least).

Agree.

> I am changing the parser code a lot these days; if you could focus on
> encoding.c for a couple of days, that would be easier.

Okay, I will do that.

ycheng

----
Message from the list xml@xmlsoft.org
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@xmlsoft.org


