Re[2]: [xml] Encoding Problems with libxml 2.2.2


From: Stefan Bambach (bambach@triplex.de)
Date: Wed Aug 30 2000 - 05:31:47 EDT


Hello Daniel,

Tuesday, August 29, 2000, 7:36:05 PM, you wrote:

DV> On Tue, Aug 29, 2000 at 06:04:12PM +0200, Stefan Bambach wrote:
>>
>> Hi,
>>
>> I wrote a basic libxml wrapper for Python (very basic :-) ). I walk
>> through the DOM tree myself (child = child->next) and read out the
>> content. That gave me access to the tags, their values, attributes, and so on.
>>
>> Now, with libxml 2.2.2, the DOM tree's content is UTF-8 encoded, and I
>> can't read it out as simply as before. I have to convert the values
>> to ISO-8859-1.

DV> Or EUC-JP if you're Japanese, or ISO-8859-5 if you're Russian, etc ...
DV> Yes, libxml is now consistent, independently of the kind of input.
I have to store some data from the XML file in MySQL. Storing it as
ISO-8859-1 is enough, because I know the system is intended for
German users only :-) So I don't need all the features.

>> Is there a function equivalent to some ugly code like
>> printf("%s", node->name) that also does the encoding conversion?

DV> The question is: which encoding? Are you only interested in ISO-8859-1?
DV> If the answer is yes, the obvious follow-up question is: why?
I read the value of a tag (e.g. <TEST>with German special chars like
äöüß</TEST>) with xmlNodeListGetString(). This function returns the
UTF-8 string from the DOM tree. That string is stored in MySQL, read
back out later, and displayed in web browsers (the HTML encoding is
implemented by me). This worked with version 2.0.0 because I got the
special chars in the right encoding. Now I get UTF-8 encoded chars,
which MySQL stores as an ordinary string, so the string I later
display is wrong.
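To make the mismatch concrete: in UTF-8 a character such as "ä" (U+00E4) takes two bytes, 0xC3 0xA4, while ISO-8859-1 uses the single byte 0xE4. The small C sketch below just dumps those bytes; the names ae_utf8, ae_latin1 and dump_bytes() are illustrative only and not part of libxml.

```c
#include <assert.h>
#include <stdio.h>

/* "ä" (U+00E4) as libxml 2.2.x hands it back: UTF-8, two bytes. */
const char ae_utf8[] = "\xC3\xA4";
/* The same character in ISO-8859-1: a single byte. */
const char ae_latin1[] = "\xE4";

/* Print each byte of a NUL-terminated string in hex, separated by spaces. */
void dump_bytes(const char *s)
{
    const unsigned char *p;

    assert(s != NULL);
    for (p = (const unsigned char *)s; *p; p++)
        printf(p == (const unsigned char *)s ? "%02X" : " %02X", (unsigned)*p);
    putchar('\n');
}
```

dump_bytes(ae_utf8) prints "C3 A4". A browser told that the page is ISO-8859-1 renders those two bytes as "Ã¤", which is the kind of wrong output described above.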
Yesterday I used the UTF8Toisolat1() function to do the job (I have to
convert each value I read from the DOM tree myself). Is there a
function like xmlNodeListGetString() with an additional parameter, the
encoding name, that returns the string in the encoding I need?
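For the case where ISO-8859-1 really is enough, here is a minimal sketch of a wrapper-side helper. It assumes the input is the NUL-terminated UTF-8 string returned by xmlNodeListGetString(); utf8_to_latin1() is a hypothetical name, not a libxml function. It does by hand roughly what libxml's UTF8Toisolat1() does, except that it allocates its own result and refuses any character above U+00FF rather than working on caller-supplied buffers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml): convert a NUL-terminated UTF-8
 * string to a newly malloc'd ISO-8859-1 string. Returns NULL on allocation
 * failure, on malformed input, or on a character above U+00FF, which
 * Latin-1 cannot represent. The caller frees the result with free().
 */
char *utf8_to_latin1(const char *in)
{
    const unsigned char *p;
    char *out, *q;

    assert(in != NULL);
    /* Latin-1 output is never longer than the UTF-8 input. */
    out = malloc(strlen(in) + 1);
    if (out == NULL)
        return NULL;

    p = (const unsigned char *)in;
    q = out;
    while (*p) {
        if (*p < 0x80) {
            /* ASCII: identical in both encodings. */
            *q++ = (char)*p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* Two-byte sequence encoding U+0080..U+00FF. */
            *q++ = (char)(((p[0] & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            /* Malformed, or a code point Latin-1 has no byte for. */
            free(out);
            return NULL;
        }
    }
    *q = '\0';
    return out;
}
```

Returning NULL for characters such as the euro sign is a design choice: a German-only system can treat that as an input error, whereas a more forgiving wrapper might substitute '?' instead.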

>> Are there functions to read out the content (value) of a tag, the tag
>> itself, and attribute names with their values?

DV> Do you mean that Python has no support for handling UTF-8 strings???
DV> How is Python expected to work in internationalized environments?
I don't know for sure, but I think I have seen some kind of encoding
module, so I will search for it. I think there are converter
functions that can do the job, but then I have to convert each value
before storing it in the DOM tree, or vice versa when reading it out.
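The reverse direction, for values going into the DOM tree, is simpler, because every ISO-8859-1 byte maps to exactly one Unicode code point. Again this is only a sketch: latin1_to_utf8() is a hypothetical helper, and libxml's own isolat1ToUTF8() covers the same conversion with a buffer-based interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml): convert a NUL-terminated
 * ISO-8859-1 string to a newly malloc'd UTF-8 string, e.g. before handing
 * it to libxml. Every Latin-1 byte is a valid code point, so the only
 * failure is allocation (NULL). The caller frees the result with free().
 */
char *latin1_to_utf8(const char *in)
{
    const unsigned char *p;
    char *out, *q;

    assert(in != NULL);
    /* Worst case: every byte is >= 0x80 and becomes two bytes. */
    out = malloc(2 * strlen(in) + 1);
    if (out == NULL)
        return NULL;

    for (p = (const unsigned char *)in, q = out; *p; p++) {
        if (*p < 0x80) {
            /* ASCII passes through unchanged. */
            *q++ = (char)*p;
        } else {
            /* U+0080..U+00FF: lead byte 0xC2 or 0xC3, then a continuation byte. */
            *q++ = (char)(0xC0 | (*p >> 6));
            *q++ = (char)(0x80 | (*p & 0x3F));
        }
    }
    *q = '\0';
    return out;
}
```

For example, latin1_to_utf8("\xE4") yields the two bytes 0xC3 0xA4, i.e. "ä" in UTF-8.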

DV> Converting the values as you read them from the tree may not make
DV> much sense. I would expect high-level languages like Python to work
DV> out of the box with UTF-8 strings; I am very surprised.
I will tell you if I find something about that.

DV> Daniel

Best regards,
Stefan Bambach

-- 
Stefan Bambach

triplex-agentur fuer neue medien GmbH herzog-heinrich-strasse 11-13 80336 muenchen

tel: 089-209 138 29 fax: 089-209 138 10

mailto:bambach@triplex.de http://www.triplex.de

---- Message from the list xml@rpmfind.net Archived at : http://xmlsoft.org/messages/ to unsubscribe: echo "unsubscribe xml" | mail majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Wed Aug 30 2000 - 02:45:05 EDT