Re: [xml] XPath normalize-string


From: Bjorn Reese (breese@mail1.stofanet.dk)
Date: Mon Sep 25 2000 - 17:01:16 EDT


Daniel Veillard wrote:
>
> On Sun, Sep 24, 2000 at 04:44:15PM +0000, Bjorn Reese wrote:

> > The translate implementation is fairly naive, and probably doesn't work for
> > all cases of UTF-8 strings, but a naive implementation is still better than
> > none.
>
> right, but you can expect the XPath strings to be in UTF8 too so a simple
> compare would do it if the input string is a correct UTF8 input.

My concern was more about finding the appropriate substitution character,
as the from-array and the to-array can get "out of sync". For example,
let XX denote a two-octet UTF-8 encoded character. Consider a translation
which swaps each occurrence of 'a' with 'XX', and each 'XX' with 'a':

  translate("XXaXXa", "aXX", "XXa")

Looking up the first occurrence of 'XX' tells us that it is located at
octet index 1 (starting from 0) in the from-array. However, octet index 1
in the to-array is in the middle of a UTF-8 character. The correct index
should be 2, which is where the replacement 'a' starts.
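
To make the mismatch concrete, here is a small sketch in Python (chosen
only for brevity; it is not libxml code). It assumes 'é' as a stand-in
for XX, since 'é' encodes to two octets in UTF-8, and all variable names
are made up for the illustration.

```python
# 'é' stands in for XX: it encodes to two octets (0xC3 0xA9) in UTF-8.
XX = "é"
from_str = "a" + XX          # the from-array "aXX"
to_str = XX + "a"            # the to-array   "XXa"

from_bytes = from_str.encode("utf-8")   # octets: 61 C3 A9
to_bytes = to_str.encode("utf-8")       # octets: C3 A9 61

# Octet offset of XX in the from-array.
from_offset = from_bytes.find(XX.encode("utf-8"))

# Reusing that octet offset in the to-array lands on a continuation
# octet (bit pattern 10xxxxxx), i.e. in the middle of a character.
naive_octet = to_bytes[from_offset]
lands_mid_character = (naive_octet & 0xC0) == 0x80

# The usable offset is found by counting characters, not octets:
# skip as many characters in the to-array as precede XX in the from-array.
char_index = from_str.index(XX)
correct_offset = len(to_str[:char_index].encode("utf-8"))

print(from_offset, lands_mid_character, correct_offset)
```

The two offsets only agree when every character before the match has the
same encoded length in both arrays.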

The only solution I can think of is to convert the two arrays into
proper (UCS-2/4) Unicode or widechar arrays before processing takes
place.
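
A rough sketch of that idea, again in Python rather than C: decode the
three UTF-8 arguments into lists of code points (the equivalent of UCS-4
arrays), translate by character index, and encode the result back. The
function name translate_utf8 is hypothetical, and the handling of
duplicate and unmatched characters follows my reading of the XPath 1.0
translate() description.

```python
def translate_utf8(s: bytes, from_chars: bytes, to_chars: bytes) -> bytes:
    """Translate on code points instead of octets (illustrative sketch)."""
    # Decode once so that one list element is one character.
    src = [ord(c) for c in s.decode("utf-8")]
    frm = [ord(c) for c in from_chars.decode("utf-8")]
    to = [ord(c) for c in to_chars.decode("utf-8")]

    out = []
    for cp in src:
        if cp not in frm:
            out.append(cp)       # not listed in from: copied unchanged
            continue
        i = frm.index(cp)        # first occurrence in from wins
        if i < len(to):
            out.append(to[i])    # same character index in to
        # else: no counterpart in to, so the character is removed

    return "".join(chr(cp) for cp in out).encode("utf-8")


# The swap example from above, with 'é' standing in for XX.
result = translate_utf8("éaéa".encode("utf-8"),
                        "aé".encode("utf-8"),
                        "éa".encode("utf-8"))
print(result.decode("utf-8"))
```

The cost is an extra pass and temporary arrays for each call, but the
from-array and to-array can no longer drift apart.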

> If you have more testcases based on testxPath or would like specific
> improvement, send them or request them I will try to comply.

The one that was causing me trouble with testXPath was

  normalize-string(" a b ")

I couldn't see any reason why it should fail, but I found it easier
to build my own small test program, rather than to figure out how
the autogenerated Makefile was working and how to debug the executable.
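
For reference, a sketch of the result I would expect from that test,
assuming the function in question is the one XPath 1.0 calls
normalize-space(): leading and trailing whitespace is stripped and each
internal run of whitespace collapses to a single space. This is Python
for illustration only, and normalize_space is a made-up name.

```python
import re

# XML whitespace characters: space, tab, carriage return, line feed.
_XML_WS = " \t\r\n"


def normalize_space(s: str) -> str:
    """Strip outer whitespace and collapse inner runs to one space."""
    stripped = s.strip(_XML_WS)
    return re.sub(r"[ \t\r\n]+", " ", stripped)


print(repr(normalize_space(" a b ")))
```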

----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net



This archive was generated by hypermail 2b29 : Mon Sep 25 2000 - 18:43:24 EDT