Re: [xml] Numeric Entities

From: Daniel Veillard (Daniel.Veillard@w3.org)
Date: Mon Sep 25 2000 - 09:06:01 EDT


On Mon, Sep 25, 2000 at 02:30:19PM +0200, Helge Hess wrote:
[Please subscribe to post to the list]
> Hi,
>
> is it correct behaviour that
>
> <string>&#7;</string>
>
> doesn't work (CharRef: invalid xmlChar value 1) ? I would expect that I
> can encode/escape any unichar that way ?

  I don't think so !!!

http://www.w3.org/TR/REC-xml#charsets explicitly says:
-----------------------------------------
A parsed entity contains text, a sequence of characters, which may
represent markup or character data. A character is an atomic unit of text
as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal characters are tab,
carriage return, line feed, and the legal graphic characters of Unicode
and ISO/IEC 10646.
-----------------------------------------

And the following production is quite precise about this:

-----------------------------------------
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |
             [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
-----------------------------------------
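
To make production [2] concrete, here is a minimal Python sketch of the
range check. The function name is_xml_char is hypothetical and is not
part of libxml; it only restates the ranges quoted above.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] Char of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # everything below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD        # excludes FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )


# The BEL character from the question falls outside every range.
bel_is_legal = is_xml_char(0x7)
tab_is_legal = is_xml_char(0x9)
```

With this check, bel_is_legal is False and tab_is_legal is True.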

 And 0x7 is clearly not within that range. A character reference
must itself refer to a character matching Char, so &#7; is rejected.
It means you cannot embed binary data within XML documents without
escaping (uuencode is one of the methods which should work). Using &#7;
is not an escape at the XML level; IMHO it is an escape at the
encoding level.
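
As a rough sketch of the escaping idea, the Python below encodes the
payload before placing it in the element and decodes it after parsing.
It uses base64 rather than uuencode, because the base64 alphabet contains
no markup characters such as < or &. The helper names wrap_binary and
unwrap_binary are hypothetical, and the <string> element is taken from
the question.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(payload: bytes) -> str:
    """Serialize arbitrary bytes as base64 text inside a <string> element."""
    elem = ET.Element("string")
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_binary(document: str) -> bytes:
    """Parse the element and decode its base64 text back to bytes."""
    elem = ET.fromstring(document)
    return base64.b64decode(elem.text or "")


# A payload containing BEL (0x7) and NUL, neither of which is a legal Char.
doc = wrap_binary(b"\x07\x00\x01")
restored = unwrap_binary(doc)
```

The document only ever contains characters allowed by production [2],
and the receiver recovers the original bytes after parsing.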

Daniel

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel : +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax : +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe
----
Message from the list xml@rpmfind.net
Archived at : http://xmlsoft.org/messages/
to unsubscribe: echo "unsubscribe xml" | mail  majordomo@rpmfind.net


This archive was generated by hypermail 2b29 : Mon Sep 25 2000 - 09:43:59 EDT