[Zope3-dev] Zope 3 ids, names, and unicode

Jim Fulton jim@zope.com
Sat, 18 May 2002 13:25:04 -0400


I'm gonna do something unusual and move a discussion from a Wiki to 
a mailing list, because I suspect people will have a lot to say about 
this topic.  In:

http://dev.zope.org/Wikis/DevSite/Projects/ComponentArchitecture/UnicodeForText

Chris Withers asks:

>  1. How come content names can't be unicode? That might be limiting for a
>    lot of users...

Perhaps this gets back to the debate wrt names and ids.  In Zope 2,
ids are used as URL path segments and we've resisted allowing non-URL
characters in ids.  We've bent the rules for certain characters
commonly used in Windows file names.

The URI standard only allows a subset of ASCII and allows other octets
to be escaped, so you could probably support Latin-1 characters in URIs
assuming you were willing to live with the necessary escaping.  I
suppose you could even decide that you were willing to encode unicode
characters into URI segments through a unicode -> utf-8 ->
escaped-octets conversion. We would need to do the reverse conversion
as text is read in. Note that this would disable use of escaped Latin-1
octets in URIs, since an escaped octet sequence would always be decoded
as UTF-8.
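
A minimal sketch of that round trip, written against present-day
Python 3 (an assumption; `id_to_segment` and `segment_to_id` are
hypothetical helper names, not Zope APIs). It also shows why escaped
Latin-1 octets stop working once segments are decoded as UTF-8:

```python
from urllib.parse import quote, unquote_to_bytes


def id_to_segment(name: str) -> str:
    """unicode -> utf-8 -> escaped octets (hypothetical helper)."""
    return quote(name.encode("utf-8"), safe="")


def segment_to_id(segment: str) -> str:
    """escaped octets -> utf-8 -> unicode; raises on non-UTF-8 octets."""
    return unquote_to_bytes(segment).decode("utf-8")


segment = id_to_segment("café")          # 'caf%C3%A9'
assert segment_to_id(segment) == "café"

# The same name escaped as Latin-1 is a single octet, %E9, which is
# not valid UTF-8, so the reverse conversion rejects it.
latin1_segment = quote("é".encode("latin-1"), safe="")
try:
    segment_to_id(latin1_segment)
    latin1_accepted = True
except UnicodeDecodeError:
    latin1_accepted = False
```

Whether to reject such segments or fall back to another encoding is a
policy choice; the sketch simply rejects them.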

I think it is important to retain the relationship between ids in Zope
containers and URI path segments. IOW in a URI, a subpath::

  aContainer/id

should always look up an object in a container based on the id.
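
As a rough illustration of that rule, here is a sketch in Python 3
that models containers as plain dicts (an assumption made only for
illustration; `traverse` is a hypothetical function, not Zope's
traversal machinery). Each path segment is unescaped and then used
directly as the id:

```python
from urllib.parse import unquote


def traverse(root, path):
    """Walk a '/'-separated URI path, using each segment as a container id."""
    obj = root
    for segment in path.strip("/").split("/"):
        if not segment:
            continue  # ignore empty segments such as a doubled slash
        obj = obj[unquote(segment)]  # the id is exactly the unescaped segment
    return obj


# Hypothetical content tree: one container holding one object.
root = {"aContainer": {"id": "the object"}}
found = traverse(root, "aContainer/id")
```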

I'd prefer that such URIs not look like random bytes. :)

I grabbed 6 arbitrary Russian characters (unfortunately, I don't 
speak Russian :) off of a UTF-8 test page, and URL quoted them. 
This is what I got:

  %D0%BF%D1%80%D1%81%D1%82%D1%83%D1%84

Uh, will Russian speakers find such a URL useful? Will anyone else?

Do people *really* want URLs with segments like the one above?
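
For anyone who wants to reproduce the quoted segment, a small sketch
using Python 3's `urllib.parse` (an assumption; the module layout in
current Python 2 differs). The six characters are recovered by
unquoting the segment, then quoted again to confirm the round trip:

```python
from urllib.parse import quote, unquote

segment = "%D0%BF%D1%80%D1%81%D1%82%D1%83%D1%84"

# unquote() decodes the escaped octets as UTF-8 by default.
text = unquote(segment)
print(text, len(text))

# quote() encodes as UTF-8 by default, so quoting gives the segment back.
requoted = quote(text)
print(requoted == segment)
```

Each of the six characters takes two UTF-8 octets, so it costs six
characters of URL once escaped.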

I can understand why people expect names to be "text". Do people have
the same expectations of ids? Does this inform our debate with regard
to "name" versus "id"? :)

Perhaps we should stick to the term "id" for unique identifiers in
containers. Note that content will also have titles and titles will be
text, and, thus, unicode.

Thoughts?

Jim

--
Jim Fulton           mailto:jim@zope.com       Python Powered!        
CTO                  (888) 344-4332            http://www.python.org  
Zope Corporation     http://www.zope.com       http://www.zope.org