[Zope3-dev] spelling of namespace signifiers

Thu, 06 Jun 2002 08:10:50 -0500

Before debating alternatives further, perhaps we should recap the 
*requirements* for namespace handling, and try to come to some agreement 
about their relative priorities.

Basically, the issue is that some type of "escape" is needed to distinguish 
between different namespaces in a URL path.  This escape has the following 
requirements (the priorities reflect my own perception):

* It MUST allow for the widest possible variety of content IDs within a 
container

* It MUST NOT interfere with filename extensions as seen by the browser

* It MUST NOT interfere with the file name or path as seen by the local OS 
(e.g., it should not contain OS path separator characters, or characters 
that would make it difficult to use the file.)

* It MUST NOT rely on browsers or web servers to be fully RFC-compliant in 
their handling of parameters, URL escapes, etc.

* It SHOULD BE usable within a single path component, so that relative 
URLs stay simple

* It SHOULD BE easy to read/type

* It WOULD BE NICE if it were not ugly.  :)

Have I missed anything?  Does anybody disagree with the priorities I've used?

If we can agree on the requirements, I think we can rather rapidly come to 
some kind of solid agreement on the spelling.  My concern right now is that 
we're treating everything as a MUST, and thereby running the risk of ending 
up with a Papal decree that will make everyone equally unhappy, instead of 
a solution that we can all live with, even if we're not deliriously happy.

Note, by the way, that the last two requirements in the list imply that the 
characters used should not be URL-encoded by most browsers.  Steve 
Alexander's urllib test aside, there are actually several characters 
besides '_', '-', and '.' which are not URL-encoded.  urllib.quote() is 
enormously overzealous when compared to the URL RFCs as well as the actual 
browser implementations out there.  It's also arguably broken for quoting 
path segments anyway, since by default it doesn't quote '/'!  I'll address 
the issue of available characters in a separate post, however, once I've 
concluded my review of the RFCs and done some browser tests.  So far, I think 
there is a very strong possibility that there are *many* "nice", non-ugly 
characters we can use.
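To make the urllib point concrete, here is a rough sketch comparing what RFC 2396 (section 3.3) permits unescaped in a path segment against what quote() actually escapes.  It uses Python 3's urllib.parse.quote, the successor to urllib.quote() with the same default of safe='/'.  The names UNRESERVED, PCHAR, and quoted_anyway are just illustrative, and ';' is deliberately left out because it introduces path parameters:

```python
import string
from urllib.parse import quote

# RFC 2396, section 3.3: characters allowed unescaped in a path segment
# ("pchar").  ';' is omitted because it starts a path parameter.
UNRESERVED = string.ascii_letters + string.digits + "-_.!~*'()"
PCHAR = UNRESERVED + ":@&=+$,"


def quoted_anyway(chars):
    """Return the characters the RFC permits in a segment that quote() still escapes."""
    return [c for c in chars if quote(c, safe="") != c]


if __name__ == "__main__":
    # Everything printed here is legal in a segment but gets %-escaped anyway.
    print("".join(quoted_anyway(PCHAR)))
    # quote()'s default safe="/" leaves the path separator alone...
    print(quote("a/b"))           # a/b
    # ...so escaping a single segment correctly requires safe="".
    print(quote("a/b", safe=""))  # a%2Fb
```

Whether '~' shows up in the first line depends on the Python version (3.7 and later treat it as safe).  Either way, this only shows what urllib does.  It says nothing about what browsers do, which is what the tests below are for.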

What I intend to do is create a small "test suite" of directories and pages 
with weird characters in their URLs, and with relative and absolute links 
in and among the pages.  I'll then post the URLs here for people to test 
against "non-mainstream" browsers, and see what ends up being quoted.  I'll 
include URLs that will do server-side redirects and client-side 
meta-refreshes, and maybe even a JavaScript location-setting or two.
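For illustration, a minimal sketch of generating part of such a suite as static files.  The character sample (TEST_CHARS), the "dir<char>name" naming scheme, and build_test_suite itself are all hypothetical.  The absolute links assume the suite is served from the web server's document root.  The sketch covers only relative links, absolute links, and a meta-refresh.  Server-side redirects and JavaScript location-setting would need server configuration and scripts on top of it:

```python
import html
import os
import tempfile

# Hypothetical sample of characters to test; for retrieval/quoting tests
# only, NOT a syntax proposal.
TEST_CHARS = "!$&'()+,;=@~"


def build_test_suite(root, chars=TEST_CHARS):
    """Write one directory per test character, each with linked pages."""
    names = [f"dir{c}name" for c in chars]
    for name in names:
        os.makedirs(os.path.join(root, name), exist_ok=True)
        links = []
        for other in names:
            # Targets are written unencoded on purpose, so the browser decides
            # what to quote; html.escape only makes them valid attribute values.
            rel = html.escape(f"../{other}/index.html", quote=True)
            # Assumes the suite lives at the server's document root.
            absolute = html.escape(f"/{other}/index.html", quote=True)
            links.append(
                f'<li><a href="{rel}">relative</a> '
                f'<a href="{absolute}">absolute</a></li>'
            )
        page = "<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>\n"
        with open(os.path.join(root, name, "index.html"), "w", encoding="utf-8") as f:
            f.write(page)
        # Client-side meta-refresh that bounces back to this directory's index.
        target = html.escape(f"/{name}/index.html", quote=True)
        refresh = (
            '<html><head><meta http-equiv="refresh" '
            f'content="0; url={target}"></head></html>\n'
        )
        with open(os.path.join(root, name, "refresh.html"), "w", encoding="utf-8") as f:
            f.write(refresh)
    return names


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    build_test_suite(out)
    print("suite written to", out)
```

Serving the output directory and watching each browser's location bar after following the links and refreshes would then show which characters get quoted.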

The characters used in names will strictly be for testing of whether the 
URLs can be retrieved correctly, and whether they end up quoted in the 
browsers' location bars, NOT proposals for syntax.  Once we know our 
working character set, we can THEN hash out proposals for actually *using* 
the characters.  (We'll also need to weed out any OS-unfriendly characters, 
such as maybe '*'.)
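A first pass at that weeding could be as simple as the sketch below.  It assumes Windows is the most restrictive OS we care about.  Windows reserves `< > : " / \ | ? *` and control characters in file names, and '/' covers the POSIX separator as well.  The function name os_friendly is hypothetical:

```python
# Characters Windows forbids in file names; '/' doubles as the POSIX separator.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def os_friendly(chars):
    """Drop characters that are reserved in Windows file names or non-printable."""
    return "".join(c for c in chars if c not in WINDOWS_RESERVED and c.isprintable())


if __name__ == "__main__":
    print(os_friendly("!*:a/b~"))  # !ab~
```

This does not catch Windows' other quirks, such as names ending in '.' or a space, so the browser and OS tests would still be needed.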

How does that sound?  Oh, and by the way, do we need these to work for FTP 
or any other URL schemes besides HTTP?  Please let me know so I can include 
them in the test suite if possible.  Thanks!