[Zope3-dev] spelling of namespace signifiers
Phillip J. Eby
pje@telecommunity.com
Thu, 06 Jun 2002 08:10:50 -0500
Before debating alternatives further, perhaps we should recap the
*requirements* for namespace handling, and try to come to some agreement
about their relative priorities.
Basically, the issue is that some type of "escape" is needed to distinguish
between different namespaces in a URL path. This escape has the following
requirements (priorities are my perception):
* It MUST allow for the widest possible variety of content IDs within a
container
* It MUST NOT interfere with filename extensions as seen by the browser
* It MUST NOT interfere with the file name or path as seen by the local OS
(e.g., it should not contain OS path separator characters, or characters
that would make it difficult to use the file.)
* It MUST NOT rely on browsers or web servers to be fully RFC-compliant in
their handling of parameters, URL escapes, etc.
* It SHOULD BE usable within a single path component so that relative
URLs are simple
* It SHOULD BE easy to read/type
* It WOULD BE NICE if it were not ugly. :)
Have I missed anything? Does anybody disagree with the priorities I've used?
If we can agree on the requirements, I think we can rather rapidly come to
some kind of solid agreement on the spelling. My concern right now is that
we're treating everything as a MUST, and thereby running the risk of ending
up with a Papal decree that will make everyone equally unhappy, instead of
a solution that we can all live with, even if we're not deliriously happy.
Note, by the way, that the last two requirements in the list imply that the
characters used should not be URL-encoded by most browsers. Steve
Alexander's urllib test aside, there are actually several characters
besides '_', '-', and '.' which are not URL-encoded. urllib.quote() is
enormously overzealous when compared to the URL RFCs as well as the actual
browser implementations out there. It's also arguably broken for quoting
path segments anyway, since it doesn't quote '/'! I'll address the issue of
available characters in a separate post, however, once I've concluded my
review of the RFCs and done some browser tests. So far, I think there is
a very strong possibility that there are *many* "nice", non-ugly characters
we can use.
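For anyone who wants to see the quoting behaviour first-hand, here is a rough sketch. It assumes a modern Python, where urllib.quote() lives on as urllib.parse.quote(); the helper name unquoted_punctuation is made up for illustration, and the exact set of characters left alone may differ between Python versions.

```python
import string
from urllib.parse import quote


def unquoted_punctuation(safe="/"):
    """Return the ASCII punctuation characters that quote() leaves unchanged.

    `safe` is passed straight through to quote(); its default there is '/',
    which is why a slash inside a path segment is not escaped.
    """
    return "".join(ch for ch in string.punctuation if quote(ch, safe=safe) == ch)


if __name__ == "__main__":
    # With the default safe='/', the slash survives unquoted.
    print("default safe='/':", unquoted_punctuation())
    # With safe='', the slash is percent-encoded like most other punctuation.
    print("safe='':         ", unquoted_punctuation(safe=""))
```

Passing safe='' is the usual way to make quote() suitable for a single path segment. The short list this prints says nothing about what browsers actually leave unquoted, which is the point of the browser tests.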
What I intend to do is create a small "test suite" of directories and pages
with weird characters in their URLs, and with relative and absolute links
in and among the pages. I'll then post the URLs here for people to test
against "non-mainstream" browsers, and see what ends up being quoted. I'll
include URLs that will do server-side redirects and client-side
meta-refreshes, and maybe even a JavaScript location-setting or two.
The characters used in names will strictly be for testing whether the
URLs can be retrieved correctly, and whether they end up quoted in the
browsers' location bars; they are NOT proposals for syntax. Once we know our
working character set, we can THEN hash out proposals for actually *using*
the characters. (We'll also need to weed out any OS-unfriendly characters,
such as maybe '*'.)
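A test tree along these lines could be generated with something like the sketch below. Everything in it is an assumption made for illustration: the candidate characters, the OS-unfriendly set, the page naming scheme, and the base URL are placeholders, not proposals. It writes flat pages only and leaves out the redirects and meta-refreshes.

```python
import html
from pathlib import Path

# Hypothetical characters to exercise; for testing only, not syntax proposals.
CANDIDATES = ["+", ",", ";", "=", "@", "!", "~"]

# Assumption: path separators plus characters that common filesystems reject.
OS_UNFRIENDLY = set('/\\:*?"<>|')


def build_suite(root, candidates=CANDIDATES, base_url="http://localhost:8000"):
    """Write one HTML page per usable character, each linking to all pages.

    Links are written with the raw character (not percent-encoded) so that
    a browser's own quoting behaviour can be observed. Returns the file
    names that were created.
    """
    root = Path(root)
    usable = [ch for ch in candidates if ch not in OS_UNFRIENDLY]
    names = [f"page{ch}x.html" for ch in usable]
    for name in names:
        items = []
        for target in names:
            rel = html.escape(target, quote=True)
            absolute = html.escape(f"{base_url}/{target}", quote=True)
            items.append(f'<li><a href="{rel}">relative: {rel}</a></li>')
            items.append(f'<li><a href="{absolute}">absolute: {rel}</a></li>')
        body = "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>\n"
        (root / name).write_text(body, encoding="utf-8")
    return names


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(build_suite(tmp))
```

Serving the generated directory from any web server and clicking through the links should show which characters end up quoted in the location bar.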
How does that sound? Oh, and by the way, do we need these to work for FTP
or any other URL schemes besides HTTP? Please let me know so I can include
them in the test suite if possible. Thanks!