[Zope3-dev] Escape Character Summary Table

Phillip J. Eby pje@telecommunity.com
Thu, 06 Jun 2002 11:10:53 -0500


I've completed my survey of the RFCs, to determine the character set for
testing.  Please comment if you see anything that needs changing or adding.
Especially important are any characters you believe should be added to the
"Characters Under Consideration" section, since those are the ones I will
be creating the test suite for (except '-' and '_', which are known to be
"safe" from practical experience).

Based on the early results, it looks like using parentheses-enclosed,
comma-separated, equals-delimited parameters (e.g.
".../(ns=view,something=other)somepathpart/...") will likely be a winner on
all the design criteria.  But if that doesn't work, then at least having
this table (especially once it's updated with data on which characters are
quoted by browsers and servers) will help us come up with something else.
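
To make the proposed syntax concrete, here is a minimal Python sketch of
how one path segment in that form could be split into its parameters and
the remaining name.  The function name split_segment and its behavior on
edge cases (empty parentheses, a parameter with no '=') are assumptions
made for illustration; nothing here is existing Zope code.

```python
import re

# Hypothetical helper, not part of Zope: split one path segment of the form
# "(name=value,name=value)rest" into its parameters and the remaining name.
_SEGMENT = re.compile(r"^\(([^()]*)\)(.*)$")


def split_segment(segment):
    """Return (params, rest) for a single path segment."""
    match = _SEGMENT.match(segment)
    if match is None:
        # No leading parenthesized group: the whole segment is a plain name.
        return {}, segment
    params = {}
    for item in match.group(1).split(","):
        if not item:
            continue  # tolerate "()" and stray commas
        # A parameter without "=" is kept with an empty value (assumption).
        name, _, value = item.partition("=")
        params[name] = value
    return params, match.group(2)


print(split_segment("(ns=view,something=other)somepathpart"))
```

Only '(', ')', ',' and '=' carry meaning in this sketch, which is why those
four characters matter most in the tables that follow.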


Escape Characters For Zope Path Parameterization
------------------------------------------------

RFC note: In addition to RFCs 1738 and 2396, I also reviewed RFC 1808, but
it does not change any character definitions from 1738.  If there was no
change in a character's classification between 1738 and 2396, I've omitted
mention of 2396.  In the "Definitely Ruled Out" section, I do not always
include all RFC citations when other criteria are already sufficient to
exclude the character from consideration.

Characters Definitely Ruled Out
===============================

space    URL-quoted in browsers.  E-mail clients don't recognize it.
#        URI fragment delimiter
%        URI encoding escape
&        HTML/XML entity reference
.        Reserved for filename extensions; leading . is Unix "hidden" file
/        URI path component delimiter
<        HTML/XML delimiter; also URI delimiter and shell redirection
>        HTML/XML delimiter; also URI delimiter and shell redirection
?        URI query delimiter
:        Quoted by some browsers; path separator on Mac
"        HTML/XML attribute delimiter
'        HTML/XML attribute delimiter

;        URI parameter delimiter - inconsistent support means we can't use
         it at start *or* end!

\        Subject to modification in some gateways, according to RFC 1738
         Also, escape character in shells, and path separator on Windows

|        Subject to modification in some gateways, according to RFC 1738
         Also, shell pipe character


Characters Probably Ruled Out
=============================

*        Glob metacharacter; RFC 1738 "extra"; RFC 2396 "unreserved mark"

$        TALES and shell variable expansion; RFC 1738 "safe";
         RFC 2396 "reserved in context"

~        Subject to modification in some gateways, according to RFC 1738;
         RFC 2396 "unreserved mark"

[        Subject to modification in some gateways, according to RFC 1738
]        Subject to modification in some gateways, according to RFC 1738
^        Subject to modification in some gateways, according to RFC 1738
`        Subject to modification in some gateways, according to RFC 1738
{        Subject to modification in some gateways, according to RFC 1738
}        Subject to modification in some gateways, according to RFC 1738


Characters Under Consideration
==============================

!        RFC 1738 "extra"; RFC 2396 "unreserved mark"

(        Shell subprocess metacharacter; RFC 1738 "extra";
         RFC 2396 "unreserved mark"

)        Shell subprocess metacharacter; RFC 1738 "extra"; 
         RFC 2396 "unreserved mark"

+        Listed as "safe" in RFC 1738; RFC 2396 "reserved in context"

,        Used in dynamic URLs created by some app servers; RFC 1738 "extra";
         RFC 2396 "reserved in context"

-        RFC 1738 "safe"; RFC 2396 "unreserved mark"
=        RFC 1738 "reserved in context"
@        RFC 1738 "reserved in context"
_        RFC 1738 "safe"; RFC 2396 "unreserved mark"


======== End Tables
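
For anyone who wants to re-check the RFC columns above, the following
Python sketch looks a character up in the relevant character classes of
the two RFCs.  The dictionary layout and the classify function are
assumptions made for illustration; only the "safe", "extra" and "reserved"
classes of RFC 1738 and the "reserved" and "mark" classes of RFC 2396 are
included, so a character outside those classes comes back as None.

```python
# Character classes as listed in the grammars of the two RFCs.
# Only the classes cited in the tables above are included (assumption).
RFC1738 = {
    "safe": set("$-_.+"),
    "extra": set("!*'(),"),
    "reserved": set(";/?:@&="),
}
RFC2396 = {
    "reserved": set(";/?:@&=+$,"),
    "unreserved mark": set("-_.!~*'()"),
}


def classify(char):
    """Return (RFC 1738 class, RFC 2396 class); None where not listed."""
    def lookup(classes):
        for name, members in classes.items():
            if char in members:
                return name
        return None
    return lookup(RFC1738), lookup(RFC2396)


# The "Characters Under Consideration" from the table above.
for char in "!()+,-=@_":
    print(repr(char), classify(char))
```

Note that the sketch reports both classes for every character, whereas the
tables omit the RFC 2396 class when it matches the RFC 1738 one.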