[Zope3-dev] Re: Apache rewrite rules and URLs: an experiment

Jim Fulton jim at zope.com
Thu Nov 4 16:38:39 EST 2004


Bjorn Tillenius wrote:
> On Thu, Nov 04, 2004 at 04:04:15PM -0500, Jim Fulton wrote:
> 
>>Bjorn Tillenius wrote:
>>
>>>On Thu, Nov 04, 2004 at 11:01:12AM -0500, Jim Fulton wrote:
>>>
>>>
>>>>Hm, the interface for getApplicationURL doesn't say whether the returned
>>>>value is encoded. It needs to say this.  The interface needs to be fixed
>>>>IOW.
>>>>
>>>>Given:
>>>>
>>>>- We expect a URL
>>>>
>>>>- URLs must be URL encoded
>>>>
>>>>- *Before* URL encoding, we need to utf-8 encode
>>>>
>>>>Then the output of getApplicationURL must certainly be a utf-8-url-encoded
>>>>string.
>>>
>>>
>>>Yes, that's what's happening for the path part of the URL. I guess that
>>>no one cared to encode the host part, since it should only contain
>>>ascii characters.
>>
>>Is that true any more?
> 
> 
> What is true anymore?

That is *deep*. ;)

I was asking if it was true that host names can't include unicode.
I thought they could, although I don't know what the encoding rules are.

> That getApplicationURL encodes the path part of the
> URL is certainly true. About the host only containing ascii characters,
> well I have to say I don't know too much about it. It's possible to
> use unicode characters in a host name. Although, I think that when it's
> transmitted via HTTP it gets encoded to ascii characters. So, since zope
> doesn't decode this, I would assume that all the host variables only
> contain ascii for now. But I will see if I can find some RFC about it.

Thanks
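For reference, here is a small sketch of the host-name point. It assumes Python 3's built-in "idna" codec (IDNA 2003, which applies nameprep and Punycode to each label). The helper names host_to_ascii and host_from_ascii are hypothetical, and this is not Zope code. It shows a non-ASCII host name being turned into the ASCII-only form that browsers typically send in the Host header, which fits Bjorn's guess that the host values Zope sees are plain ASCII.

```python
# Sketch only: uses Python 3's built-in "idna" codec (IDNA 2003).
# host_to_ascii / host_from_ascii are hypothetical helpers, not Zope APIs.


def host_to_ascii(host):
    """Return the ASCII (IDNA/Punycode) form of a possibly non-ASCII host."""
    # Pure-ASCII host names pass through the codec unchanged.
    return host.encode("idna").decode("ascii")


def host_from_ascii(ascii_host):
    """Decode an IDNA-encoded (xn--...) host name back to Unicode."""
    return ascii_host.encode("ascii").decode("idna")


if __name__ == "__main__":
    wire = host_to_ascii("bücher.example")
    print(wire)                   # xn--bcher-kva.example
    print(host_from_ascii(wire))  # bücher.example
```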

> 
>>>I will also update the interface documentation for URL and getURL. I
>>>assume those should be encoded the same way as getApplicationURL?
>>
>>Yes, URLs should always be assumed to be utf-8 encoded and then url encoded.
> 
> 
> Thought so, just wanted to make sure. There were quite a few objections
> at first when I wanted to have AbsoluteUrl produce such URLs...

I remember some lively discussions on that topic.  I'm glad
that that was resolved (not by us).
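A minimal Python 3 sketch of the "utf-8 encode, then url encode" rule for the path part of a URL. url_path is a hypothetical helper, not getApplicationURL or any other Zope API, and http://localhost:8080 is only a placeholder application URL.

```python
from urllib.parse import quote


def url_path(segments):
    """Build a URL path from Unicode segments (hypothetical helper).

    Each segment is first encoded to utf-8 bytes and then percent-encoded.
    safe="" means a "/" inside a segment is escaped as %2F, so only the
    separators added by the join remain literal slashes.
    """
    return "/" + "/".join(
        quote(segment.encode("utf-8"), safe="") for segment in segments
    )


if __name__ == "__main__":
    # u-umlaut is U+00FC; its utf-8 bytes are C3 BC, hence %C3%BC.
    print("http://localhost:8080" + url_path(["folder", "Jürgen", "a b"]))
    # http://localhost:8080/folder/J%C3%BCrgen/a%20b
```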

> 
>>>And while I'm at it, another thing I encountered the last time I was
>>>digging in the code, although I forgot to bring it up. When the raw http
>>>request comes to zope, it decodes the URL and stores it as unicode.
>>>However, it decodes the URL using the charset it derives from
>>>the request. IMHO this is wrong; it should use utf-8 instead, shouldn't
>>>it?
>>
>>Absolutely.
>>
>>
>>>There are at least two problems with the current approach:
>>>
>>>* No non-ascii URL is guaranteed to work on every system
>>>
>>>* Many browsers, Opera at least, default to utf-8 for URLs
>>
>>This (utf-8 encoding and then url-encoding) is specified in
>>an RFC (somewhere :).
> 
> 
> Actually it doesn't specify that utf-8 has to be used, it's only a
> recommendation. Although a good recommendation.

I'm 97% sure I saw something that said utf-8.  It doesn't really matter,
though; I'm fairly sure that that is what modern browsers do in practice.
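To illustrate why the incoming URL should be decoded as utf-8, here is a Python 3 sketch. decode_request_path is a hypothetical name, not Zope's request code. It percent-decodes the raw path to bytes and then decodes those bytes as utf-8; decoding the same bytes with a charset guessed from the request (latin-1 in this example) garbles every non-ASCII character.

```python
from urllib.parse import unquote_to_bytes


def decode_request_path(raw_path):
    """Percent-decode a raw request path, then decode the bytes as utf-8.

    Hypothetical helper. With the default strict error handling, bytes that
    are not valid utf-8 raise UnicodeDecodeError instead of being guessed.
    """
    return unquote_to_bytes(raw_path).decode("utf-8")


if __name__ == "__main__":
    raw = "/folder/J%C3%BCrgen"
    print(decode_request_path(raw))                 # /folder/Jürgen
    # Using a charset derived from the request (here latin-1) gives mojibake:
    print(unquote_to_bytes(raw).decode("latin-1"))  # /folder/JÃ¼rgen
```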

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

