[Zope3-dev] Re: Apache rewrite rules and URLs: an experiment

Bjorn Tillenius bjoti777 at student.liu.se
Thu Nov 4 16:33:38 EST 2004


On Thu, Nov 04, 2004 at 04:04:15PM -0500, Jim Fulton wrote:
> Bjorn Tillenius wrote:
> >On Thu, Nov 04, 2004 at 11:01:12AM -0500, Jim Fulton wrote:
> >
> >>Hm, the interface for getApplicationURL doesn't say whether the returned
> >>value is encoded. It needs to say this.  The interface needs to be fixed
> >>IOW.
> >>
> >>Given:
> >>
> >>- We expect a URL
> >>
> >>- URLs must be URL encoded
> >>
> >>- *Before* URL encoding, we need to utf-8 encode
> >>
> >>Then the output of getApplicationURL must certainly be a utf-8-url-encoded
> >>string.
> >
> >
> >Yes, that's what's happening for the path part of the URL. I guess that
> >no one cared to encode the host part, since it should only contain
> >ascii characters.
> 
> Is that true any more?

What is true any more? That getApplicationURL encodes the path part of the
URL is certainly true. As for the host containing only ascii characters,
I have to say I don't know much about it. It's possible to use unicode
characters in a host name, but I think that when such a name is
transmitted via HTTP it gets encoded to ascii characters. Since zope
doesn't decode it, I would assume that all the host variables contain
only ascii for now. But I will see if I can find an RFC about it.
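For illustration only, Python's standard "idna" codec (IDNA 2003, RFC 3490) shows the kind of mapping that turns a non-ASCII host name into an all-ASCII form before it is sent. A client that supports IDNA would put this ASCII form in the Host header, which fits the assumption above that zope's host variables contain only ascii. The helper name ascii_host is hypothetical and not part of Zope. The sketch also assumes the host has already been split from any port.

```python
def ascii_host(host: str) -> str:
    # Hypothetical helper, not a Zope API: map a (possibly Unicode) host
    # name to its ASCII wire form. The built-in "idna" codec implements
    # IDNA 2003; pure-ASCII names pass through unchanged, while labels
    # with non-ASCII characters become "xn--" (Punycode) labels.
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(ascii_host("www.zope.org"))    # www.zope.org
    print(ascii_host("bücher.example"))  # xn--bcher-kva.example
```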

> >I will also update the interface documentation for URL and getURL. I
> >assume those should be encoded the same way as getApplicationURL?
> 
> Yes, URLs should always be assumed to be utf-8 encoded and then url encoded,

Thought so; I just wanted to make sure. There were quite a few objections
at first when I wanted to have AbsoluteUrl produce such URLs...
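Here is a minimal sketch of the convention agreed on above: encode each path segment to utf-8 first, then URL-encode the resulting bytes. It uses Python's urllib.parse.quote. The helper encode_path is hypothetical and only shows the convention. It is not the actual code behind getApplicationURL, URL or getURL.

```python
from urllib.parse import quote


def encode_path(segments):
    """Join Unicode path segments into a utf-8, URL-encoded path.

    Hypothetical helper illustrating the convention only.
    """
    # Step 1: encode each segment to utf-8 bytes.
    # Step 2: percent-encode those bytes. safe="" makes a "/" inside a
    # segment get escaped instead of being read as a path separator.
    return "/" + "/".join(quote(s.encode("utf-8"), safe="") for s in segments)


if __name__ == "__main__":
    print(encode_path(["folder", "Björn"]))  # /folder/Bj%C3%B6rn
    # Encoding with another charset yields different bytes for the same
    # name, which is why a single agreed charset matters:
    print(quote("Björn", encoding="latin-1"))  # Bj%F6rn
```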

> >And while I'm at it, here is another thing I encountered the last time I
> >was digging in the code, although I forgot to bring it up. When the raw
> >http request comes to zope, it decodes the URL and stores it as unicode.
> >However, it decodes the URL using the charset it derives from the
> >request. IMHO this is wrong; it should use utf-8 instead, shouldn't it?
> 
> Absolutely.
> 
> >There are at least two problems with the current approach:
> >
> > * No non-ascii URL is guaranteed to work on every system
> >
> > * Many browsers, at least Opera, default to utf-8 for URLs
> 
> This (utf-8 encoding and then url-encoding) is specified in
> an RFC (somewhere :).

Actually, it doesn't specify that utf-8 has to be used; it's only a
recommendation, although a good one.
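The sketch below shows the decoding side proposed above: the incoming request path is decoded as utf-8 rather than with a charset derived from the request. utf-8 is only recommended, not required. For that reason the sketch adds a fallback charset for clients that don't follow the recommendation. That fallback is an assumption of this sketch, not something discussed in the thread. The function decode_path and its fallback parameter are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path, fallback="latin-1"):
    """Decode a percent-encoded request path to Unicode.

    Hypothetical helper: utf-8 is tried first; the fallback charset is an
    assumed escape hatch for clients that did not send utf-8.
    """
    data = unquote_to_bytes(raw_path)  # undo the URL encoding -> raw bytes
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


if __name__ == "__main__":
    print(decode_path("/folder/Bj%C3%B6rn"))  # /folder/Björn (utf-8)
    print(decode_path("/folder/Bj%F6rn"))     # /folder/Björn (fallback)
    # Decoding the utf-8 bytes with latin-1 instead gives mojibake:
    print(unquote_to_bytes("/folder/Bj%C3%B6rn").decode("latin-1"))  # /folder/BjÃ¶rn
```

The fallback is a heuristic. Bytes in another charset can occasionally also be valid utf-8, and they would then be decoded wrongly without any error.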

Regards,
  Bjorn

