[Zope3-dev] Re: Apache rewrite rules and URLs: an experiment

Bjorn Tillenius bjoti777 at student.liu.se
Thu Nov 4 15:58:28 EST 2004


On Thu, Nov 04, 2004 at 11:01:12AM -0500, Jim Fulton wrote:
> Bjorn Tillenius wrote:
> >On Thu, Nov 04, 2004 at 09:48:56AM -0500, Jim Fulton wrote:
> >
> >>Peter Mayne wrote:
> >>
> >>>If I try the above <tal:block> when I access Zope directly, it works. 
> >>>However, if I access it via Apache, I get:
> >>>
> >>>...
> >>>File "C:\opt\Python23\Lib\site-packages\zope\tal\talinterpreter.py", 
> >>>line 451, in do_insertText_tal
> >>>  text = self.engine.evaluateText(stuff[0])
> >>>File 
> >>>"C:\opt\Python23\Lib\site-packages\zope\app\pagetemplate\engine.py", 
> >>>line 105, in evaluateText
> >>>  return unicode(text)
> >>>File 
> >>>"C:\opt\Python23\Lib\site-packages\zope\app\traversing\browser\absoluteurl.py", 
> >>>line 101, in __unicode__
> >>>  return urllib.unquote(self.__str__()).decode('utf-8')
> >>>AttributeError: 'unicode' object has no attribute 'decode'
> >>
> >>That's odd.
> >>
> >>
> >>>I'm not even going to think about why this is happening.
> >>
> >>Suit yourself.  Someone should think about why it's happening.
> >
> >
> >I would guess that some variable that apache sets to determine the host
> >is being represented as a unicode string.
> 
> But it gets to Zope via HTTP, which is an ASCII subset.  The publisher
> is supposed to give all of this to Zope decoded.  IOW, the input data
> to getApplicationURL should always be unicode.  I guess getApplicationURL
> encodes. (? I don't remember the details.)

Right, sorry, I was temporarily confused... But I still suspect that
some of the 'host variables' are unicode and some aren't. I guess that
when virtual hosting is used, it sets some variable as a unicode string.
So, I guess that all HTTP variables should be unicode then? I've looked
at the code several times before, but haven't been able to find any
documentation about it.
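
To illustrate the suspected failure mode, here is a minimal sketch. It is
written for Python 3 (where str plays the role of Python 2's unicode and
bytes the role of the old str), and the helper names naive_decode and
to_text are made up for this example; they are not Zope APIs. The first
helper assumes its input is always a byte string, like the line in the
traceback, and fails as soon as an already-decoded value reaches it. The
second one decodes only when needed.

```python
def naive_decode(value):
    # Mirrors the failing pattern: assumes value is always a byte string.
    # A text value has no .decode method, so this raises AttributeError.
    return value.decode("utf-8")


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is still a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


host_as_bytes = b"www.example.org"   # hypothetical host value, still encoded
host_as_text = "www.example.org"     # same host, already decoded upstream

print(to_text(host_as_bytes))
print(to_text(host_as_text))
```

Calling naive_decode(host_as_text) raises an AttributeError of the same
kind as the one in the traceback, which is consistent with one of the
host values arriving already decoded.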

> Hm, the interface for getApplicationURL doesn't say whether the returned
> value is encoded. It needs to say this.  The interface needs to be fixed
> IOW.
> 
> Given:
> 
> - We expect a URL
> 
> - URLs must be URL encoded
> 
> - *Before* URL encoding, we need to utf-8 encode
> 
> Then the output of getApplicationURL must certainly be a utf-8-url-encoded
> string.

Yes, that's what's happening for the path part of the URL. I guess that
no one bothered to encode the host part, since it should only contain
ASCII characters.
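
The "utf-8 encode first, then URL encode" order can be sketched as
follows. This uses Python 3's urllib.parse rather than the Python 2
urllib module the Zope code uses, and encode_path / decode_path are
hypothetical helper names, not part of any Zope interface.

```python
from urllib.parse import quote, unquote


def encode_path(path):
    """UTF-8 encode a text path, then percent-encode the resulting bytes."""
    return quote(path.encode("utf-8"), safe="/")


def decode_path(encoded):
    """Reverse encode_path: undo percent-encoding, then decode as UTF-8."""
    return unquote(encoded, encoding="utf-8")


# "\u00e4" is a-umlaut; its UTF-8 encoding is the two bytes C3 A4.
original = "/r\u00e4k"
encoded = encode_path(original)
print(encoded)
print(decode_path(encoded) == original)
```

A pure-ASCII host name passes through such an encoding unchanged, which
is probably why leaving it out went unnoticed.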

I will also update the interface documentation for URL and getURL. I
assume those should be encoded the same way as getApplicationURL?

And while I'm at it, here is another thing I encountered the last time I
was digging in the code, although I forgot to bring it up. When the raw
HTTP request comes to Zope, it decodes the URL and stores it as unicode.
However, it decodes the URL using the charset it derives from the
request. IMHO this is wrong; it should use utf-8 instead, shouldn't
it?

There are at least two problems with the current approach:

 * No non-ASCII URL is guaranteed to work on every system

 * Many browsers, at least Opera, default to utf-8 for URLs
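
A small sketch of why the choice of charset matters. It is again Python 3
and not the actual Zope publisher code; decode_url_path is a hypothetical
helper, and iso-8859-1 simply stands in for whatever charset might be
derived from a request.

```python
from urllib.parse import unquote_to_bytes


def decode_url_path(raw_path, charset="utf-8"):
    """Undo percent-encoding, then decode the bytes with the given charset."""
    return unquote_to_bytes(raw_path).decode(charset)


# Path as sent by a browser that UTF-8 encodes URLs ("/r" + a-umlaut + "k").
sent = "/r%C3%A4k"

as_utf8 = decode_url_path(sent)
as_latin1 = decode_url_path(sent, "iso-8859-1")

# The same request path yields two different names depending on the charset.
print(as_utf8 == as_latin1)
```

With utf-8 the two bytes C3 A4 become one character; with iso-8859-1 they
become two, so the lookup would be done with a different name than the
one the object was stored under.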

Regards,
  Bjorn

