[Zope3-dev] Re: Apache rewrite rules and URLs: an experiment

Jim Fulton jim at zope.com
Thu Nov 4 16:04:15 EST 2004


Bjorn Tillenius wrote:
> On Thu, Nov 04, 2004 at 11:01:12AM -0500, Jim Fulton wrote:
> 
>>Bjorn Tillenius wrote:
>>
>>>On Thu, Nov 04, 2004 at 09:48:56AM -0500, Jim Fulton wrote:
>>>
>>>
>>>>Peter Mayne wrote:
>>>>
>>>>
>>>>>If I try the above <tal:block> when I access Zope directly, it works. 
>>>>>However, if I access it via Apache, I get:
>>>>>
>>>>>...
>>>>>File "C:\opt\Python23\Lib\site-packages\zope\tal\talinterpreter.py", 
>>>>>line 451, in do_insertText_tal
>>>>> text = self.engine.evaluateText(stuff[0])
>>>>>File 
>>>>>"C:\opt\Python23\Lib\site-packages\zope\app\pagetemplate\engine.py", 
>>>>>line 105, in evaluateText
>>>>> return unicode(text)
>>>>>File 
>>>>>"C:\opt\Python23\Lib\site-packages\zope\app\traversing\browser\absoluteurl.py", line 101, in __unicode__
>>>>> return urllib.unquote(self.__str__()).decode('utf-8')
>>>>>AttributeError: 'unicode' object has no attribute 'decode'
>>>>
>>>>That's odd.
>>>>
>>>>
>>>>
>>>>>I'm not even going to think about why this is happening.
>>>>
>>>>Suit yourself.  Someone should think about why it's happening.
>>>
>>>
>>>I would guess that some variable that apache sets to determine the host
>>>is being represented as a unicode string.
>>
>>But it gets to Zope via HTTP, which is an ASCII subset.  The publisher
>>is supposed to give all of this to Zope decoded.  IOW, the input data
>>to getApplicationURL should always be unicode.  I guess getApplicationURL
>>encodes. (? I don't remember the details.)
> 
> 
> Right, sorry, I was temporarily confused... But I still suspect that
> some of the 'host variables' are unicode and some aren't. I guess that
> when virtual hosting is used, it sets some variable as a unicode string.
> So, I guess that all HTTP variables should be unicode then?

That's a good question.  So, Zope gets variables as strings.
Some of these might be encoded.  The server URL, for example, may be.
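
The traceback above is what you get when code calls .decode() on a value
that has already been decoded. A minimal sketch of guarding against that
mix, written for modern Python, where bytes and str play the roles that
str and unicode play in Python 2. The helper to_text and the variable
values are hypothetical, for illustration only, and are not part of Zope:

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is still bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, so there is nothing to decode


# Hypothetical mix of server variables: one still encoded, one already text.
variables = {
    "SERVER_URL": b"http://example.com",
    "HTTP_HOST": "example.com",
}

normalized = {name: to_text(value) for name, value in variables.items()}
```

Normalizing once, at the boundary, means later code never has to guess
which kind of string it was handed.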

> I've looked at
> the code several times before, but haven't been able to find some
> documentation about it.

Can't help you there. :)

> 
>>Hm, the interface for getApplicationURL doesn't say whether the returned
>>value is encoded. It needs to say this.  The interface needs to be fixed
>>IOW.
>>
>>Given:
>>
>>- We expect a URL
>>
>>- URLs must be URL encoded
>>
>>- *Before* URL encoding, we need to utf-8 encode
>>
>>Then the output of getApplicationURL must certainly be a utf-8-url-encoded
>>string.
> 
> 
> Yes, that's what's happening for the path part of the URL. I guess that
> no one cared to encode the host part, since it should only contain
> ascii characters.

Is that true any more?

> I will also update the interface documentation for URL and getURL. I
> assume those should be encoded the same way as getApplicationURL?

Yes, URLs should always be assumed to be utf-8 encoded and then url encoded.
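
A rough sketch of that two-step encoding using the modern Python standard
library. The function name encode_url_path and the sample path are made up
for illustration; this is not the Zope implementation:

```python
from urllib.parse import quote


def encode_url_path(path):
    """UTF-8 encode a text path, then percent-encode the resulting bytes."""
    utf8_bytes = path.encode("utf-8")    # step 1: text -> UTF-8 octets
    return quote(utf8_bytes, safe="/")   # step 2: octets -> %XX escapes


encoded = encode_url_path("/caf\u00e9/men\u00fc")
# encoded is plain ASCII: "/caf%C3%A9/men%C3%BC"
```

The order matters: url encoding works on octets, so the text has to be
turned into utf-8 octets first.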

> And while I'm at it, another thing I encountered the last time I was
> digging in the code, although I forgot to bring it up. When the raw http
> request comes to zope, it decodes the URL and stores it as unicode.
> However, it tries to decode the URL using the charset it derives from
> the request. IMHO this is wrong, it should use utf-8 instead, shouldn't
> it?

Absolutely.
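
To show why the charset matters, here is a small sketch in modern Python.
The sample path and the choice of iso-8859-1 as the "charset derived from
the request" are assumptions for illustration:

```python
from urllib.parse import unquote_to_bytes

# Path as sent by a browser that uses utf-8 for URLs (assumed example).
raw_path = "/caf%C3%A9"

# Undo the url encoding first; this yields raw octets, not text.
octets = unquote_to_bytes(raw_path)

as_utf8 = octets.decode("utf-8")          # "/caf\u00e9", the intended path
as_latin1 = octets.decode("iso-8859-1")   # "/caf\u00c3\u00a9", mojibake
```

Decoding with anything other than utf-8 still succeeds here, but it
silently produces a different path, so traversal would look up the wrong
name.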

> There are at least two problems with the current approach:
> 
>  * No non-ascii URL is guaranteed to work on every system
> 
>  * Many browsers, at least Opera, default to utf-8 for URLs

This (utf-8 encoding and then url-encoding) is specified in
an RFC (somewhere :).

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

