[Zope3-dev] Re: Make AbsoluteURL produce quoted urls

Bjorn Tillenius bjoti777 at student.liu.se
Tue Jun 1 12:33:12 EDT 2004


On Tue, Jun 01, 2004 at 02:47:45PM +0100, Stuart Bishop wrote:
> On 01/06/2004, at 1:39 PM, Philipp von Weitershausen wrote:
> 
> >Bjorn Tillenius wrote:
> >>> No, I think there's a quite easy solution. AbsoluteURL.__call__
> >>> should return unicode so that one has all the options when using it
> >>> from Python. AbsoluteURL.__str__ should return whatever __call__
> >>> would return, but encoded in UTF8. If I remember correctly, TALES
> >>> first evaluates __str__ before __call__, so the path expressions
> >>> would still be fine.
> >>
> >>I'm almost fine with this, __str__ should still return ascii, it
> >>should quote the url instead (but maybe that's what you meant).
> >
> >Indeed the spec (http://www.ietf.org/rfc/rfc2718.txt, section 2.2.5) 
> >suggests::
> >
> >      Unless there is some compelling reason for a
> >      particular scheme to do otherwise, translating character
> >      sequences into UTF-8 (RFC 2279) [3] and then subsequently
> >      using the %HH encoding for unsafe octets is recommended.
> >
> >So, __str__ could indeed first encode to UTF-8 and then urlquote so we 
> >end up with %HH. I can't come up with a good use case for wanting a 
> >string but not quoted, so having either unicode or a quoted string 
> >would be enough and easily implemented.
> 
> It is impossible to convert a Unicode URL to an ASCII string and
> *not* have it quoted.

That's true, but nobody wanted to do that anyway. The question was
whether to return ascii or utf-8 (or another encoding).

> I would prefer the AbsoluteURL to be a subclass of unicode, so:
> 
> >>> url = URL(u'http://www.ol\xe9.de/\xc7/page_\u2160.html')
> >>> unicode(url)
> u'http://www.ol\xe9.de/\xc7/page_\u2160.html'
> >>> str(url)
> 'http://www.xn--ol-cja.de/%C3%87/page_%E2%85%A0.html'
> >>> url.urlencode()
> 'http://www.xn--ol-cja.de/%C3%87/page_%E2%85%A0.html'

I don't like the urlencode method. I think an AbsoluteURL should be a
valid URL by default. If you want to do something special, like
converting it to unicode, you should have to do something extra, not
the other way around.
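
For reference, the kind of ASCII form in the quoted example (an IDNA
host plus a %HH-quoted path) comes from two separate steps. The sketch
below illustrates them with the standard library of a current Python.
The helper name to_ascii_url is hypothetical and is not part of Zope 3,
and it deliberately ignores ports, userinfo and query quoting.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Hypothetical helper: return an ASCII-only form of a text URL."""
    parts = urlsplit(url)
    # Step 1: the host name goes through IDNA, not through %HH quoting.
    host = parts.hostname.encode("idna").decode("ascii")
    # Step 2: the path is encoded to UTF-8, then unsafe octets become %HH.
    path = quote(parts.path.encode("utf-8"), safe="/")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


ascii_url = to_ascii_url("http://www.ol\xe9.de/\xc7/page_\u2160.html")
print(ascii_url)
# prints http://www.xn--ol-cja.de/%C3%87/page_%E2%85%A0.html
```

The host and the path follow different rules, which is one reason to
keep domain handling out of the quoting step.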

So, I want to make the following changes::

  * Add __unicode__, which will of course return a unicode string.

  * Change __str__ so that it takes the unicode url, encodes it to
    utf-8, and urlquotes it before it gets returned.

No change will be made to __call__'s behaviour. I also won't change the
way domain names are treated; I'm not even sure it's AbsoluteURL's
responsibility to do that.
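
A minimal sketch of the two changes listed above, assuming a stand-in
class rather than the real AbsoluteURL view. The class name
SketchAbsoluteURL, its constructor and the choice of safe characters
are all assumptions made for illustration. It is written for a current
Python, where text strings are already unicode, so __unicode__ has to
be called explicitly.

```python
from urllib.parse import quote


class SketchAbsoluteURL:
    """Hypothetical stand-in for AbsoluteURL; not the Zope 3 class."""

    def __init__(self, url):
        # Unquoted text form, possibly containing non-ASCII characters.
        self._url = url

    def __unicode__(self):
        # Proposed addition: return the unquoted unicode form.
        return self._url

    def __str__(self):
        # Proposed change: encode to UTF-8, then %HH-quote unsafe octets.
        # The safe set is an assumption that keeps URL delimiters intact.
        return quote(self._url.encode("utf-8"), safe="/:@")


url = SketchAbsoluteURL("http://example.com/\xc7/page_\u2160.html")
print(str(url))
# prints http://example.com/%C3%87/page_%E2%85%A0.html
```

The host name in the example is plain ASCII on purpose, since domain
names are left alone here.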

> The last syntax was to make TALES nicer:
> 
> <a tal:attributes="href someurl/urlencode" tal:content="someurl" />
> 
> I prefer that to the proposed use of __call__, which would mean I
> would have to write the above as:
> 
> <a tal:attributes="href someurl" tal:content="python: someurl()" />

If you wanted to do that (which once again is a bad idea due to the world
not using a single encoding), you could do:
 <a tal:attributes="href someurl" tal:content="someurl/__unicode__" />

Maybe it could be considered to add a 'unicode' method or something to
make it cleaner, but I won't do it.

Regards,
  Bjorn


