[Zope] easy regular expression for URL fixup

Thomas B. Passin tpassin@mitretek.org
Thu, 7 Mar 2002 10:35:10 -0500


[Ed Colmar]

> Hey Tom
>
> Thanks again...
>
> I'm trying these out in python for checking...  Still I cannot get any
> matches...  I am really wondering if my install is botched.  Does this work
> on yours?
>
> >>> import re
> >>> HTMLFILE=r'/\\w*\\.html'
> >>> htmlfile=re.compile(HTMLFILE)
> >>> url = "http://www.somewhere.com/folder/test.html"
> >>> m = htmlfile.match(url)
> >>> print m
> None

Regular expressions for fully general urls are hard, as you are finding out.

1) Either use r'...' syntax or double the backslashes, but don't do both.
2) Remember that the path includes "/" characters, which \w does not match.
3) You may want to use non-greedy matches (see the docs).

4) What are you really trying to do?  There might be an easier way.  Do you
want to get the path to the object, the object name at the end of the path,
or what?
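
For points 1 to 3, here is a minimal sketch of a pattern that does match the
example url. The character class [^/]* and the trailing $ are my own choices
for illustration, not the only way to write it. One more thing not listed
above: match() only tries the pattern at the start of the string, so a pattern
that begins with "/" needs search() to find anything inside a full url. The
print call uses parentheses so the snippet also runs on current Python.

```python
import re

url = "http://www.somewhere.com/folder/test.html"

# Raw string with single backslashes (point 1).
# [^/]* matches any run of characters that are not "/" (point 2),
# and $ ties the match to the end of the url.
HTMLFILE = r'/([^/]*\.html)$'
htmlfile = re.compile(HTMLFILE)

# search() scans the whole string; match() would only try position 0.
m = htmlfile.search(url)
if m:
    name = m.group(1)
else:
    name = None
print(name)
# prints test.html
```

Because [^/]* cannot run across a "/", a non-greedy qualifier (point 3) is not
needed for this particular pattern.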

If you want the name at the end of the url (and the url will not have a
query string or a fragment identifier):

import string
url = "http://www.somewhere.com/folder/test.html"

split=string.split(url,'/')
print split[-1]
# prints test.html

To get the path up to but not including the final object name:

split=string.split(url,'/')
print string.join(split[:-1],'/')
# prints http://www.somewhere.com/folder

To get the path up to but not including the ".html":

split=string.split(url,'.')
print string.join(split[:-1],'.')
# prints http://www.somewhere.com/folder/test
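
The same three results can be sketched with string methods instead of the
string module. This assumes a Python recent enough to have rsplit() on
strings, which is newer than the functions used above; the variable names
folder, name, base and ext are just for illustration.

```python
url = "http://www.somewhere.com/folder/test.html"

# rsplit with a maxsplit of 1 splits once, at the last separator only.
folder, name = url.rsplit('/', 1)
base, ext = url.rsplit('.', 1)

print(name)
# prints test.html
print(folder)
# prints http://www.somewhere.com/folder
print(base)
# prints http://www.somewhere.com/folder/test
```

Note that splitting on "." (here or with string.split) only gives a sensible
result when the last path segment really contains a dot; otherwise the split
lands on the last dot in the host name.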

So it may be a lot easier to use string functions, depending on what you
want to do.
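
If the url might carry a query string or a fragment identifier after all, one
possible approach is to let the standard library pull out the path first and
then split that. This is only a sketch: the url with "?x=1#top" is a made-up
example, and the import shown is the current Python 3 location of urlsplit
(older Pythons keep it in the urlparse module).

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical url with a query string and a fragment added.
url = "http://www.somewhere.com/folder/test.html?x=1#top"

# urlsplit separates scheme, host, path, query and fragment.
path = urlsplit(url).path

# posixpath works on "/"-separated paths regardless of the platform.
folder, name = posixpath.split(path)
base, ext = posixpath.splitext(name)

print(path)
# prints /folder/test.html
print(name)
# prints test.html
print(base)
# prints test
```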

In DTML, the string module is available as _.string.

Cheers,

Tom P