[Zope] DTML, Zope and Regex

Jim Penny jpenny@universal-fasteners.com
Wed, 10 Jul 2002 13:01:06 -0400


On Wed, Jul 10, 2002 at 05:49:43PM +0200, Oliver Bleutgen wrote:
> Jim Penny wrote:
> >On Wed, Jul 10, 2002 at 03:17:14PM +0100, Ben Avery wrote:
> >
> >>well, external methods are python scripts with no safety measures at 
> >>all, so are potentially much more unsafe than any use of regexps in a 
> >>python script. So I'd say it's better to allow the re module in your 
> >>python scripts (see my previous post) than resort to external methods.
> >>
> >>but I also haven't come across a reason to consider regexps unsafe. I'm 
> >>sure it's been discussed here before - could someone point us to a post 
> >>on this subject, pls ?
> >
> >
> >As I understand it, the problem is not so much security, per se, but
> >denial of service.  That is, it is extremely easy to write regular
> >expressions which take enormous amounts of time or memory to process.
> 

See http://www.usenix.com/publications/login/1999-4/reg_exp.html

for a real-world example.  As is noted in the article, a 1700-fold
improvement here, a 1700-fold improvement there, can start to add up!

Regexes, at least as implemented by backtracking engines such as Perl's
and Python's, are worst-case exponential in running time.
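
A minimal sketch of the effect in Python, for illustration only: the
pattern and the helper name time_failed_match are my own assumptions,
not taken from the article above. It times a match that is guaranteed
to fail, so the engine has to try every way of splitting the run of
'a' characters before giving up.

```python
import re
import time

# Nested quantifiers: a run of 'a's can be divided between the inner
# and the outer '+' in exponentially many ways.
PATTERN = re.compile(r'(a+)+$')


def time_failed_match(n):
    """Return (result, seconds) for matching n 'a's followed by one 'b'."""
    subject = 'a' * n + 'b'  # the trailing 'b' guarantees the match fails
    start = time.perf_counter()
    result = PATTERN.match(subject)
    return result, time.perf_counter() - start


if __name__ == '__main__':
    # Hypothetical sizes, kept small so the loop finishes quickly.
    for n in range(12, 23, 2):
        result, seconds = time_failed_match(n)
        print('n=%2d  matched=%-5s  %.4f s' % (n, result is not None, seconds))
```

On a backtracking engine the time should roughly double with each
extra character, so a few dozen characters are already enough to tie
up a server thread.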

On a side note (warning: diversion), you might also look at
Perl's Apocalypse 5:
http://www.perl.com/lpt/a/2002/06/04/apo5.html


> >
> >Worse, the processing time and space is extremely dependent on input,
> >so that apparently well-tested code can suddenly become a liability when
> >exposed to a less than friendly audience.  (Think about a line-oriented 
> >regex that is furnished a multi-megabyte line.)
> 
> if inputvar=='killmyserver':
>   my_bigassarray=[]
>   i=0
>   while(1):
>     i=i+1
>     my_bigassarray.append('bla'*i)
> else:
>   return 'whoa, I was lucky'
> >
> >To say it another way, using regex does not make it more likely that you
> >will be cracked.  It does make it more likely that your system will
> >appear to be unresponsive, and, if memory exhaustion occurs, dead.
> 

This is not exactly what I had in mind when I said "apparently
well-tested code".  By that phrase I meant code that, through a
combination of inspection and testing, was reasonably expected not to
blow up or take excessive amounts of time.

Because regexes have worst-case exponential behavior (as I recall, in
both space and time), and because it is reasonably easy to introduce that
kind of behavior accidentally and without malice, it seems to me to be a
reasonable engineering tradeoff to prohibit regexes in TTW
(through-the-web) programming.

This does not say that we can prohibit every kind of abuse, nor does it
say that regexes are not valuable tools.  It does say that they are
somewhat dangerous tools whose impact on performance can be difficult
to predict.

Jim Penny
> 
> cheers,
> oliver