[Zope] regex question

Sam Gendler sgendler@teknolojix.com
Mon, 29 Nov 1999 23:32:34 -0800


Sam Gendler wrote:

> I have never been much of a regex master, and I am having difficulty
> constructing one that should be fairly simple.  I want to find all the
> text that is between <body> and </body> in a variable that may (and
> probably will) contain newlines.  I have left case-insensitive matching
> out of the following examples to keep the regexes simpler.
>
> To find the opening <body> tag, I have '<[\t ]*body[\t ]*.*>',
> which is almost correct, but not quite.  This expression also matches
> <bodystuff>, so I really need something like
> '<[\t ]*body([\t ]+.*>|>)', but I can't find a construct that works.
> Basically, it needs at least one whitespace character followed by stuff
> followed by '>', or else no whitespace at all followed directly by '>'.
>
> I can use similar code to find the </body> tag.
>
> However, putting those two together around a \(\(.*\n*\)*\), in order to
> match all the text between the <body> and </body> tags, sends Python into
> an infinite loop.  It chokes when I try to match an unlimited number of
> lines before the </body> tag.
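A minimal sketch of a single-regex version, assuming Python's later `re` module (the `\(...\)` grouping above is the syntax of the older `regex` module, which `re` eventually replaced). The nested `\(\(.*\n*\)*\)` can match the same run of text in exponentially many ways, so a search that has to backtrack can appear to hang; with `re.DOTALL`, `.` also matches newlines, and a non-greedy `.*?` stops at the first closing tag, so no nested repetition is needed. The names `BODY_RE` and `body_text` are hypothetical.

```python
import re

# Opening tag: "<body" followed either directly by ">" or by at least one
# whitespace character plus attributes, so "<bodystuff>" is rejected.
BODY_OPEN = r'<\s*body(?:\s[^>]*)?>'
BODY_CLOSE = r'<\s*/\s*body\s*>'

# re.DOTALL lets "." match newlines; the non-greedy "(.*?)" captures up to
# the first closing tag. re.IGNORECASE restores case-insensitive matching.
BODY_RE = re.compile(BODY_OPEN + r'(.*?)' + BODY_CLOSE,
                     re.IGNORECASE | re.DOTALL)

def body_text(html):
    """Return the text between <body ...> and </body>, or None if absent."""
    m = BODY_RE.search(html)
    return m.group(1) if m else None

print(repr(body_text("<HTML><Body bgcolor=white>\nline one\n</BODY></HTML>")))
```

Because the optional attribute group must start with whitespace, `<body>` and `<body bgcolor=white>` both match while `<bodykjhsd>` does not.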

OK, I solved this one.  I can now tell the difference between
<bodykjhsd> and <body kjhsd>.  For the text between the tags, I gave up
on doing it the clean way, in a single regex.  I now compile two separate
regexes, one that finds the <body> tag and one that finds the </body> tag,
and use object.regs[index] to slice the string down to the correct
substring.  UGLY.
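For comparison, here is roughly how the two-regex approach might look with the `re` module. This is a sketch, not the original code: `m.end()` and `m.start()` stand in for the offsets read from `object.regs[index]` in the old `regex` module, and `OPEN_TAG`, `CLOSE_TAG`, and `body_text` are hypothetical names.

```python
import re

# One pattern per tag; the opening pattern rejects "<bodystuff>" because any
# attributes must be preceded by whitespace.
OPEN_TAG = re.compile(r'<\s*body(?:\s[^>]*)?>', re.IGNORECASE)
CLOSE_TAG = re.compile(r'<\s*/\s*body\s*>', re.IGNORECASE)

def body_text(html):
    """Return the text between the body tags, or None if either is missing."""
    start = OPEN_TAG.search(html)
    if start is None:
        return None
    # Look for the closing tag only after the opening tag ends.
    end = CLOSE_TAG.search(html, start.end())
    if end is None:
        return None
    # Slice from the end of <body ...> to the start of </body>.
    return html[start.end():end.start()]
```

Searching for the closing tag from `start.end()` keeps a stray `</body>` before the opening tag from producing an empty or reversed slice.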

--sam