[Zope] Strip all HTML

Paul Winkler pw_lists@slinkp.com
Tue Aug 5 15:49:39 EDT 2003


On Tue, Aug 05, 2003 at 06:39:11AM -0700, Dylan Reinhardt wrote:
> So you can try something like:
> 
> -----
> 
> import re
> 
> style = re.compile('<style.*?>.*?</style>', re.I | re.S)
> script = re.compile('<script.*?>.*?</script>', re.I | re.S)
> tags = re.compile('<.*?>', re.S)
> 
> return tags.sub('', script.sub('', style.sub('', text)))

hmm... doesn't the tags pattern make the other two redundant?
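
A quick check suggests not quite: the tags pattern removes the
<script> and </script> tags themselves but leaves the text between
them. A small sketch (the sample string is made up for illustration):

```python
import re

# Same patterns as in the quoted message.
script = re.compile('<script.*?>.*?</script>', re.I | re.S)
tags = re.compile('<.*?>', re.S)

# Hypothetical input, just to compare the two approaches.
sample = '<p>Hi</p><script>alert("x")</script>'

# Tags pattern alone: the script body survives as plain text.
tags_only = tags.sub('', sample)

# Script pattern first, then tags: the script body is gone too.
both = tags.sub('', script.sub('', sample))

print(repr(tags_only))
print(repr(both))
```

So the style and script patterns still matter if you want the
contents of those elements dropped, not just the tags.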

One problem with this approach is that it also strips any XML or SGML
markup inside a <pre> block, which may not be what you want.
Processing HTML with regular expressions is notoriously frustrating.
I'd look into using htmllib from the standard library, or
fixing Strip-O-Gram to do what you want.
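
A rough sketch of the parser-based route, for illustration only.
htmllib was removed in Python 3, so this assumes its successor,
html.parser.HTMLParser. The names TextExtractor and strip_html are
made up here, and keeping markup inside <pre> verbatim is just one
possible policy, not something the original code did.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text; drop tags and the contents of <script>/<style>.

    Assumption for this sketch: markup nested inside <pre> is kept as-is.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.pre_depth = 0   # > 0 while inside <pre>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "pre":
            self.pre_depth += 1
        elif self.pre_depth and not self.skip_depth:
            # Inside <pre>: re-emit the start tag exactly as written.
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "pre":
            self.pre_depth = max(0, self.pre_depth - 1)
        elif self.pre_depth and not self.skip_depth:
            self.parts.append("</%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep them only inside <pre>.
        if self.pre_depth and tag not in self.SKIP and tag != "pre":
            self.parts.append(self.get_starttag_text())

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(text):
    """Return text with HTML removed, per the policy described above."""
    parser = TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)
```

Calling strip_html(text) in place of the chained sub() calls would be
the drop-in usage. Unlike the regex version, a "<" inside a script
body or an attribute value does not confuse it.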

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's THE UNWORTHY SEEKER!
(random hero from isometric.spaceninja.com)



