[ZPT] OT (and probably a bit long ;-) HTML Filtering

Guido van Rossum guido@digicool.com
Wed, 16 May 2001 10:32:47 -0500


> > When parsing the following HTML: 
> > 
> > 'Roses <b>are</B> red,<br/>violets <i>are</i> blue' 
> > 
> > ...with the following class: 
> > 
> > class HTML2SafeHTML(sgmllib.SGMLParser): 
[proof of broken parser skipped]
> 
> Anyway, Ethan pointed out that you guys have probably got quite good at this
> sort of thing while developing ZPT...
> 
> So, how should I be approaching this problem?

What *we* did was to rewrite the HTML parser from the ground up.  You
can download TAL from

  http://www.zope.org/Members/4am/ZPT/TAL-1.2.1.tar.gz/view

and look at HTMLParser.py.
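
For the filtering problem itself, here is a rough sketch of a whitelist
filter built on an event-driven parser.  It is not the code from the TAL
tarball: it assumes the html.parser.HTMLParser class from the standard
library of current Python 3 releases, and the names ALLOWED_TAGS,
VOID_TAGS, SafeHTMLFilter and filter_html are made up for illustration.
It drops every tag not on the whitelist, drops all attributes, and
re-escapes the text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; adjust to your own policy.
ALLOWED_TAGS = {"b", "i", "br"}
# Tags that never take a closing tag.
VOID_TAGS = {"br"}


class SafeHTMLFilter(HTMLParser):
    """Collect only whitelisted tags (without attributes) and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names; attributes are discarded.
        if tag in ALLOWED_TAGS:
            self.out.append("<%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        # Called for XML-style empty tags such as <br/>.
        if tag in ALLOWED_TAGS:
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references arrive decoded, so escape them again.
        self.out.append(escape(data, quote=False))


def filter_html(text):
    parser = SafeHTMLFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(filter_html("Roses <b>are</B> red,<br/>violets <i>are</i> blue"))
# prints: Roses <b>are</b> red,<br>violets <i>are</i> blue
```

Note that this keeps the text inside tags it removes, and it makes no
attempt to balance unclosed tags; treat it as a starting point rather
than a complete sanitizer.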

You could also submit a bug report to Python's bug tracker so we can
fix sgmllib in the next release:

  http://sourceforge.net/bugs/?group_id=5470

--Guido van Rossum (home page: http://www.python.org/~guido/)