[ZPT] OT (and probably a bit long ;-) HTML Filtering

Chris Withers chrisw@nipltd.com
Thu, 17 May 2001 22:48:59 +0100


> What *we* did was to rewrite the html parser from the ground up.  You
> can download TAL from
>
>   http://www.zope.org/Members/4am/ZPT/TAL-1.2.1.tar.gz/view
>
> and look at HTMLParser.py.

Yay! :-) This parser did the job absolutely spot on. I had to change the
name of two methods and the base class I was inheriting from and it worked
out of the box :-))
(OK, I also had to wrap some methods to catch HTMLParseExceptions, but that
was pretty trivial; check out the next Strip-O-Gram release if you're
interested :-)
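
For anyone curious what that kind of subclass looks like, here is a rough
sketch. It is not the Strip-O-Gram code, and it assumes the standard
library's html.parser.HTMLParser rather than the HTMLParser.py from the TAL
tarball, whose exact API isn't shown in this message. TagFilter, filter_html
and the default tag list are made-up names for illustration. The broad
"except" stands in for the HTMLParseException wrapping mentioned above.

```python
from html import escape
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Keep a small whitelist of tags; drop other tags but keep their text."""

    def __init__(self, allowed=("b", "i", "p")):
        # convert_charrefs=False so entity references reach the handlers as-is
        super().__init__(convert_charrefs=False)
        self.allowed = set(allowed)  # hypothetical whitelist
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            # attributes are deliberately dropped in this sketch
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def filter_html(text, allowed=("b", "i", "p")):
    """Return text with only the allowed tags left in place."""
    parser = TagFilter(allowed)
    try:
        parser.feed(text)
        parser.close()
    except Exception:
        # assumed fallback policy: if parsing blows up, escape everything
        return escape(text)
    return "".join(parser.out)


print(filter_html('<p>Hi <font color="red">there</font> <b>you</b></p>'))
```

Dropping the tag but keeping its text is only one possible policy; a stricter
filter could discard the contents of unwanted elements as well.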

Any chance of this parser making it into the standard Python distro?
To be honest, it looks much more useful than sgmllib.py :-S

That's not so important, though. What I would like to do is bundle
HTMLParser.py with both the Squishdot and Strip-O-Gram distributions. I
noticed the file doesn't have any copyright notice in the header. Should I
add the ZPL to the top or just leave it as it is?

> You could also submit a bug report to Python's bug tracker so we can
> fix sgmllib in the next release:
>
>   http://sourceforge.net/bugs/?group_id=5470

Done already ;-)

Anyway, thanks to everyone involved in writing that parser; it's made my
life a _lot_ easier :-)))

cheers,

Chris