[Zope] HTML parsers and Wget like function

Paul Winkler pw_lists at slinkp.com
Thu Jul 1 11:11:31 EDT 2004


There's also KebasData, although I don't think it does much in the
way of rewriting the retrieved content.

Warning though: with any of these solutions, you will want to
test what happens when the remote resource is unavailable,
e.g. very slow to respond, or blocked by a firewall.

For example:
I had an external method using urllib2 to retrieve data from
another server and embed it in a Zope page. This worked fine
until something went wonky on the network and requests to the
remote page never got any response. The result was that
requests to my Zope page would hang forever. And apparently
urllib2 blocks while waiting for a response, so once there were
a few requests to this page, all my worker threads were blocked
there and Zope was effectively dead.
I used the "Debug spinning zope" recipe to confirm
that all the threads were waiting in urllib2.

I changed this to instead use LocalFS pointing at copies
of the data on the hard drive, which are updated periodically
via cron & wget. A quick hack but it fixed the symptom.
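
For anyone who would rather do the periodic update in Python than
with wget, here is a minimal sketch of the same idea. It is not the
setup I actually ran: refresh_copy, its arguments, and the 30 second
timeout are made-up names and values for illustration, and
urllib.request is the Python 3 successor to urllib2. The point is
that the fetch happens outside Zope, and the old copy is only
replaced once the new one has been completely written.

```python
import os
import tempfile
import urllib.request


def refresh_copy(url, dest, timeout=30.0):
    """Download url to dest. Keep the old copy if anything fails.

    Returns True if dest was updated, False otherwise.
    (Hypothetical helper, meant to be run from cron.)
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return False

    # Write to a temp file in the same directory, then rename over the
    # old copy, so readers never see a half-written file.
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, dest)
    except OSError:
        os.unlink(tmp_path)
        return False
    return True
```

Note that mkstemp creates the file readable only by its owner, so
if cron and Zope run as different users you would need to adjust
the permissions before the rename.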

This was all Zope 2.6.2 / Python 2.1.3.
As of Python 2.3 you can set a default timeout via
socket.setdefaulttimeout(), and this should (I hope) affect
urllib2, but I have not tested it.
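
A rough sketch of what the timeout approach might look like, written
against current Python, where urllib2 has become urllib.request.
The function name fetch_or_fallback, the fallback argument, and the
10 second values are assumptions for illustration only. The per-call
timeout argument to urlopen did not exist in Python 2.3; there, the
global socket default would be the only knob.

```python
import socket
import urllib.request  # Python 3 successor to urllib2

# Process-wide default for sockets created after this call.
# Sockets opened by urlopen without an explicit timeout pick it up.
socket.setdefaulttimeout(10.0)


def fetch_or_fallback(url, fallback=b"", timeout=10.0):
    """Return the body at url, or fallback if the request fails.

    Hypothetical helper: a hung or unreachable server costs at most
    roughly `timeout` seconds per blocking socket operation instead
    of tying up the calling thread forever.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError:
        # Covers URLError (refused, unreachable, HTTP errors) and
        # socket timeouts raised while waiting for the response.
        return fallback
```

Keep in mind that the timeout applies to each blocking socket
operation, not to the request as a whole, so a server that trickles
data slowly can still hold a thread for longer than the timeout.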


On Thu, Jul 01, 2004 at 11:21:07PM +1000, Anthony Baxter wrote:
> On Thu, 01 Jul 2004 20:02:02 +0900, Grant Morganryuuguu
> <grant at ryuuguu.com> wrote:
> > 
> > 
> > I am considering Zope/python for a project and would like to get some pointers to see if this is a reasonable fit.
> > I need to get a URL from the web, parse the HTML, extract some data from the page, rewrite the <a href> tags and display it on the website.
> > I found the HTML parser in library http://www.python.org/doc/current/lib/markup.html and
> > http://www.crummy.com/software/BeautifulSoup/ (which is down now but was up a couple of days ago)
> 
> 
> BeautifulSoup, ClientCookie and ClientForm together make a very very nice
> webscraping package.

-- 

Paul Winkler
http://www.slinkp.com

