[Zope] searching and serving large textfiles ~120 Mb

Paul Winkler pw_lists at slinkp.com
Fri Dec 5 10:48:25 EST 2003


On Fri, Dec 05, 2003 at 10:31:08AM +0100, Sebastian Krollmann wrote:
> Hi zopistas,
> 
> I need to access large text files (~120 MB) from Zope. I know about Python's
> large file support and that it is better to keep large files out of the ZODB.
> I have to do a full-text search on these files, which reside in a folder
> hierarchy on the server, show their content around the location of the found
> string, and allow scrolling through the file's source from Zope.
> 
> Has anybody done something similar with files this large and would
> share their experiences?
> Are there any do's and don'ts or best ways to do it?

I think you will find that serving a 120 MB object through Zope
will cripple your performance. Zope is reeeeallly slow with large
chunks of data. A couple of concurrent downloads of 100 MB files
can cause your site to crawl for all users.
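
For the search-and-show-context part of the question, the response
can stay small if the search runs against the file on disk and only
a window around each hit is sent back. Here is a minimal sketch in
modern Python 3, assuming the files are plain bytes on the local
filesystem. The function name, the context size and the hit limit
are all made up for illustration; nothing here is Zope API.

```python
import mmap


def find_with_context(path, needle, context=200, max_hits=10):
    """Return a list of (offset, snippet) pairs for needle in the file.

    needle is a bytes object; snippet is the bytes from `context`
    bytes before the hit to `context` bytes after it.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    hits = []
    with open(path, "rb") as f:
        # mmap lets the OS page the file in on demand instead of
        # reading the whole thing into memory at once.
        # Note: mmap raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1 and len(hits) < max_hits:
                start = max(0, pos - context)
                end = min(len(mm), pos + len(needle) + context)
                hits.append((pos, mm[start:end]))
                pos = mm.find(needle, pos + 1)
    return hits
```

The returned offsets could also drive the "scrolling" view: read a
fixed-size window starting at an offset and send only that window.
This is a linear scan per query, so it is a starting point rather
than a replacement for a real index.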

However, there are a couple of ways you could store and index the
text files in Zope but avoid having the users hit Zope to
download them.


I'm experimenting with FSCacheManager (downloadable
from CVS on collective.sf.net) which does
"funky caching" in conjunction with an Apache rule.
Apache tries to serve the file directly
from the filesystem. If it doesn't exist, Apache then forwards
the request to Zope.  The FSCacheManager causes the file to
be stored to the filesystem each time it's hit in Zope.
Once a file is on the filesystem, Zope won't see further requests
for it.  This works fine and it's very easy to set up.
The big limitation is that, once the file is on the filesystem,
it's available to all ... Zope authorization is never checked
again. Also you can't really control the lifetime of the cache,
but that may not be an issue.
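
To make the request flow concrete, here is a toy model of it in
Python 3. This is NOT FSCacheManager's code and not an Apache
config; serve(), cache_root and fetch_from_backend are hypothetical
names, with fetch_from_backend standing in for "forward the request
to Zope".

```python
import os


def serve(path, cache_root, fetch_from_backend):
    """Return (body, source) where source is 'filesystem' or 'backend'."""
    root = os.path.realpath(cache_root)
    target = os.path.realpath(os.path.join(root, path.lstrip("/")))
    # Refuse paths that would land outside (or exactly on) the cache root.
    if target == root or os.path.commonpath([root, target]) != root:
        raise ValueError("bad path: %r" % path)

    # Fast path: the file is already on disk, so the backend is never
    # consulted. No authorization check happens here either.
    if os.path.isfile(target):
        with open(target, "rb") as f:
            return f.read(), "filesystem"

    # Slow path: ask the backend, then store the result so the next
    # request for the same path takes the fast path.
    body = fetch_from_backend(path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    tmp = target + ".tmp"
    with open(tmp, "wb") as f:
        f.write(body)
    os.replace(tmp, target)  # rename so readers never see a partial file
    return body, "backend"
```

Nothing in this sketch ever deletes a stored file, and the fast path
has no access check, which are exactly the two limitations above.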

You could do something similar with Squid filesystem caching,
which IIRC can be configured to request authorization from Zope
each time someone downloads the file, and clean out the
cache according to some policy.

Of course, you'll need a lot of disk space either way, 
but who cares?

In either case, the first download will still be slow
but you can prevent that by using wget or similar to
"prime" the cache during off-hours.
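
If you'd rather script the priming than use wget, something like
the following Python 3 sketch would do. It assumes you already have
a list of the URLs to warm up; prime_cache() and its parameters are
hypothetical names, and only the standard library is used.

```python
import urllib.request


def prime_cache(urls, chunk_size=64 * 1024, timeout=60):
    """Fetch each URL once and throw the body away.

    Returns a dict mapping each URL to the number of bytes read,
    or to the exception if the fetch failed.
    """
    results = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                total = 0
                # Read in chunks so a 120 MB body is never held in
                # memory all at once.
                while True:
                    chunk = resp.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
            results[url] = total
        except OSError as exc:  # urllib.error.URLError is an OSError
            results[url] = exc
    return results
```

Run it from cron during off-hours and check the returned dict for
any entries that are exceptions rather than byte counts.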

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's FLYING ACTION HERO!
(random hero from isometric.spaceninja.com)


