[Zope] Making lots of external data searchable?

Tres Seaver tseaver@digicool.com
Sat, 02 Dec 2000 11:43:03 -0500


Anselm Lingnau <lingnau@tm.informatik.uni-frankfurt.de> wrote:

> I'm using Zope to re-vamp a web site, one of whose most
> important features is an archive of a reasonably busy mailing
> list, which is accessed using home-grown Perl CGI code. I've
> written Python code to let users browse the archive sorted by
> users, subject etc., but now I'm looking at allowing text
> searches. The »old« instance of the web site used Glimpse and a
> simple CGI script (in Perl) to do this across the whole site
> (including the mail archive) and ideally this would be what I'm
> after for the new version as well.
> 
> However, the mail archive now weighs in at about 45 MB in
> individual text files (one per message), and I don't really see
> myself putting this into the ZODB so I can use ZCatalog.
> ZCatalog, however, looks good for indexing the rest of the site
> (I haven't done this yet). Is there a reasonable way of
> interfacing Glimpse with the Zope searching machinery so I
> could again have one-stop searching of the whole site?  (It
> would probably be straightforward to search just the mail
> archive by calling out to Glimpse and massaging the results.)

You could probably use ZCatalog in conjunction with LocalFS to
accomplish this;  I think LocalFS was recently revved to allow
cataloguing.

Note that the actual mass-indexing process is going to be *painful*,
as ZCatalog is designed for incremental indexing, not bulk loads.  I think I
would write a script which walked the hierarchy, calling a method
to index one (or a few) messages at a time.  This script might
also need to pack the database at intervals;  the catalog is a
bit space-inefficient across multiple index/reindex operations.
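
A rough sketch of such a script follows.  It is only an outline
under stated assumptions, not tested against a real catalog:  the
names index_in_batches, index_one, commit and pack are hypothetical
callables you would supply yourself (in Zope, index_one would
presumably wrap a call that catalogs one message, commit would
commit the transaction, and pack would pack the database).  The
batch sizes are arbitrary placeholders.

```python
import os


def index_in_batches(root, index_one, commit, pack,
                     batch_size=100, pack_every=10):
    """Walk `root`, index one file at a time, commit after each batch,
    and pack after every `pack_every` committed batches.

    `index_one`, `commit` and `pack` are caller-supplied callables
    (hypothetical hooks; wire them to your catalog and database).
    Returns the number of files handed to `index_one`.
    """
    pending = 0   # files indexed since the last commit
    batches = 0   # full batches committed so far
    total = 0     # files indexed overall

    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            index_one(os.path.join(dirpath, name))
            pending += 1
            total += 1
            if pending >= batch_size:
                commit()
                pending = 0
                batches += 1
                if batches % pack_every == 0:
                    pack()  # reclaim space left by earlier index operations

    if pending:
        commit()  # flush the final partial batch
    return total
```

Keeping the batches small bounds how much work is lost if a run
dies partway through, and packing only every few batches avoids
paying the cost of a pack after each commit.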

Tres.
-- 
===============================================================
Tres Seaver                                tseaver@digicool.com
Digital Creations     "Zope Dealers"       http://www.zope.org