[Zope-dev] Re: Zope Mailing Lists and ZCatalog

Kapil Thangavelu kvthan@wm.edu
Mon, 07 Aug 2000 09:18:13 -0700


I've been working on a Mailman archive/search interface in Zope. I
chose not to do the search mechanisms in Zope because I was under the
impression that ZCatalog is great for object indexing but that it would
not be ideal for mass text indexing with 100K+ objects and 100+ MB of
text.

The comments below seem to indicate that the only problems are with
mass indexing and transaction storage, both of which would be mitigated
by moving to an incremental indexing scheme.
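
(Just so it's clear what I mean by an incremental scheme, the sketch
below is roughly what I picture: catalog a batch at a time and commit a
subtransaction after each batch, so the pending transaction never has
to hold everything at once. The function and batch size are made up;
catalog_object(), getPhysicalPath() and get_transaction().commit(1) are
the Zope 2 calls as I understand them, so double check before leaning
on this.)

    BATCH_SIZE = 500

    def catalog_incrementally(catalog, messages):
        # Catalog a batch at a time so the open transaction stays small.
        done = 0
        for msg in messages:
            # catalog_object(obj, uid) is ZCatalog's basic indexing call;
            # the uid is conventionally the object's physical path.
            catalog.catalog_object(msg, '/'.join(msg.getPhysicalPath()))
            done = done + 1
            if done % BATCH_SIZE == 0:
                # subtransaction commit: pending index state is swapped
                # out to temporary storage instead of piling up in RAM
                get_transaction().commit(1)
        get_transaction().commit()   # final, real commit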

But wouldn't you run into performance problems on searches, and with
getting enough memory available to power the Catalog search?

I guess what I'm looking for is a rule of thumb on Catalog usage in
terms of number of objects/indexes and a machine's specs.

Curious

Kapil


BTW, a demo of my Mailman search interface is at
http://sindev.dyndns.org/TGrounds/archive_search


Michel Pelletier wrote:
> 
> Andy Dawkins wrote:
> >
> > Michel
> >
> > In case you are not aware, we at NIP currently host a complete,
> > publicly available archive of the Zope mailing lists.
> 
> Yep.
> 
> > We are using ZCatalog to index all the messages from the mailing list
> > archives.  To give you an idea of the numbers, the Zope mailing list
> > alone has over 30,000 messages.
> 
> > The problem we have is getting that many objects into the Catalog.  If
> > we load the objects into the ZODB and then catalog them, the machine
> > either runs out of memory or, if we lower the subtransaction threshold,
> > it runs out of hard drive space.
> 
> This is because you are indexing more content than you have virtual+tmp
> memory to store the transaction in.  Zope is transactional, as I'm sure
> you know, so it has to store the transaction somewhere so it can roll it
> back if necessary, and memory+tmp storage is where that goes
> (subtransactions are swapped out to tmp).
> 
> > If we use CatalogAware to catalog the objects as they are imported, the
> > Catalog explodes to stupid sizes because CatalogAware doesn't support
> > subtransactions.
> 
> Subtransactions are a storage thing and really don't have anything to
> do with CatalogAware; if you have a subtransaction threshold set, then
> subtransactions will be used for any cataloging operation, CatalogAware
> or not.
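> 
> A minimal example of what that looks like (the attribute name is from
> memory, so check it against your ZCatalog; the paths and meta_type are
> made up, and this assumes a debug prompt where 'app' is the root):
> 
>     catalog = app.MailArchive.Catalog    # hypothetical location
>     catalog.threshold = 1000             # subtransaction every ~1000 ops
>     for msg in app.MailArchive.objectValues('Mail Message'):
>         # plain catalog_object() calls -- CatalogAware or not -- will
>         # now commit a subtransaction whenever the threshold is exceeded
>         catalog.catalog_object(msg, '/'.join(msg.getPhysicalPath()))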
> 
> > We could solve these issues by regularly packing the database during the
> > import, but it isn't a perfect solution.
> 
> I'm not sure what you mean in these last two paragraphs; it seems like
> you have two problems:
> 
> 1) you are mass indexing and running out of memory
> 
> 2) you are indexing lots of content quickly and your database is growing
> 
> The answer to 1 is to not mass index, but to index incrementally over
> time.  The answer to 2 is to use a storage that does not keep old
> revisions, like Berkeley storage.
> 
> > Also, as messages arrived over time, the Catalog would once again
> > explode dramatically.
> 
> > Basically, we (NIP) would like to know if you (Michel/DC) are planning
> > to improve ZCatalog/CatalogAware, if you are planning a successor to
> > ZCatalog, or basically any information that could be useful to us
> > regarding the current development and priority of ZCatalog/CatalogAware.
> 
> There isn't anything wrong with the Catalog (for this particular
> problem), or at least, there isn't anything in the Catalog to fix that
> would solve your problem.  We've had customers index well over 50,000
> objects; you just have to understand the resource constraints and work
> with them: for example, don't mass index, use storages that scale to
> high-write environments, etc.
> 
> > Thanks in advance for your assistance.
> 
> NP.
> 
> -Michel
> 
> _______________________________________________
> Zope-Dev maillist  -  Zope-Dev@zope.org
> http://lists.zope.org/mailman/listinfo/zope-dev
> **  No cross posts or HTML encoding!  **
> (Related lists -
>  http://lists.zope.org/mailman/listinfo/zope-announce
>  http://lists.zope.org/mailman/listinfo/zope )