[Zope-dev] Re: Zope Mailing Lists and ZCatalog

Michel Pelletier michel@digicool.com
Fri, 04 Aug 2000 10:48:39 -0700


Andy Dawkins wrote:
> 
> Michel
> 
> In case you are not aware, we at NIP currently host a complete archive of
> the Zope mailing lists that are publicly available.

Yep.
 
> We are using ZCatalog to index all the messages from the Mailing list
> archives.  To give you an idea of numbers, the Zope mailing list alone is
> over 30,000 messages.

> The problem we have is getting that many objects in to the Catalog.  If we
> load the objects in to the ZODB, then catalog them, the machine either runs
> out of memory or, if we lower the sub transactions, It runs out of hard
> drive space.

This is because you are indexing more content than you have virtual+tmp
memory to store the transaction in.  Zope is transactional, as I'm sure
you know, so it has to store the transaction somewhere so it can roll it
back if necessary, and memory+tmp storage is where that goes
(subtransactions are swapped out to tmp).
 
> If we use CatalogAware to catalog the objects as they are imported the
> Catalog explodes to stupid sizes because CatalogAware doesn't support Sub
> transactions.

Subtransactions are a storage thing and really don't have anything to
do with CatalogAware.  If you have a subtransaction threshold set, then
subtransactions will be used for any cataloging operation, CatalogAware
or not.
 
> We could solve these issues by regularly packing the database during the
> import, but it isn't a perfect solution.

I'm not sure what you mean by these last two paragraphs.  It seems like
you have two problems:

1) you are mass indexing and running out of memory

2) you are indexing lots of content quickly and your database is growing

The answer to 1 is to not mass index, and instead index incrementally
over time.  The answer to 2 is to use a storage that does not store old
revisions, like Berkeley storage.
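
For the answer to 1, here is a rough sketch of what indexing in small
batches could look like.  It is only an illustration, not Zope API: the
function name, the catalog_one and commit callables, and the batch size
are all assumptions that you would wire up to your own catalog call and
transaction commit.

```python
from itertools import islice


def index_in_batches(messages, catalog_one, commit, batch_size=500):
    """Catalog messages a batch at a time, committing after each batch.

    catalog_one and commit are hypothetical callables supplied by the
    caller: catalog_one(msg) indexes a single message, and commit()
    commits whatever transaction is currently open.
    """
    it = iter(messages)
    total = 0
    while True:
        # Pull at most batch_size messages so that a single transaction
        # never has to hold the whole archive.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        for msg in batch:
            catalog_one(msg)
        # Committing here bounds the amount of uncommitted work.
        commit()
        total += len(batch)
    return total
```

Each batch is its own transaction, so the memory and tmp space needed at
any one time depends on the batch size rather than on the size of the
whole archive.  On a storage that keeps old revisions, the database will
still grow with every commit, which is what the answer to 2 is about.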
 
> Also as messages arrived over time the Catalog would once again explode
> dramatically,

> Basically we(NIP) would like to know if you(Michel/DC) are planning to
> improve ZCatalog/CatalogAware, if you are planning a successor to ZCatalog
> or basically any information that could be useful to us regarding the
> current development and urgency of ZCatalog/CatalogAware.

There isn't anything wrong with the Catalog (for this particular
problem), or at least, there isn't anything in the catalog to fix that
would solve your problem.  We've had customers index well over 50,000
objects; you just have to understand the resource constraints and work
with them.  For example, don't mass index, use storages that scale to
high-write environments, etc.

> Thanks in advance for your assistance.

NP.

-Michel