[ZODB-Dev] working with large databases

Wed, 27 Feb 2002 17:40:51 -0800

I've got a good-sized collection of objects that I'd like to work with 
using ZODB, but I'm having trouble doing so within a reasonable amount of 
memory.  I'm looking for patterns I can use and am happy to take whatever 
pointers I can get.

The data set is a collection of 250K objects that together need on the order 
of 1 GB to hold in memory.  I'm storing them using a FileStorage.  I've 
currently got the objects in a BTree because building the database using a 
flat structure (i.e. root[id] = obj) took prohibitively long (commit time 
seems to increase very rapidly with DB size).
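
A minimal sketch of the loading pattern described above, assuming the objects arrive as (id, obj) pairs and are committed in batches rather than all at once.  This is not the actual loader: `load_in_batches`, `commit`, and `batch_size` are hypothetical names, and a plain dict stands in for the BTree so the snippet runs without ZODB.  In a real program `target` would be the BTree and `commit` would be the transaction commit call.

```python
def load_in_batches(target, items, commit, batch_size=1000):
    """Store (key, obj) pairs in `target`, committing every `batch_size` items.

    `target` is any mapping (a BTree in the real setup, a dict here).
    `commit` is a hypothetical zero-argument callable standing in for
    the transaction commit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = 0
    for key, obj in items:
        target[key] = obj
        pending += 1
        if pending == batch_size:
            commit()
            pending = 0
    if pending:
        commit()  # flush the final partial batch


# Stand-in usage: a dict instead of a BTree, a list append instead of a commit.
commits = []
store = {}
load_in_batches(store, ((i, "obj%d" % i) for i in range(2500)),
                commit=lambda: commits.append(1), batch_size=1000)
```

The batch size here is arbitrary; the point is only that each commit covers a bounded number of new objects.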

Now I would like to query those objects.  The query consists of passing 
each object through a matching function and saving the object if it 
matches.  Conceptually, this is something like:

   # tree is the BTree holding the objects
   res = []
   for entry in tree.values():
      if fn(entry): res.append(entry)

The problem I'm encountering is that all of the spellings I can think of 
result in the entire BTree ending up in memory.  This is not at all what 
I'd like to have happen.

Is there a way to loop over the contents of a BTree without having the 
entire tree end up in memory?
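
To make the question concrete, here is a rough sketch of the kind of loop in question, written so that the result holds only keys rather than the matched objects.  The names `matching_keys`, `release`, and `every` are hypothetical, and a plain dict stands in for the BTree so the snippet runs without ZODB.  The `release` hook is an assumption: with ZODB it might be something like the connection's `cacheMinimize()`, but whether that is enough to keep memory bounded is exactly what is being asked.

```python
def matching_keys(tree, fn, release=None, every=1000):
    """Return the keys of entries in `tree` for which fn(entry) is true.

    Only keys are collected, so matched objects need not stay referenced.
    `release` is a hypothetical zero-argument hook called after every
    `every` entries, intended as the place to trim an object cache.
    """
    keys = []
    for count, (key, entry) in enumerate(tree.items(), 1):
        if fn(entry):
            keys.append(key)
        if release is not None and count % every == 0:
            release()
    return keys


# Stand-in usage: a plain dict plays the role of the BTree.
tree = {i: {"id": i, "weight": i * 1.5} for i in range(10)}
heavy = matching_keys(tree, lambda e: e["weight"] > 9.0)
```

Matching objects can then be fetched again by key, one at a time, when they are actually needed.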

I'm currently using the ZODB version that comes with Zope 2.4.0 under 
Win2K, but I'm willing to switch to other versions (though I'd rather not 
go to Python 2.2 just yet).

Thanks,
-greg
----
greg Landrum (greglandrum@earthlink.net)
Software Carpenter/Computational Chemist