[ZODB-Dev] working with large databases
Greg Landrum
greglandrum@earthlink.net
Wed, 27 Feb 2002 17:40:51 -0800
I've got a good-sized collection of objects that I'd like to work with
using ZODB, and I'm running into problems doing so within a reasonable
amount of memory. I'm looking for patterns that I can use and am happy
to take whatever pointers I can get.
The data set is a collection of 250K objects which require on the order of
a GB to store in memory. I'm storing them using a FileStorage. I've
currently got the objects in a BTree because building the database using a
flat structure (i.e. root[id] = obj) took prohibitively long (commit time
seems to increase very rapidly with DB size).
Now I would like to query those objects. The query consists of passing
each object through a matching function and saving the object if it
matches. Conceptually, this is something like:
    for entry in BTree:
        if fn(entry): res.append(entry)
The problem I'm encountering is that all of the spellings I can think of
result in the entire BTree ending up in memory. This is not at all what
I'd like to have happen.
Is there a way to loop over the contents of a BTree without having the
entire tree end up in memory?
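To make the question concrete, here is the shape of loop I have in mind, sketched in plain Python so it runs without ZODB. It walks the entries and calls a hook after every batch, the idea being that the hook gives the object cache a chance to drop what has already been examined. The names `filter_in_batches` and `release` are made up for this sketch, and whether a hook such as the connection's cacheMinimize() would actually keep the tree from accumulating in memory is an assumption I have not verified.

```python
def filter_in_batches(items, fn, batch_size=1000, release=None):
    """Return the entries of `items` for which fn(entry) is true.

    `release` is a hypothetical hook called after every `batch_size`
    entries. With ZODB it might wrap something like the connection's
    cacheMinimize() (an assumption, not verified here).
    """
    res = []
    count = 0
    for entry in items:
        if fn(entry):
            res.append(entry)
        count += 1
        # Give the caller a chance to free memory between batches.
        if release is not None and count % batch_size == 0:
            release()
    return res


# Stand-in data: a plain range plays the role of the BTree's values,
# and the hook just records how often it was called.
calls = []
matches = filter_in_batches(range(10), lambda n: n % 3 == 0,
                            batch_size=4, release=lambda: calls.append(1))
```

Note that `res` still holds references to every matching object, so this could only help if the matches are a small fraction of the tree.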
I'm currently using the ZODB version that comes with Zope 2.4.0 under
Win2K, but I'm willing to switch to other versions (though I'd rather not
go to Python 2.2 just yet).
Thanks,
-greg
----
greg Landrum (greglandrum@earthlink.net)
Software Carpenter/Computational Chemist