[ZODB-Dev] Re: Advice on ZODB with large datasets

Laurence Rowe l at lrowe.co.uk
Wed Jun 18 13:39:50 EDT 2008


AFoglia at princeton.com wrote:
> We have a large dataset of 650,000+ records that I'd like to examine 
> easily in Python.  I have figured out how to put this into a ZODB file 
> that totals 4 GB in size.  But I'm new to ZODB and very large databases, 
> and have a few questions.
> 
> 1. The data is in an IOBTree, so I can access each item once I know the 
> key, but to get the list of keys I tried:
> 
> scores = root['scores']
> ids = [id for id in scores.iterkeys()]
> 
> This seems to require the entire tree to be loaded into memory, which 
> takes more RAM than I have.

Does your record class inherit from persistent.Persistent? 650k integers 
plus object pointers should only be on the order of 10 MB or so. It sounds 
to me like the record data is being stored directly in the BTree's buckets.
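
For contrast, here is a minimal sketch (the PlainRecord name is mine) of 
what I suspect is happening: a class that does not inherit from Persistent 
has no database identity of its own, so each record is pickled wholesale 
into the bucket that holds it, and loading a bucket loads every record in 
it:

 >>> class PlainRecord(object):  # not Persistent: pickled into the bucket
 ...     def __init__(self, data):
 ...         self.data = data
 ...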

Something like this should lead to smaller bucket objects, where the 
record data is only loaded when you access the values of the BTree:

 >>> from BTrees.IOBTree import IOBTree
 >>> bt = IOBTree()
 >>> from persistent import Persistent
 >>> class Record(Persistent):
 ...     def __init__(self, data):
 ...         super(Record, self).__init__()
 ...         self.data = data
 ...
 >>> rec = Record("my really long string data")
 >>> bt[1] = rec  # the bucket stores only a reference to rec
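
To see the effect end to end, here is a rough sketch (the scores.fs 
filename and variable names are just for illustration) of committing the 
tree and then walking its keys. Each Record is now a separate persistent 
object, so the buckets hold only integer keys and object references, and 
iterating the keys never loads the records themselves:

 >>> from ZODB.FileStorage import FileStorage
 >>> from ZODB.DB import DB
 >>> import transaction
 >>> db = DB(FileStorage('scores.fs'))
 >>> conn = db.open()
 >>> root = conn.root()
 >>> root['scores'] = bt
 >>> transaction.commit()
 >>> ids = list(root['scores'].iterkeys())  # loads key buckets only

The records load lazily on first attribute access, and if memory gets 
tight you can call conn.cacheMinimize() to release them again.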


Laurence


