[ZODB-Dev] Space used by IOBTrees

Toby Dickenson tdickenson@geminidataloggers.com
Thu, 27 Feb 2003 23:28:19 +0000


On Thursday 27 February 2003 8:56 pm, Andreas Jung wrote:
> --On Thursday, 27 February 2003 15:31 -0500 Jeremy Hylton

> >> An XML document with about 26,000 nodes (1.3 MB of data) is represented
> >> in an application by a nested structure of nodes where the children
> >> are stored in IOBTrees (17,000 of the 26,000 nodes are leaves).
> >> The complete object takes up about 20 MB inside the ZODB. analyze.py
> >> shows that there are about 9600 IOBTree objects inside the ZODB with an
> >> average size of 1850 bytes.

Your TreeNode does not derive from Persistent. This means that each interior 
ZODB object is storing:
1. The state for one IOBTree
2. The state for all n TreeNode objects referenced by that IOBTree
3. References to n other ZODB objects of the same kind as this one
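
To make the difference concrete, here is a rough sketch using only the 
standard pickle module rather than ZODB itself. Record, PlainNode, 
RecordPickler and record_size are made-up names for illustration. The 
persistent_id hook stands in for the way a Persistent object is stored as a 
reference to its oid, and I am assuming that is close enough to show the 
effect on record size.

```python
import io
import pickle


class Record:
    """Hypothetical stand-in for a Persistent object with its own oid."""

    def __init__(self, oid):
        self.oid = oid
        self.items = {}


class PlainNode:
    """Hypothetical stand-in for a TreeNode that is not Persistent."""

    def __init__(self, text):
        self.text = text


class RecordPickler(pickle.Pickler):
    """Pickle one record; other Records become short references by oid."""

    def __init__(self, file, root):
        super().__init__(file, protocol=2)
        self.root = root

    def persistent_id(self, obj):
        if isinstance(obj, Record) and obj is not self.root:
            return obj.oid
        return None  # anything else is pickled inline


def record_size(root):
    """Return the number of bytes needed to store `root` as one record."""
    buf = io.BytesIO()
    RecordPickler(buf, root).dump(root)
    return len(buf.getvalue())


inline = Record(1)
by_ref = Record(2)
for i in range(50):
    # The full state of each plain node is stored in this record.
    inline.items[i] = PlainNode(("payload-%d " % i) * 10)
    # Only a reference to each child record is stored here.
    by_ref.items[i] = Record(1000 + i)

print("inline children:", record_size(inline), "bytes")
print("referenced children:", record_size(by_ref), "bytes")
```

The first number grows with the state of every child, the second only with 
the number of children.
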

I guess you want to make TreeNode derive from Persistent, and to use an 
IOBTree only if the set of children is large. Certainly don't create one if 
the set is empty.
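
Something along these lines is what I have in mind. Treat it as a sketch: 
SMALL_LIMIT, add_child and children are invented names, not anything from 
your code, and the threshold of 20 is a guess you would need to tune by 
measuring. The try/except fallback is only there so the snippet runs without 
ZODB installed; with the fallback nothing is actually persistent.

```python
try:
    from persistent import Persistent
    from BTrees.IOBTree import IOBTree
except ImportError:
    # Fallback so the sketch runs without ZODB installed.
    Persistent = object
    IOBTree = dict

SMALL_LIMIT = 20  # hypothetical threshold for switching to an IOBTree


class TreeNode(Persistent):
    def __init__(self, name, parent=None):
        self.name = name
        self._p = parent       # back pointer to another TreeNode
        self._children = None  # leaves allocate no container at all

    def add_child(self, index, node):
        kids = self._children
        if kids is None:
            # Small sets live in a plain dict inside this node's own record.
            kids = self._children = {}
        elif type(kids) is dict and len(kids) >= SMALL_LIMIT:
            # Large sets move to an IOBTree, which gets its own records.
            kids = self._children = IOBTree(kids)
        kids[index] = node
        if type(kids) is dict:
            # A plain dict cannot report its own changes, so flag the node.
            self._p_changed = True

    def children(self):
        if self._children is None:
            return []
        return list(self._children.values())


root = TreeNode("root")
for i in range(50):
    root.add_child(i, TreeNode("child-%d" % i, parent=root))
print(len(root.children()), "children")
```
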

One other possible bug, depending on how the tree is built: you may be storing 
duplicates of your nodes. ZODB can only track aliases (several references to 
the same object) if the referenced object derives from Persistent and so has 
its own oid. Your use of the _p back pointer may be this type of alias 
(assuming it points to a TreeNode). Making TreeNode derive from Persistent 
will surely fix this.
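
The effect is easy to reproduce with plain pickle, which I am assuming is a 
fair stand-in for two records that are serialized separately. PlainNode is a 
hypothetical non-Persistent class with a _p back pointer like yours.

```python
import pickle


class PlainNode:
    """Hypothetical node that does not derive from Persistent."""

    def __init__(self, name):
        self.name = name
        self._p = None  # back pointer to the parent node


parent = PlainNode("parent")
child = PlainNode("child")
child._p = parent

# Two containers saved as two separate records.
record_a = pickle.dumps({"node": parent})
# This record drags a full second copy of the parent along via _p.
record_b = pickle.dumps({"node": child})

loaded_parent = pickle.loads(record_a)["node"]
loaded_child = pickle.loads(record_b)["node"]

# The alias is gone: the child now points at its own private copy.
print(loaded_child._p is loaded_parent)
print(len(record_a), len(record_b))
```
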

> > If you want to know what takes up the space in the BTrees pickle, you
> > should probably load an individual database record and look at it.  I
> > think that will give you a better answer than anyone on this list :-).

This is definitely the way to go. Open your Data.fs in a hex viewer, or 
use DirectoryStorage+ls+cat. In my experience the patterns of wasted space 
jump out at you.
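
If a hex viewer feels too raw, the standard pickletools module can tally what 
a record contains. This is a sketch, assuming you already have the raw bytes 
of one record and that a record is one or more pickles back to back. 
summarize_record and Leaf are made-up names, and the sample record below is 
built by hand rather than read from a real Data.fs.

```python
import io
import pickle
import pickletools
from collections import Counter

# Opcodes that build an object inline vs. opcodes that refer to another record.
BUILD_OPS = {"NEWOBJ", "NEWOBJ_EX", "REDUCE", "OBJ", "INST"}
REF_OPS = {"PERSID", "BINPERSID"}


def summarize_record(record):
    """Tally what a raw record holds; handles several pickles back to back."""
    summary = Counter(bytes=len(record))
    stream = io.BytesIO(record)
    while stream.tell() < len(record):
        # genops stops after each STOP opcode, leaving the stream positioned
        # at the start of the next pickle.
        for opcode, arg, pos in pickletools.genops(stream):
            if opcode.name in BUILD_OPS:
                summary["inline objects"] += 1
            elif opcode.name in REF_OPS:
                summary["references"] += 1
            elif opcode.name == "GLOBAL":
                summary["class " + arg] += 1
    return summary


class Leaf:
    """Hypothetical non-Persistent node used to build a sample record."""

    def __init__(self, text):
        self.text = text


# Hand-made sample: a metadata pickle followed by a state pickle.
record = pickle.dumps(("mymodule", "MyClass"), protocol=2)
record += pickle.dumps([Leaf("a"), Leaf("b"), Leaf("c")], protocol=2)

summary = summarize_record(record)
for key, value in sorted(summary.items()):
    print(key, value)
```

Many inline objects and few references in one record is the pattern to look 
for.
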

-- 
Toby Dickenson
http://www.geminidataloggers.com/people/tdickenson