[ZODB-Dev] Space used by IOBTrees

Andreas Jung andreas@andreas-jung.com
Fri, 28 Feb 2003 05:44:31 +0100


--On Thursday, 27 February 2003 23:28 +0000 Toby Dickenson
<tdickenson@geminidataloggers.com> wrote:

> On Thursday 27 February 2003 8:56 pm, Andreas Jung wrote:
>> --On Thursday, 27 February 2003 15:31 -0500 Jeremy Hylton
>
>> >> An XML document with about 26,000 nodes (1.3 MB of data) is represented
>> >> in an application by a nested structure of nodes whose children
>> >> are stored in IOBTrees (17,000 of the 26,000 nodes are
>> >> leaves). The complete object takes up about 20 MB inside the ZODB.
>> >> analyze.py shows that there are about 9,600 IOBTree objects inside the
>> >> ZODB with an average size of 1,850 bytes.
>
> Your TreeNode does not derive from Persistent. This means that interior
> ZODB objects are storing:
> 1. The state for one IOBTree
> 2. The state for all n TreeNode objects referenced by that IOBTree
> 3. References to n other ZODB objects just like this one
>
> I guess you want to make TreeNode derive from Persistent, and to only use
> an IOBTree if the set of children is large. Certainly don't create one if
> the set is empty.

Silly me. In fact, a former implementation derived from Persistent, but
I replaced it because it allocated up to 50MB for the same XML document.
Also, the implementation creates IOBTrees only on demand. Posting the
complete implementation would have made this message far too long.
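Toby's three-point breakdown can be reproduced with the standard pickle module alone, because each ZODB record is a pickle and ZODB relies on pickle's persistent-reference hook (`persistent_id`) to write Persistent subobjects as references instead of embedding their state. In this sketch, `TreeNode`, `build_tree` and `record_size` are hypothetical stand-ins, a plain dict stands in for an IOBTree, and `id()` stands in for an oid; it is not ZODB's actual serializer. It compares the size of one record when child nodes are embedded (non-persistent) versus written as references (as Persistent objects would be), and it creates the children container lazily, only when a first child is added.

```python
import io
import pickle


class TreeNode:
    """Hypothetical stand-in for the application's TreeNode (not Persistent)."""

    def __init__(self, name):
        self.name = name
        self.children = None  # container created lazily, on the first add_child()

    def add_child(self, key, child):
        if self.children is None:
            self.children = {}  # a plain dict stands in for an IOBTree here
        self.children[key] = child
        return child


class RecordPickler(pickle.Pickler):
    """Pickles one 'record'. With by_reference=True, every TreeNode other than
    the record's root is written as a reference, similar to how ZODB writes
    Persistent subobjects; otherwise their full state is embedded."""

    def __init__(self, file, root, by_reference):
        super().__init__(file)
        self.root = root
        self.by_reference = by_reference

    def persistent_id(self, obj):
        if self.by_reference and isinstance(obj, TreeNode) and obj is not self.root:
            return id(obj)  # a real ZODB would use the object's oid
        return None  # None means: pickle obj normally, inside this record


def record_size(node, by_reference):
    buf = io.BytesIO()
    RecordPickler(buf, node, by_reference).dump(node)
    return len(buf.getvalue())


def build_tree(depth, fanout):
    root = TreeNode("root")
    level = [root]
    for _ in range(depth):
        level = [node.add_child(i, TreeNode(f"{node.name}.{i}"))
                 for node in level for i in range(fanout)]
    return root, level  # level holds the leaves


root, leaves = build_tree(depth=3, fanout=4)
embedded = record_size(root, by_reference=False)
referenced = record_size(root, by_reference=True)
print(embedded, referenced)
```

A leaf produces the same record either way, since it references no other nodes; the gap grows with the number of non-persistent descendants embedded in a single record.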

>
> One other possible bug, depending on how the tree is built: you may be
> storing duplicates of your nodes. ZODB can only track object reference
> aliases if the referenced object derives from Persistent and has its
> own oid. Your use of the _p back pointer may be this type of alias
> (assuming it points to a TreeNode). Making TreeNode derive from Persistent
> will surely fix this.

Yes, _p points back to a TreeNode.
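The alias problem can likewise be shown with plain pickle. Within a single pickle the memo preserves object identity, but ZODB writes each record as its own pickle, so a non-persistent object reached from two records is written twice and comes back as two distinct copies. `TreeNode` here is a hypothetical stand-in with a `_p` back pointer, and the separate `pickle.dumps` calls stand in for separate ZODB records.

```python
import pickle


class TreeNode:
    """Hypothetical non-persistent node with a back pointer, like _p."""

    def __init__(self, name, parent=None):
        self.name = name
        self._p = parent  # back pointer to the parent node


parent = TreeNode("parent")
child = TreeNode("child", parent)

# One pickle: the memo records that child._p *is* parent, so the alias survives.
p1, c1 = pickle.loads(pickle.dumps((parent, child)))
print(c1._p is p1)  # True

# Two independently pickled "records", as when parent and child end up in
# different ZODB records: the second one embeds its own copy of parent.
p2 = pickle.loads(pickle.dumps(parent))
c2 = pickle.loads(pickle.dumps(child))
print(c2._p is p2)  # False: the alias has become a duplicate
print(c2._p.name)   # parent
```

If TreeNode derived from Persistent, the child's record would hold only a reference to the parent's oid, and within one connection loading it would yield the same parent object.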
>
>> > If you want to know what takes up the space in the BTrees pickle, you
>> > should probably load an individual database record and look at it.  I
>> > think that will give you a better answer than anyone on this list :-).
>
> This is definitely the way to go. Pull up a Data.fs in a hex file viewer,
> or use DirectoryStorage+ls+cat. My experience is that the space-waste
> patterns jump out at you.

"ls -l Data.fs" did the same job ;-)
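Jeremy's suggestion (load one record and look at it) can also be done without a hex viewer. A ZODB data record consists of two pickles written back to back (class information, then the object's state), so the standard pickletools module can walk it. This is a stdlib-only sketch: reading the raw bytes out of a real Data.fs is not shown, the record built at the end is a hypothetical stand-in, and counting BUILD opcodes is only a rough proxy for how many non-persistent instances a record embeds.

```python
import collections
import pickle
import pickletools


def split_pickles(data):
    """Split bytes that hold one or more back-to-back pickles."""
    pickles, start = [], 0
    while start < len(data):
        end = start
        for _op, _arg, pos in pickletools.genops(data[start:]):
            end = start + pos + 1  # genops stops at STOP, a 1-byte opcode
        pickles.append(data[start:end])
        start = end
    return pickles


def describe_record(data):
    """Report size and number of BUILD opcodes (roughly: instances whose
    state is embedded rather than referenced) for each pickle in a record."""
    summary = []
    for n, part in enumerate(split_pickles(data)):
        ops = collections.Counter(op.name for op, _arg, _pos in pickletools.genops(part))
        summary.append((len(part), ops["BUILD"]))
        print(f"pickle {n}: {len(part)} bytes, {ops['BUILD']} embedded instances")
    return summary


class TreeNode:
    """Hypothetical non-persistent node, as discussed in the thread."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children  # None for leaves


root = TreeNode("root", {i: TreeNode(f"leaf{i}") for i in range(4)})

# Stand-in for one ZODB data record: a class pickle followed by a state pickle.
record = (pickle.dumps((TreeNode.__module__, TreeNode.__qualname__))
          + pickle.dumps(root.__dict__))
summary = describe_record(record)
```

For a real record, passing its state pickle to `pickletools.dis` prints every opcode, which makes embedded subobjects easy to spot.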

Thanks,
Andreas