[ZODB-Dev] Space used by IOBTrees

Andreas Jung andreas@andreas-jung.com
Thu, 27 Feb 2003 21:56:08 +0100


--On Thursday, 27 February 2003 15:31 -0500 Jeremy Hylton
<jeremy@zope.com> wrote:

> On Thu, 2003-02-27 at 15:12, Andreas Jung wrote:
>> An XML document with about 26,000 nodes (1.3 MB of data) is represented
>> in an application by a nested structure of nodes whose children
>> are stored in IOBTrees (17,000 of the 26,000 nodes are leaves).
>> The complete object occupies about 20 MB inside the ZODB. analyze.py
>> shows that there are about 9600 IOBTree objects inside the ZODB with an
>> average size of 1850 bytes. The amount of data stored as attributes
>> of a single TreeNode instance is very small (no more than 100-200
>> bytes). So why is the pickle of an IOBTree nearly 2 KB instead of
>> a few hundred bytes?
>
> It's hard to follow the details here.  Concrete examples might help a
> lot to convey what it is you are doing.

The base class is TreeNode, which implements the tree handling.
There is also a mix-in class that just represents the
attributes and #PCDATA parts of a tag.


from BTrees.IOBTree import IOBTree

class TreeNode:

    def __init__(self, id):
        self.id = id
        self._c = IOBTree()   # children, keyed by position
        self._p = None        # parent

    def addChild(self, node):
        """Attach a new object as a child node."""
        node._p = self
        # The next free integer key is the current number of children.
        self._c[len(self._c)] = node
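A minimal sketch, using plain `pickle` rather than ZODB, of how an object graph like this can pickle larger than expected: without persistent references, pickling one node pulls in everything reachable from it, including (through the `_p` back-pointer) its ancestors and all their other children. `PlainTreeNode` and `build` are hypothetical names, a `dict` stands in for `IOBTree` so the example runs without ZODB installed, and whether the real TreeNode behaves this way inside ZODB depends on whether it derives from `Persistent`, which the snippet above does not show.

```python
import pickle


class PlainTreeNode:
    """Hypothetical stand-in for TreeNode; a plain dict replaces IOBTree."""

    def __init__(self, id):
        self.id = id
        self._c = {}     # children, keyed by position
        self._p = None   # parent back-pointer

    def addChild(self, node):
        node._p = self
        self._c[len(self._c)] = node


def build(fanout, depth, name="n"):
    """Build a complete tree whose node names encode their path, e.g. 'n.2.0'."""
    node = PlainTreeNode(name)
    if depth > 0:
        for i in range(fanout):
            node.addChild(build(fanout, depth - 1, "%s.%d" % (name, i)))
    return node


root = build(fanout=3, depth=3)
leaf = root._c[0]._c[0]._c[0]   # the node named 'n.0.0.0'
lone = PlainTreeNode("lone")     # a node with no parent and no children

leaf_pickle = pickle.dumps(leaf)
# The leaf's pickle reaches the root through _p and from there every other
# node, so it also contains the id of a leaf in a completely different branch.
print(len(pickle.dumps(lone)), len(leaf_pickle))
print(b"n.2.2.2" in leaf_pickle)
```

In ZODB, a reference to a `Persistent` object is stored as a reference to that object's own record rather than inline, which is what normally keeps per-record pickles small.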

>
> If you want to know what takes up the space in the BTrees pickle, you
> should probably load an individual database record and look at it.  I
> think that will give you a better answer than anyone on this list :-).
> We can probably help you read the pickle if it doesn't make sense.  The
> pickletools module that's new in Python 2.3a2 would also help.

Will pickletools work with Python 2.2?
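For reference, a sketch of the kind of inspection Jeremy describes, written against a current Python 3 `pickletools` rather than the 2.2/2.3 versions discussed here. It assumes you already have the raw bytes of one database record; in this sketch a plain `pickle.dumps` of a small mapping stands in for those bytes, and `Leaf` is a hypothetical value class.

```python
import io
import pickle
import pickletools
from collections import Counter


class Leaf:
    """Hypothetical value class standing in for a real tree node."""

    def __init__(self, text):
        self.text = text


# Stand-in for the raw bytes of one database record.
record = pickle.dumps({0: Leaf("first"), 1: Leaf("second")}, protocol=2)

# Disassemble: one line per opcode, with its byte offset and argument.
out = io.StringIO()
pickletools.dis(record, out=out)
listing = out.getvalue()
print(listing)

# Count opcodes to see what the record is mostly made of.
counts = Counter(opcode.name for opcode, arg, pos in pickletools.genops(record))
print(len(record), counts.most_common(5))
```

The listing shows where the bytes go (class references, attribute names, values), which is usually quicker than guessing from the total size.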

>
> My first guess would be that the pickled representation of whatever
> objects you're storing as the values of the BTree uses more space than
> you think.  Also recall that each BTree node is stored as a separate
> database record, so you don't get to share a memo across records.  For
> example, the fully qualified name of the class for the value object is
> stored in every database record.
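A rough illustration of the memo point, using plain `pickle` and a deliberately extreme model in which every object gets its own record. ZODB's actual record format differs, and a bucket record holds many values that do share one memo, so only the direction of the comparison carries over, not the byte counts. `Leaf` is again a hypothetical value class.

```python
import pickle


class Leaf:
    """Hypothetical value object, standing in for a tree node's payload."""

    def __init__(self, text):
        self.text = text


leaves = [Leaf("node %d" % i) for i in range(100)]

# One pickle for all objects: the class reference is written once and then
# back-referenced through the pickle memo.
combined = pickle.dumps(leaves, protocol=2)

# One pickle per object, as when each object ends up in its own record:
# every pickle repeats the module and class name.
separate = [pickle.dumps(leaf, protocol=2) for leaf in leaves]

print(len(combined), sum(len(p) for p in separate))
print(combined.count(b"Leaf"), sum(p.count(b"Leaf") for p in separate))
```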

Most of the information is stored in the leaves; the inner nodes of the tree
don't carry much information. Is there a better way to organize such a
nested BTree data structure that a) reduces the space used in the ZODB and
b) keeps the ability to save only the changed parts of the tree instead of
storing the complete tree for every change?

Thanks,
Andreas