[ZODB-Dev] ZODB idioms

Jeremy Hylton jeremy@zope.com
Mon, 24 Jun 2002 14:06:49 -0400


>>>>> "TP" == Tim Peters <tim@zope.com> writes:

  TP> [Jeremy Hylton]
  >> ...  The particular problem of sequential keys is unfortunate,
  >> but hardly catastrophic.  The BTree uses twice as much space as
  >> it would under ideal conditions.

  TP> Does it really?  People should think about this for real <wink>.
  TP> Classic BTrees have nodes all of exactly the same size in
  TP> memory, and on disk.  Zope's do not.  The space for keys and/or
  TP> values is in separately malloc'ed chunks of memory, and grows
  TP> only as big as needed in each node (well, there's some (I
  TP> suspect) too-aggressive over-allocation, but let's ignore that).

Why should we ignore the over-allocation?  Is that part of thinking
about it for real <wink>?  After reviewing the code, I think the
over-allocation causes only a temporary problem; unpickling gets rid
of it.

A Bucket starts with a minimum allocation of 16 elements and doubles
its allocation whenever an insert would exceed the current allocation.
For an IIBucket, DEFAULT_MAX_BUCKET_SIZE is 120.  When a BTree tries
to add the 121st key-value pair to an IIBucket, it splits the bucket.
The btree_split() code allocates exactly the memory needed for the new
bucket, but it doesn't resize the old bucket.

The old bucket has enough allocated memory to hold 128 pairs.  After
the split, it holds only 60 key-value pairs, which use 480 bytes (60 *
8), but its allocated memory is 1024 bytes (128 * 8).  There's also 52
bytes of overhead for the object structure.  So that's 1076 bytes for
60 pairs, or 17.9 bytes per pair.

On the other hand, when the bucket is unpickled, the setstate code
allocates exactly the memory needed.  So the waste is temporary.  As
soon as the bucket gets deactivated and re-loaded, it will be the
right size.
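For example (a sketch; `bucket` stands in for a persistent IIBucket
reached through an open ZODB connection):

    bucket._p_deactivate()   # drop the in-memory state, keep a ghost
    len(bucket)              # any access reloads it via __setstate__,
                             # with an exact-size allocation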

  TP> So if you take two versions of the same tree, one with buckets
  TP> 50% full, and the other with buckets 100% full, the amount of
  TP> memory devoted to holding keys and values is approximately the
  TP> same, and the amount devoted to that in pickles is exactly the
  TP> same.

Not exactly, but mostly.  Each ZODB pickle includes the name of the
module and class as a string.  For Persistence.BTrees.IIBTree.IIBucket, 
that's about 50 bytes of overhead per bucket.  (As you've noted, the
solution there is to change it back to BTrees.IIBTree.IIBucket.)
That's about 10-15% overhead in pickle size.
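For a rough sense of where the 50 bytes and the 10-15% come from,
using the figures above (the pickle-framing cost is an estimate, not a
measurement):

    long_name = "Persistence.BTrees.IIBTree.IIBucket"   # 35 characters
    short_name = "BTrees.IIBTree.IIBucket"              # 23 characters
    payload = 60 * 8      # roughly 480 bytes of key/value data
    overhead = 50         # class reference plus pickle framing (estimate)
    print(overhead / float(payload))   # -> ~0.10, i.e. about 10%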

  TP> What differs is the number of bucket and BTree nodes.  The II
  TP> flavor of trees have bucket capacities of 120 key/value pairs,
  TP> so it's the difference between N/120 and N/60 bucket nodes.
  TP> It's about 1% either way <wink>, and the one with the smaller
  TP> buckets may well be *less* likely to lead to conflicts (picture
  TP> the extreme: if the tree stuffed everything into just one
  TP> bucket, any pair of changes would conflict).

Indeed.  I don't understand how conflicts affect BTree performance.  I
guess I'll just have to get used to it.

Jeremy