[ZODB-Dev] ZODB idioms

Mon, 24 Jun 2002 14:37:18 -0400

[Jeremy Hylton]
> Why should we ignore the over-allocation?  Is that part of thinking
> about it for real <wink>?  After reviewing the code, it looks like
> over-allocation causes only a temporary problem;

Because I don't know that it causes any real problem, and ...

> unpickling solves the over-allocation problem.

... if it does cause some problem, there you go.  Staring at the details of
the B-Tree code can drive you mad; but, in the end, it's unclear exactly
what (if any) of it matters.

> A Bucket starts with a minimum allocation of 16 elements and doubles
> each time the current size exceeds the allocated size.  For an
> IIBucket, the DEFAULT_MAX_BUCKET_SIZE size is 120.  When a BTree tries
> to add the 121st k-v pair to an IIBucket, it will split it.  The
> btree_split() code will allocate exactly the memory needed for the new
> bucket, but it won't resize the old bucket.

Yes.
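The growth-and-split sequence Jeremy describes can be modeled in a few lines of Python. This is a hypothetical sketch, not the C code in BTrees: `ModelBucket`, `grow_then_split`, and splitting at `len // 2` are assumptions made for illustration. The 16-slot start, the doubling, and the 120-item limit come from the message.

```python
MIN_BUCKET_ALLOC = 16          # first allocation, per the message above
DEFAULT_MAX_BUCKET_SIZE = 120  # IIBucket split threshold, per the message above


class ModelBucket:
    """Hypothetical pure-Python model of a bucket's allocation policy."""

    def __init__(self, keys=()):
        self.keys = list(keys)
        # A bucket created by a split is allocated exactly what it holds.
        self.capacity = len(self.keys)

    def insert(self, key):
        # Grow only when full: 16 slots first, then double (16, 32, 64, 128, ...).
        if len(self.keys) == self.capacity:
            self.capacity = self.capacity * 2 if self.capacity else MIN_BUCKET_ALLOC
        self.keys.append(key)

    def split(self):
        # Move the upper half into a new, exactly sized bucket.  The old
        # bucket's capacity is deliberately left alone.
        index = len(self.keys) // 2
        upper = ModelBucket(self.keys[index:])
        del self.keys[index:]
        return upper


def grow_then_split(max_size=DEFAULT_MAX_BUCKET_SIZE):
    """Insert keys until the bucket exceeds max_size, then split it once."""
    bucket = ModelBucket()
    while len(bucket.keys) <= max_size:
        bucket.insert(len(bucket.keys))
    upper = bucket.split()
    return len(bucket.keys), bucket.capacity, len(upper.keys), upper.capacity


print(grow_then_split())  # (60, 128, 61, 61)
```

The old bucket ends up with 60 keys in 128 slots, while the new one holds 61 keys in exactly 61 slots, which is the asymmetry described above.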

> The old bucket has enough allocated memory to hold 128 keys.  After it
> is split, it holds 60 keys.  At that point, it's holding 60 key-value
> pairs, which use 480 bytes (60 * 8).  But its allocated memory is 1024
> bytes (128 * 8).  There's also 52 bytes of overhead for the object
> structure.

Plus the overhead is larger in Zope3 than in Zope2, right?  (I'm thinking
about the weakref list in Zope3 BTrees -- or something like that; I don't
understand that code yet.)

>  So that's 1076 bytes for 60 pairs, or 17.9 bytes per pair.

Plus hidden malloc overheads.  Plus in most IIBTrees I've seen so far,
reserving 4 bytes per integer wastes at least half of those bytes too, as
the ints generally fit in 2 bytes.
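The per-pair arithmetic in this exchange is easy to redo for other sizes. A minimal sketch, assuming 4-byte keys and values and the 52-byte struct overhead quoted above; `bytes_per_pair` is a hypothetical helper, and hidden malloc overhead is ignored:

```python
def bytes_per_pair(pairs, capacity, key_size=4, value_size=4, struct_overhead=52):
    """Return (allocated bytes, bytes per stored pair) for one bucket.

    `capacity` is the number of slots allocated; `pairs` is how many are used.
    Hidden malloc overhead is not included.
    """
    total = capacity * (key_size + value_size) + struct_overhead
    return total, total / pairs


# After a split: 60 pairs in 128 slots -> 1076 bytes, about 17.9 per pair.
print(bytes_per_pair(60, 128))
# Exactly sized (60 slots): 532 bytes, under 9 per pair.
print(bytes_per_pair(60, 60))
# Hypothetical 2-byte keys and values, still 128 slots: 564 bytes, 9.4 per pair.
print(bytes_per_pair(60, 128, key_size=2, value_size=2))
```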

> On the other hand, when the bucket is unpickled, the setstate code
> allocates exactly the memory needed.  So the waste is temporary.  As
> soon as the bucket gets deactivated and re-loaded, it will be the
> right size.

That's my belief, and why it's hard to care much about the overallocation.
The multiunion code is much less aggressive in overallocating, but I don't
know that it was worth the bother there either.
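A toy model of why the waste is temporary, assuming (as stated above) that setstate allocates exactly as many slots as the stored state contains. `ModelBucket` is hypothetical, and real ZODB deactivation is more involved than clearing two attributes:

```python
class ModelBucket:
    """Hypothetical model: a bucket's keys plus the slots allocated for them."""

    def __init__(self, keys, capacity):
        self.keys = list(keys)
        self.capacity = capacity

    def __getstate__(self):
        # Only the keys are saved; the over-allocation is not part of the state.
        return tuple(self.keys)

    def __setstate__(self, state):
        # Mirror the setstate behaviour described above: allocate exactly len(state).
        self.keys = list(state)
        self.capacity = len(self.keys)


bucket = ModelBucket(range(60), capacity=128)  # 60 keys left in 128 slots after a split
before = bucket.capacity

state = bucket.__getstate__()           # what gets stored when the bucket is written
bucket.keys = bucket.capacity = None    # "deactivate": drop the in-memory state
bucket.__setstate__(state)              # "re-load": rebuild from the stored state

print(before, bucket.capacity)  # 128 60
```

`pickle` drives these same `__getstate__`/`__setstate__` hooks, so the extra slots never survive a round trip through storage.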

> ...
> Not exactly, but mostly.

I'm not worried about a few percent one way or the other.  Should I be?

> Each ZODB pickle includes the name of the module and class as a string.
> For Persistence.BTrees.IIBTree.IIBucket, that's about 50 bytes of
> overhead per bucket.  (As you've noted, the solution there is to change
> it back to BTrees.IIBTree.IIBucket.)

If that's a real problem (I don't know -- does anyone?), we could surely do
better than that.  For example, stick the BTree code at the top level and
give the modules one-character names <wink>.  Seriously, I'm sure we could
fiddle Zope's pickle/unpickle layers to recognize brief codes for commonly
pickled classes.
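As for where the name overhead comes from: in pickle protocols 0 through 3, a class reference is written as a GLOBAL opcode, the byte `c` followed by the module and class names, each ending in a newline. The sketch counts that cost for a single reference. `global_ref_size` is a hypothetical helper, and how many such references a ZODB bucket record actually contains is not modeled, so it does not try to account for the full ~50 bytes quoted above:

```python
import json.decoder
import pickle


def global_ref_size(module, name):
    """Bytes taken by one GLOBAL reference to module.name (protocols 0-3).

    The opcode is b"c" followed by the module and class names, each
    terminated by a newline.
    """
    return 1 + len(module) + 1 + len(name) + 1


long_form = global_ref_size("Persistence.BTrees.IIBTree", "IIBucket")   # 37
short_form = global_ref_size("BTrees.IIBTree", "IIBucket")              # 25

# Cross-check the formula against a real protocol-0 pickle of a stdlib class.
data = pickle.dumps(json.decoder.JSONDecoder, protocol=0, fix_imports=False)
assert data.startswith(b"cjson.decoder\nJSONDecoder\n")
assert global_ref_size("json.decoder", "JSONDecoder") == len(b"cjson.decoder\nJSONDecoder\n")

print(long_form, short_form)
```

Under this format, dropping the `Persistence.` prefix saves 12 bytes per class reference.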

> That's about 10-15% overhead in pickle size.

And could be less than 1%, if it matters enough to bother about.
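Python's pickle module later gained a mechanism much like the "brief codes" idea: the `copyreg` extension registry, which pickle protocols 2 and up consult so that a registered class is written as a small integer code instead of its module and class names. copyreg's own comments reserve codes 128-191 for Zope. The sketch uses `json.decoder.JSONDecoder` purely as a stand-in for a BTree class, and code 240 from the private-use range:

```python
import copyreg
import json.decoder
import pickle

# Stand-in for a commonly pickled class; any importable class works.
MODULE, NAME = "json.decoder", "JSONDecoder"
CODE = 240  # copyreg reserves 240-255 for private use


def compare_sizes():
    cls = json.decoder.JSONDecoder
    full = pickle.dumps(cls, protocol=2)        # spells out module and class name
    copyreg.add_extension(MODULE, NAME, CODE)
    try:
        brief = pickle.dumps(cls, protocol=2)   # extension opcode plus a 1-byte code
        restored = pickle.loads(brief)          # the code is mapped back to the class
    finally:
        # The registry is process-global, so undo the registration.
        copyreg.remove_extension(MODULE, NAME, CODE)
    return full, brief, restored


full, brief, restored = compare_sizes()
print(len(full), len(brief))
```

Both pickler and unpickler must agree on the registry, which is the price of the shorter records.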

> ...
> Indeed.  I don't understand how conflicts affect BTree performance.  I
> guess I'll just have to get used to it.

If we had a use model, we could easily make bad first-order predictions that
are nevertheless uncannily accurate in real life <ahem>.