[ZODB-Dev] ZODB idioms
Tim Peters
tim@zope.com
Mon, 24 Jun 2002 14:37:18 -0400
[Jeremy Hylton]
> Why should we ignore the over-allocation? Is that part of thinking
> about it for real <wink>? After reviewing the code, it looks like
> over-allocation causes only a temporary problem;
Because I don't know that it causes any real problem, and ...
> unpickling solves the over-allocation problem.
... if it does cause some problem, there you go. Staring at the details of
the B-Tree code can drive you mad; but, in the end, it's unclear exactly
what (if any) of it matters.
> A Bucket starts with a minimum allocation of 16 elements and doubles
> each time the current size exceeds the allocated size. For an
> IIBucket, DEFAULT_MAX_BUCKET_SIZE is 120. When a BTree tries
> to add the 121st k-v pair to an IIBucket, it will resize it. The
> btree_split() code will allocate exactly the memory needed for the new
> bucket, but it won't resize the old bucket.
Yes.
> The old bucket has enough allocated memory to hold 128 keys. After it
> is split, it holds 60 keys. At that point, it's holding 60 key-value
> pairs which use 480 bytes (60 * 8). But its allocated memory is 1024
> bytes (128 * 8). There's also 52 bytes of overhead for the object
> structure.
Plus the overhead is larger in Zope3 than in Zope2, right? (I'm thinking
about the weakref list in Zope3 BTrees -- or something like that; I don't
understand that code yet.)
> So that's 1076 bytes for 60 pairs, or 17.9 bytes per pair.
Plus hidden malloc overheads. Plus in most IIBTrees I've seen so far,
reserving 4 bytes per integer wastes at least half of those bytes too, as
the ints generally fit in 2 bytes.
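
The arithmetic quoted above can be reproduced with a short Python sketch.
It is only a model of the growth rule as described in this thread (start
at 16 slots, double until the items fit), not the actual C code; the
constant and function names are made up for illustration, and the 8 bytes
per pair and 52 bytes of object overhead are the figures from the quoted
message. Malloc overhead is not modelled.

```python
# Model of the Bucket growth rule described in the thread (not the real C code).
MIN_ALLOC = 16            # initial allocation, in slots
MAX_BUCKET_SIZE = 120     # DEFAULT_MAX_BUCKET_SIZE quoted for an IIBucket
PAIR_BYTES = 8            # 4-byte int key + 4-byte int value
OBJECT_OVERHEAD = 52      # per-object overhead figure quoted in the thread


def allocated_slots(n_items, min_alloc=MIN_ALLOC):
    """Slots allocated after growing by doubling until n_items fit."""
    slots = min_alloc
    while slots < n_items:
        slots *= 2
    return slots


slots = allocated_slots(MAX_BUCKET_SIZE)        # 16 -> 32 -> 64 -> 128
kept = MAX_BUCKET_SIZE // 2                     # pairs left in the old bucket
used_bytes = kept * PAIR_BYTES                  # memory actually holding data
total_bytes = slots * PAIR_BYTES + OBJECT_OVERHEAD
per_pair = total_bytes / kept

print(slots, used_bytes, total_bytes, f"{per_pair:.1f}")
```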
> On the other hand, when the bucket is unpickled, the setstate code
> allocates exactly the memory needed. So the waste is temporary. As
> soon as the bucket gets deactivated and re-loaded, it will be the
> right size.
That's my belief, and why it's hard to care much about the overallocation.
The multiunion code is much less aggressive in overallocating, but I don't
know that it was worth the bother there either.
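
As a rough illustration of why the waste is temporary, the sketch below
compares the footprint of the old bucket right after the split with its
footprint after a reload. It assumes that "exactly the memory needed"
means one slot per stored pair, and it reuses the 8-byte pair size and
52-byte overhead quoted in the thread; the function name is hypothetical.

```python
PAIR_BYTES = 8          # bytes per key-value pair in an IIBucket (from the thread)
OBJECT_OVERHEAD = 52    # object overhead figure quoted in the thread


def bucket_bytes(slots):
    """Approximate in-memory size of a bucket with the given slot count."""
    return slots * PAIR_BYTES + OBJECT_OVERHEAD


pairs = 60
after_split = bucket_bytes(128)    # still carries the pre-split allocation
after_reload = bucket_bytes(pairs) # assumed: setstate allocates one slot per pair
reclaimed = after_split - after_reload

print(after_split, after_reload, reclaimed)
print(f"{after_reload / pairs:.1f} bytes per pair after reload")
```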
> ...
> Not exactly, but mostly.
I'm not worried about a few percent one way or the other. Should I be?
> Each ZODB pickle includes the name of the module and class as a string.
> For Persistence.BTrees.IIBTree.IIBucket, that's about 50 bytes of
> overhead per bucket. (As you've noted, the solution there is to change
> it back to BTrees.IIBTree.IIBucket.)
If that's a real problem (I don't know -- does anyone?), we could surely do
better than that. For example, stick the BTree code at the top level and
give the modules one-character names <wink>. Seriously, I'm sure we could
fiddle Zope's pickle/unpickle layers to recognize brief codes for commonly
pickled classes.
> That's about 10-15% overhead in pickle size.
And could be less than 1%, if it matters enough to bother about.
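
To make the "brief codes" idea concrete, here is a small Python sketch.
Nothing in it is an existing Zope or ZODB API: the code table, the
one-byte code, and the helper names are all hypothetical, and it counts
only the characters in the module and class names, not the pickle
opcodes around them.

```python
# Hypothetical illustration: replace long (module, class) names with short codes.
LONG_NAME = ("Persistence.BTrees.IIBTree", "IIBucket")
SHORT_NAME = ("BTrees.IIBTree", "IIBucket")

# Hypothetical table of brief codes for commonly pickled classes.
CLASS_CODES = {LONG_NAME: b"\x01"}
CODE_TO_CLASS = {code: name for name, code in CLASS_CODES.items()}


def name_bytes(module, cls):
    """Characters spent spelling out the module and class names."""
    return len(module) + len(cls)


def encode_class(name):
    """Return the brief code if one is registered, else the dotted name."""
    if name in CLASS_CODES:
        return CLASS_CODES[name]
    return ".".join(name).encode("ascii")


def decode_class(data):
    """Inverse of encode_class."""
    if data in CODE_TO_CLASS:
        return CODE_TO_CLASS[data]
    module, _, cls = data.decode("ascii").rpartition(".")
    return (module, cls)


print(name_bytes(*LONG_NAME), name_bytes(*SHORT_NAME), len(encode_class(LONG_NAME)))
```

A class without a registered code falls back to its dotted name, so the
table only needs entries for the classes that dominate the pickles.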
> ...
> Indeed. I don't understand how conflicts affect BTree performance. I
> guess I'll just have to get used to it.
If we had a use model, we could easily make bad first-order predictions that
are nevertheless uncannily accurate in real life <ahem>.