[ZODB-Dev] Increasing MAX_BUCKET_SIZE for IISet, etc

Wed Jan 26 16:18:27 EST 2011

On Wed, Jan 26, 2011 at 3:15 PM, Matt Hamilton <matth at netsight.co.uk> wrote:
> All,
>  I have been doing some performance investigation into a large Plone
> site we have running. The site in question has approx 300,000 items of
> content. Each piece of content is indexed by ZCatalog.
>
> The main thing I was tracking down was the very large number of
> objects being loaded by the ZODB, mostly IISet instances.
>
> The large number of instances seems to be caused by a particular usage
> pattern: in various indexes in the Catalog there are a number of
> IITreeSet instances that are used to map, for instance, time ->
> UID. As content items are added, you end up adding monotonically
> increasing values to a set. The result of this is that you end up
> 'leaving behind' loads of buckets (or IISets in the case of an
> IITreeSet) that are half full.
>
> Looking at the BTrees code, I see there is a MAX_BUCKET_SIZE constant
> that is set for the various BTree/Set types, and in the case of an
> IISet it is set to 120. This means, when inserting into an IITreeSet,
> when the IISet gets beyond 120 items it is split and a new IISet
> created. Hence as above I see a large number of 60 item IISets due to
> the pattern in which these data structures are filled.
>
> So, with up to 300,000 items in some of these sets, iterating over
> the entire set (during a Catalog query) means loading 5,000 objects
> over ZEO from the ZODB, which adds up to quite a bit of latency. With
> quite a number of these data structures about, we can end up with on
> the order of 50,000 objects in the ZODB cache *just* for these IISets!

Hopefully, you're not iterating over the entire tree, but still. :)
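
A rough sketch of the fill pattern described above, in case it helps to
see the numbers. This is a simplified model, not the BTrees C code: it
assumes every new key lands in the last bucket and that a bucket is cut
in half as soon as it grows past the limit. The function name
simulate_append_only is made up for illustration.

```python
def simulate_append_only(n_items, max_bucket_size):
    """Return the bucket sizes left after inserting n_items increasing keys.

    Simplified model (an assumption, not the real BTrees code): only bucket
    sizes are tracked, and an over-full bucket is split into two halves.
    """
    buckets = [0]
    for _ in range(n_items):
        buckets[-1] += 1  # a new maximum key always goes in the last bucket
        if buckets[-1] > max_bucket_size:
            size = buckets[-1]
            left = size // 2
            buckets[-1] = left            # left half is never touched again
            buckets.append(size - left)   # right half keeps receiving keys
    return buckets


for limit in (120, 1200):
    sizes = simulate_append_only(300_000, limit)
    print(f"limit {limit}: {len(sizes)} buckets, smallest holds {min(sizes)}")
```

Under these assumptions the 120 limit leaves about 5,000 buckets of 60
items each, and a limit of 1200 leaves about 500.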

> So... has anyone tried increasing the size of MAX_BUCKET_SIZE in real
> life?

We have, mainly to reduce the number of conflicts.

> I understand that this will increase the potential for conflicts
> if the bucket/set size is larger (however in reality this probably
> can't get worse than it is: the value inserted is a timestamp, so 99%
> of the time it is greater than the current max value stored and you
> always hit the last bucket/set in the tree).

Actually, it reduces the number of unresolvable conflicts.
Most conflicting bucket changes can be resolved, but bucket
splits can't be, and bigger buckets mean fewer splits.

The main tradeoff is record size.

> I was going to experiment with increasing the MAX_BUCKET_SIZE on an IISet
> from 120 to 1200. Doing a quick test, a pickle of an IISet of 60 items
> is around 336 bytes, and one of 600 items is 1580 bytes... so still very
> much in the realms of a single disk read / network packet.

And imagine if you use zc.zlibstorage to compress records! :)
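
For a rough feel of the record-size tradeoff, here is a sketch using
only the standard library. It assumes an IISet's stored state is close
to a pickled tuple of its integer keys, and it uses plain zlib as a
stand-in for what a compressing storage layer would do. The helper
record_sizes is made up for illustration, and the byte counts it prints
will not match real IISet records exactly.

```python
import pickle
import zlib


def record_sizes(n_keys, first_key, protocol=2):
    """Return (raw, compressed) byte counts for a pickled tuple of int keys.

    The tuple is only a stand-in for an IISet's state (an assumption).
    """
    keys = tuple(range(first_key, first_key + n_keys))
    raw = pickle.dumps((keys,), protocol)
    return len(raw), len(zlib.compress(raw))


for n_keys in (60, 600):
    for first_key in (0, 1_000_000_000):
        raw, packed = record_sizes(n_keys, first_key)
        print(f"{n_keys:4d} keys from {first_key:>10d}: "
              f"{raw:5d} bytes raw, {packed:5d} compressed")
```

Note that the raw size depends on how large the keys are as well as how
many there are, since pickle encodes small integers more compactly, so
it is worth measuring with realistic key values.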

> I'm not sure how the current MAX_BUCKET_SIZE values were determined,
> but looks like they have been the same since the dawn of time, and I'm
> guessing might be due a tune?

Probably.

> It looks like I can change that constant and recompile the BTree
> package, and it will work fine with existing IISets and just take
> effect on new sets created (i.e. clear and rebuild the catalog index).
>
> Anyone played with this before or see any major flaws to my cunning plan?

We have.  My long term goal is to arrange things so that you can
specify/change limits by sub-classing the BTree classes.
Unfortunately, that's been a long-term priority for too long.
This could be a great narrow project for someone who's willing
to grok the Python C APIs.

Changing the default sizes for the II and LL BTrees is pretty straightforward.
We were more interested in LO (and similar) BTrees. For those,
it's much harder to guess sizes because you don't know generally
how big the objects will be, which is why I'd like to make it tunable at the
application level.
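
To make the sub-classing idea concrete, here is a toy pure-Python
sketch of what "tunable at the application level" could look like. The
class ToyTreeSet and its max_leaf_size attribute are hypothetical names
for illustration only. This is not the BTrees API and not how the C
code is structured.

```python
from bisect import insort


class ToyTreeSet:
    """Toy sorted set stored as a list of buckets (NOT the BTrees code)."""

    max_leaf_size = 120  # class-level default, playing the MAX_BUCKET_SIZE role

    def __init__(self):
        self.buckets = [[]]

    def add(self, key):
        # Pick the first bucket that could hold the key, else the last one.
        for i, bucket in enumerate(self.buckets):
            if not bucket or key <= bucket[-1] or i == len(self.buckets) - 1:
                break
        if key in bucket:
            return
        insort(bucket, key)
        if len(bucket) > self.max_leaf_size:
            mid = len(bucket) // 2  # split the over-full bucket in half
            self.buckets[i:i + 1] = [bucket[:mid], bucket[mid:]]


class BigBucketTreeSet(ToyTreeSet):
    max_leaf_size = 1200  # the application overrides the limit by sub-classing


small, big = ToyTreeSet(), BigBucketTreeSet()
for key in range(10_000):
    small.add(key)
    big.add(key)
print(len(small.buckets), len(big.buckets))
```

The point of the design is that the limit lives on the class, so an
application that knows how big its values are can pick a size without
recompiling anything.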

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton