[ZODB-Dev] Increasing MAX_BUCKET_SIZE for IISet, etc

Thu Jan 27 05:09:59 EST 2011

Hanno Schlichting <hanno <at> hannosch.eu> writes:

> You are using queryplan in the site, right? The most typical catalog
> query for Plone consists of something like ('allowedRolesAndUsers',
> 'effectiveRange', 'path', 'sort_on'). Without queryplan you indeed
> load the entire tree (or trees inside allowedRolesAndUsers) for each
> of these indexes.

Yes we are using queryplan. Without it the site becomes pretty much 
unusable.

> With queryplan it knows from prior execution, that the set returned by
> the path index is the smallest. So it first calculates this. Then it
> uses this small set (usually 10-100 items per folder) to look inside
> the other indexes. It then only needs to do an intersection of the
> small path set with each of the trees. If the path set has fewer than
> 1000 items, it won't even use the normal intersection function from
> the BTrees module, but use the optimized Cython based version from
> queryplan, which essentially does a for-in loop over the path set.
> Depending on the size ratio between the sets this is up to 20 times
> faster with in-memory data, and even more so if it avoids database
> loads. In the worst case you would load buckets equal to length of the
> path set, usually you should load a lot less.
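
A minimal sketch of the small-set strategy described above, using plain
Python sets as stand-ins for the BTrees structures. This is not queryplan's
actual code; the function name, the sample ids and the way the 1000-item
threshold is applied are assumptions for illustration only.

```python
SMALL_SET_LIMIT = 1000  # threshold mentioned in the quoted text


def intersect(small, large):
    """Intersect two sets, probing `large` per item when `small` is tiny."""
    if len(small) < SMALL_SET_LIMIT:
        # for-in loop over the small set: one membership test per item,
        # so at most len(small) lookups into the large structure.
        return {item for item in small if item in large}
    # Otherwise fall back to the generic intersection.
    return small & large


# Hypothetical data: a small path-index result and a large result
# from another index.
path_ids = {3, 17, 42, 99}
allowed_ids = set(range(0, 100000, 3))

result = intersect(path_ids, allowed_ids)
print(sorted(result))
```

With real BTrees each membership test may touch one bucket, which is why
the worst case is a number of bucket loads equal to the length of the path
set.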

There still seem to be instances in which the entire set is loaded. This
could be an artifact of my clearing the ZODB cache before each test, which
I think also clears the query plan. Speaking of which, I saw a hook in the
query plan code for loading a pre-defined query plan, but I can't see
exactly how you supply this plan or what format it is in. Do you use this
feature?

> We have large Plone sites in the same range of multiple 100.000 items
> and with queryplan and blobs we can run them with ZODB cache sizes of
> less than 100.000 items and memory usage of 500mb per single-threaded
> process.
> 
> Of course it would still be really good to optimize the underlying
> data structures, but queryplan should help make this less urgent.

Well, I think we are already at that point ;) I think there are also other
times in which the full set is loaded.

> > Ahh interesting, that is good to know. I've not actually checked the
> > conflict resolution code, but do bucket change conflicts actually get
> > resolved in some sane way, or does the transaction have to be
> > retried?
> 
> Conflicts inside the same bucket can be resolved and you won't get to
> see any log message for them. If you get a ConflictError in the logs,
> it's one where the request is being retried.

Great. That was what I always thought, but I just wanted to check. So in
that case, what does it mean if I see a conflict error for an IISet? Can
they not resolve conflicts internally?
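
To make the question concrete, here is a simplified sketch of a three-way
merge of the kind a set could attempt when two transactions change it
concurrently. It is an illustration only, not the BTrees implementation:
the names are hypothetical, and the rule that the same key touched by both
transactions cannot be merged is an assumption.

```python
class MergeConflict(Exception):
    """Raised when two concurrent changes cannot be merged (hypothetical)."""


def three_way_merge(old, committed, new):
    """Merge two sets of changes made against the same `old` state."""
    added_committed = committed - old
    added_new = new - old
    removed_committed = old - committed
    removed_new = old - new

    # Assumed rule: a key added or removed by both sides is unresolvable.
    if (added_committed & added_new) or (removed_committed & removed_new):
        raise MergeConflict("both transactions changed the same key")

    merged = old | added_committed | added_new
    return merged - removed_committed - removed_new


# One transaction adds 4, the other removes 2: the changes are disjoint.
print(sorted(three_way_merge({1, 2, 3}, {1, 2, 3, 4}, {1, 3})))
```

Under these assumptions a conflict only surfaces when the merge itself
fails, for example when both transactions add the same key.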

> >> And imagine if you use zc.zlibstorage to compress records! :)
> >
> > This is Plone 3, which is Zope 2.10.11, does zc.zlibstorage work on
> > that, or does it need newer ZODB?
> 
> zc.zlibstorage needs a newer ZODB version. 3.10 and up to be exact.
> 
> > Also, unless I can sort out that
> > large number of small pickles being loaded, I'd imagine this would
> > actually slow things down.
> 
> The Data.fs would be smaller, making it more likely to fit into the OS
> disk cache. The overhead of uncompressing the data is small compared
> to the cost of a disk read instead of a memory read. But it's hard to
> say what exactly happens with the cache ratio in practice.

Yeah, if we could use it I certainly would :) I guess what I mean above is
that larger pickles would compress better, so with lots of small pickles
the compression would be less effective.
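
A rough way to see the effect, using only the standard library. The
records are hypothetical stand-in data, not real ZODB records, and plain
zlib is used here rather than zc.zlibstorage itself.

```python
import pickle
import zlib

# Hypothetical stand-in data: 1000 small records of 10 integers each.
records = [tuple(range(i, i + 10)) for i in range(1000)]

# Case 1: every record pickled and compressed on its own.
small_total = sum(len(zlib.compress(pickle.dumps(r))) for r in records)

# Case 2: all records in a single pickle, compressed once.
large_total = len(zlib.compress(pickle.dumps(records)))

print("many small pickles, compressed:", small_total)
print("one large pickle, compressed:  ", large_total)
```

Each small pickle pays the fixed pickle and zlib overhead separately and
gives the compressor very little repetition to work with.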

-Matt