[ZODB-Dev] Increasing MAX_BUCKET_SIZE for IISet, etc

Thu Jan 27 04:48:34 EST 2011

Hi.

On Thu, Jan 27, 2011 at 9:00 AM, Matt Hamilton <matth at netsight.co.uk> wrote:
> Alas we are. Or rather, alas, ZCatalog does ;) It would be great if it
> didn't but it's just the way it is. If I have 300,000 items in my
> site, and every one of them visible to someone with the 'Reader'
> role, then the allowedRolesAndUsers index will have an IITreeSet
> with 300,000 elements in it. Yes, we could try and optimize out that
> specific case, but there are others like that too. If all of my
> items have no effective or expires date, then the same happens with
> the effective range index (DateRangeIndex 'always' set).

You are using queryplan in the site, right? The most typical catalog
query for Plone consists of something like ('allowedRolesAndUsers',
'effectiveRange', 'path', 'sort_on'). Without queryplan you indeed
load the entire tree (or trees inside allowedRolesAndUsers) for each
of these indexes.

With queryplan, it knows from prior executions that the set returned by
the path index is the smallest, so it calculates that one first. It then
uses this small set (usually 10-100 items per folder) to look inside
the other indexes, and only needs to intersect the small path set with
each of the trees. If the path set has fewer than 1000 items, it won't
even use the normal intersection function from the BTrees module, but
the optimized Cython-based version from queryplan, which essentially
does a for-in loop over the path set. Depending on the size ratio
between the sets, this is up to 20 times faster with in-memory data,
and even more so if it avoids database loads. In the worst case you
load as many buckets as there are items in the path set; usually you
load far fewer.
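
A rough sketch of the two ideas above in plain Python, with sorted lists
standing in for BTree sets. All names here (`plan_order`,
`probe_intersection`, `intersect`, etc.) are hypothetical; this is not
queryplan's Cython code nor the BTrees `intersection` function, only an
illustration of why probing a small set against a large one touches far
fewer keys than merging the two. The 1000-item threshold is the one
mentioned above.

```python
from bisect import bisect_left

# Threshold from the description above: below this, loop over the small set.
SMALL_SET_THRESHOLD = 1000


def contains_sorted(seq, key):
    """Binary-search membership test; stands in for a BTree key lookup."""
    i = bisect_left(seq, key)
    return i < len(seq) and seq[i] == key


def merge_intersection(a, b):
    """Walk both sorted sequences in lockstep: O(len(a) + len(b))."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def probe_intersection(small, large):
    """Loop over the small set and look each key up in the large one."""
    return [key for key in small if contains_sorted(large, key)]


def intersect(a, b):
    """Pick the probing strategy when one side is small enough."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    if len(small) < SMALL_SET_THRESHOLD:
        return probe_intersection(small, large)
    return merge_intersection(small, large)


def plan_order(index_names, previous_sizes):
    """Order indexes by the result size remembered from earlier queries.

    Indexes never seen before sort last.
    """
    return sorted(index_names,
                  key=lambda name: previous_sizes.get(name, float("inf")))
```

With a path set of 50 document ids and a 300,000-entry
allowedRolesAndUsers set, `probe_intersection` does 50 lookups, while a
merge would have to walk every key of the large set (and, in the ZODB
case, load every bucket holding them).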

We have large Plone sites in the same range, with several hundred
thousand items, and with queryplan and blobs we can run them with ZODB
cache sizes of fewer than 100,000 objects and memory usage of 500 MB per
single-threaded process.

Of course it would still be really good to optimize the underlying
data structures, but queryplan should help make this less urgent.

> Ahh interesting, that is good to know. I've not actually checked the
> conflict resolution code, but do bucket change conflicts actually get
> resolved in some sane way, or does the transaction have to be
> retried?

Conflicts inside the same bucket can be resolved, and you won't see
any log message for them. If you see a ConflictError in the logs, it's
one that could not be resolved, and the request is being retried.
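
To illustrate what "resolving" a bucket conflict means, here is a
simplified three-way merge for a set bucket, written as a hypothetical
standalone function. The real `_p_resolveConflict` in BTrees works on
pickled bucket states and has more detailed rules than this; the
`ConflictError` class below is a local stand-in, not the ZODB one.

```python
class ConflictError(Exception):
    """Local stand-in for ZODB's ConflictError in this sketch."""


def resolve_set_conflict(old, committed, new):
    """Three-way merge of two concurrent changes to one set bucket.

    old:       keys the bucket held when both transactions started
    committed: keys written by the transaction that committed first
    new:       keys our transaction wants to write
    """
    old, committed, new = set(old), set(committed), set(new)
    added_committed, removed_committed = committed - old, old - committed
    added_new, removed_new = new - old, old - new
    # Simplified rule: if both sides touched the same key, give up and
    # let the application retry the whole transaction.
    if (added_committed | removed_committed) & (added_new | removed_new):
        raise ConflictError("both transactions changed the same key")
    merged = (old | added_committed | added_new) - removed_committed - removed_new
    return sorted(merged)
```

If one request adds document 4 to a bucket while another removes
document 2 from it, both changes survive without a retry; only when the
two changes overlap does the transaction have to be repeated.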

>> And imagine if you use zc.zlibstorage to compress records! :)
>
> This is Plone 3, which is Zope 2.10.11, does zc.zlibstorage work on
> that, or does it need newer ZODB?

zc.zlibstorage needs a newer ZODB version: 3.10 or later, to be exact.

> Also, unless I can sort out that
> large number of small pickles being loaded, I'd imagine this would
> actually slow things down.

The Data.fs would be smaller, making it more likely to fit into the OS
disk cache. The overhead of decompressing the data is small compared
to the cost of reading from disk instead of memory. But it's hard to
say what exactly happens to the cache hit ratio in practice.
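
A small standard-library sketch of the trade-off: pickle a record,
compress it with `zlib`, and check that it round-trips. This is not
zc.zlibstorage's code; the record below is a made-up stand-in for a
bucket of integer document ids, and real compression ratios depend
entirely on the data.

```python
import pickle
import zlib


def compress_record(obj, level=6):
    """Pickle an object and zlib-compress the bytes."""
    raw = pickle.dumps(obj, protocol=2)
    return raw, zlib.compress(raw, level)


def load_record(compressed):
    """Reverse of compress_record: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(compressed))


# Hypothetical bucket-like record: a sorted run of integer document ids.
record = list(range(100000, 100120))
raw, packed = compress_record(record)
print(len(raw), len(packed))
```

The extra `zlib.decompress` call on every load is the CPU cost; the
payoff is that more of the database file fits into the OS disk cache.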

Hanno