[ZODB-Dev] Re: ZODB Benchmarks

Roché Compaan roche at upfrontsystems.co.za
Sun Feb 3 02:15:48 EST 2008


On Sat, 2008-02-02 at 22:10 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-2-1 21:17 +0200:
> >I have completed my first round of benchmarks on the ZODB and welcome
> >any criticism and advise. I summarised our earlier discussion and
> >additional findings in this blog entry:
> >http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks
> 
> In your insertion test: when do you do commits?
> One per insertion? Or one per n insertions (for which "n")?

I have tried different commit intervals. The published results are for a
commit interval of 100, in other words 100 inserts per commit.
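
For reference, the insertion loop amounts to something like this (a minimal
sketch; the Record class, the storage path and the sequential keys are
illustrative, not the exact benchmark code):

    import transaction
    from persistent import Persistent
    from BTrees.OOBTree import OOBTree
    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB

    class Record(Persistent):
        """Persistent object with a single 1K string attribute."""
        def __init__(self, data):
            self.data = data

    storage = FileStorage('benchmark.fs')
    db = DB(storage)
    conn = db.open()
    root = conn.root()
    root['tree'] = tree = OOBTree()
    transaction.commit()

    for i in range(10 ** 6):
        tree[str(i)] = Record('x' * 1024)
        if (i + 1) % 100 == 0:      # commit interval of 100
            transaction.commit()
    transaction.commit()            # flush any trailing inserts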

> Your profile looks very surprising:
> 
>   I would expect that for a single insertion, typically
>   one persistent object (the bucket where the insertion takes place)
>   is changed. About every 15 inserts, 3 objects are changed (the bucket
>   is split) about every 15*125 inserts, 5 objects are changed
>   (split of bucket and its container).
>   But the mean value of objects changed in a transaction is 20
>   in your profile.
>   The changed objects typically have about 65 subobjects. This
>   fits with "OOBucket"s.

It was very surprising to me too, since the insertion is so basic. I
simply assign a Persistent object with one string attribute that is 1K in
size to a key in an OOBTree. I mentioned this earlier on the list, and I
thought that Jim's explanation was sufficient: the persistent_id method is
called for all objects, including simple types like strings and ints. I
don't know whether that explains the mean of 20 changed objects per
transaction, though. I guess the calls are made by the cPickle module, but
I don't have the experience to investigate this.
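
Jim's point is easy to reproduce with the standard pickler (the same
mechanism cPickle uses); a minimal sketch in Python 3, not ZODB's actual
serializer:

    import io
    import pickle

    class CountingPickler(pickle.Pickler):
        """Count how often the pickler consults persistent_id."""
        def __init__(self, f):
            super().__init__(f)
            self.seen = []

        def persistent_id(self, obj):
            # Consulted once for every object encountered, including
            # plain strings and ints, before normal pickling proceeds.
            self.seen.append(type(obj).__name__)
            return None  # None means "no persistent ref, pickle inline"

    p = CountingPickler(io.BytesIO())
    p.dump({'data': 'x' * 1024, 'count': 1})
    print(p.seen)   # ['dict', 'str', 'str', 'str', 'int']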

> Lookup times:
> 
> 0.23 s would be 230 ms not 23 ms.

Oops, my multiplier broke ;-)

> 
> The reason for the dramatic drop from 10**6 to 10**7 cannot lie in the
> BTree implementation itself. Lookup time is proportional to
> the tree depth, which ideally would be O(log(n)). While BTrees
> are not necessarily balanced (and therefore the depth may be larger
> than logarithmic) it is not easy to obtain a severely unbalanced
> tree by insertions only.
> Other factors must have contributed to this drop: swapping, cache too small,
> garbage collections...

The cache size was set to 100000 objects, so I doubt that this was the
cause. I do the lookup test right after I populate the BTree, so the cache
and memory might be full, but I take care to commit after the BTree is
populated, so even this is unlikely.
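
Concretely, the cache was configured along these lines (a sketch; the
storage path is illustrative):

    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB

    storage = FileStorage('benchmark.fs')
    # cache_size is the target number of non-ghost objects each
    # connection keeps in its pickle cache.
    db = DB(storage, cache_size=100000)
    conn = db.open()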

The keys that I look up are completely random, so the lookups probably hit
the disk all the time. If this is the case, isn't 230 ms still too slow?
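
The lookup loop is essentially the following (a sketch reusing the tree
from the insertion sketch above; the key scheme and tree size are
illustrative):

    import random
    import time

    # Probe with random keys; over 10**7 entries most probes should
    # fall outside the pickle cache and go to disk.
    probes = [str(random.randrange(10 ** 7)) for _ in range(1000)]

    start = time.time()
    hits = sum(1 for key in probes if key in tree)
    per_lookup = (time.time() - start) / len(probes)
    print('%d hits, %.1f ms per lookup' % (hits, per_lookup * 1000))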

> Furthermore, the lookup times for your smaller BTrees are far too
> good -- fetching any object from disk takes in the order of several
> ms (2 to 20, depending on your disk).
> This means that the lookups for your smaller BTrees have
> typically been served directly from the cache (no disk lookups).
> With your large BTree disk lookups probably became necessary.

I accept that these lookups are all served from cache. I am going to modify
the lookup test so that I close the database after population and re-open
it before starting the test, to make sure nothing is cached, and see what
the results look like.
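
Something like this (a sketch continuing from the snippets above), with the
caveat that the operating system's page cache will still be warm:

    import transaction
    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB

    # After populating and committing, drop everything:
    transaction.commit()
    conn.close()
    db.close()

    # Re-open so the per-connection pickle cache starts out empty.
    storage = FileStorage('benchmark.fs')
    db = DB(storage, cache_size=100000)
    conn = db.open()
    tree = conn.root()['tree']
    # ... run the lookup test against the cold cache ...
    # Note: the OS page cache may still serve reads, so this isolates
    # ZODB cache misses rather than raw disk seek times.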

Thanks for your insightful comments!

-- 
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za


