[ZODB-Dev] Re: ZODB Benchmarks

Roché Compaan roche at upfrontsystems.co.za
Wed Oct 31 07:35:15 EDT 2007


On Wed, 2007-10-31 at 10:00 +0000, Laurence Rowe wrote:
> It looks like ZODB performance in your test has the same O(log n) 
> performance as PostgreSQL checkpoints (the periodic drops in your 
> graph). This should come as no surprise. B-Trees have a theoretical 
> Search/Insert/Delete time complexity equal to the height of the tree, 
> which is (up to) log(n).
> 
> So why is PostgreSQL so much faster? It's using a Write-Ahead-Log for 
> inserts. Instead of inserting into the (B-Tree based) data files at 
> every transaction commit it writes a record to the WAL. This does not 
> require traversal of the B-Tree and has O(1) time complexity. The 
> penalty for this is that read operations become more complex, they must 
> look first in the WAL and overlay those results with the main index. The 
> WAL is never allowed to get too large, or its in memory index would 
> become too big.
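A toy sketch of that overlay behaviour, with plain dicts standing in for the B-Tree-backed data files and the WAL's in-memory index (this is an illustration of the idea, not PostgreSQL's actual implementation):

```python
# Toy write-ahead-log store: writes go to the log (O(1), no tree
# traversal); reads must check the WAL first and overlay it on the
# main index; a periodic checkpoint flushes the WAL so its in-memory
# index never grows too large.
class WALStore:
    def __init__(self, wal_limit=4):
        self.main = {}            # stands in for the B-Tree data files
        self.wal = {}             # in-memory index over the WAL
        self.wal_limit = wal_limit

    def put(self, key, value):
        self.wal[key] = value     # O(1) log write
        if len(self.wal) >= self.wal_limit:
            self.checkpoint()     # the periodic O(log n) cost

    def get(self, key):
        if key in self.wal:       # WAL entries take precedence
            return self.wal[key]
        return self.main.get(key)

    def checkpoint(self):
        self.main.update(self.wal)   # merge WAL into the main index
        self.wal.clear()

store = WALStore()
store.put("a", 1)
store.put("a", 2)
assert store.get("a") == 2    # overlay: the newest WAL value wins
```

The checkpoint is where the deferred B-Tree work happens all at once, which is exactly the periodic drop visible in the benchmark graph.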

Thanks for the explanation. After some profiling I noticed that there
are millions of OID lookups in the index. Increasing the cache size from
400 to 100000 led to a more acceptable level of performance degradation.
I'll post some results later on. Profiling also showed a huge number of
calls to the persistent_id method of the ObjectWriter - persisting 10000
objects leads to 1338046 calls to persistent_id, which seems to carry
quite a bit of overhead. Profile results attached.
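The call count makes sense once you see that the pickler invokes persistent_id for every object it reaches while serializing state, not once per stored object. A small stand-in using plain pickle (not ZODB's ObjectWriter, but the same hook) makes the multiplier visible:

```python
# Count persistent_id calls while pickling one record. The pickler
# calls persistent_id on the record itself and on every key, value,
# and container it reaches, so one stored object can trigger many
# calls - which is why 10000 objects can produce over a million.
import io
import pickle

class CountingPickler(pickle.Pickler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = 0

    def persistent_id(self, obj):
        self.calls += 1
        return None   # None means "pickle inline, not by reference"

record = {"name": "x", "tags": ["a", "b", "c"], "n": 1}
buf = io.BytesIO()
p = CountingPickler(buf)
p.dump(record)
print(p.calls)   # several calls for this single small record
```

In ZODB the hook additionally does isinstance checks and OID bookkeeping on each call, which matches the 2177223 isinstance calls in the profile.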

> If you are going to have this number of records -- in a single B-Tree -- 
> then use a relational database. It's what they're optimised for.

The point of the benchmark is to determine what "this number of records"
means in practice and to deduce best practices for working with the
ZODB. I would much rather tell developers to use multiple B-Trees if
they want to store this many records than tell them to use a relational
database. Telling a ZODB programmer to use a relational database is an
insult ;-)

One of the tests that I want to try out next is to insert records
concurrently into different B-Trees.
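A rough sketch of the multiple-B-Trees idea: hash each key into one of N smaller trees so no single tree grows to millions of entries. The ShardedTree helper below is hypothetical, and plain dicts stand in for BTrees.OOBTree.OOBTree so the bucketing logic runs standalone; in ZODB the same split should also reduce write conflicts, since concurrent transactions would mostly touch different persistent buckets.

```python
# Hypothetical sketch: spread keys over several small trees instead
# of one huge one. Dicts stand in for BTrees.OOBTree.OOBTree.
class ShardedTree:
    def __init__(self, num_trees=16):
        self.trees = [dict() for _ in range(num_trees)]

    def _tree_for(self, key):
        # hash() is fine for integers in-process; a persisted layout
        # would need a stable hash (e.g. zlib.crc32 over key bytes),
        # since Python 3 randomizes string hashes per process.
        return self.trees[hash(key) % len(self.trees)]

    def __setitem__(self, key, value):
        self._tree_for(key)[key] = value

    def __getitem__(self, key):
        return self._tree_for(key)[key]

s = ShardedTree()
for i in range(1000):
    s[i] = i * 2
assert s[123] == 246   # lookups route to the right shard
```

Each individual tree stays an order of magnitude smaller, so the per-insert O(log n) traversal is over a shallower tree.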

-- 
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za
-------------- next part --------------
Tue Oct 30 20:28:04 2007    /tmp/profile-1.dat

         6108977 function calls (6108973 primitive calls) in 57.280 CPU seconds

   Ordered by: cumulative time
   List reduced from 232 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   57.280   57.280 profile_zodb.py:70(run)
        1    0.000    0.000   57.280   57.280 <string>:1(?)
        1    0.260    0.260   57.280   57.280 profile_zodb.py:24(_btrees_insert)
        1    0.000    0.000   57.280   57.280 profile:0(run())
     1001    0.030    0.000   51.060    0.051 _manager.py:88(commit)
     1001    0.040    0.000   50.990    0.051 _transaction.py:365(commit)
     1001    0.110    0.000   50.730    0.051 _transaction.py:486(_commitResources)
     1001    0.020    0.000   48.060    0.048 Connection.py:496(commit)
     1001    0.220    0.000   48.040    0.048 Connection.py:512(_commit)
     9889    0.940    0.000   47.340    0.005 Connection.py:561(_store_objects)
    20372    0.480    0.000   39.790    0.002 serialize.py:381(serialize)
    20372    0.500    0.000   38.950    0.002 serialize.py:409(_dump)
    40750    7.790    0.000   38.020    0.001 :0(dump)
  1338046   17.560    0.000   30.230    0.000 serialize.py:184(persistent_id)
  2177223    9.150    0.000    9.150    0.000 :0(isinstance)
    20373    1.550    0.000    5.240    0.000 FileStorage.py:631(store)
     2964    0.050    0.000    4.980    0.002 Connection.py:749(setstate)
     2964    0.100    0.000    4.930    0.002 Connection.py:769(_setstate)
     2964    0.080    0.000    4.180    0.001 serialize.py:603(setGhostState)
     2964    0.030    0.000    4.100    0.001 serialize.py:593(getState)


More information about the ZODB-Dev mailing list