[ZODB-Dev] duplicate functionality in bsddb3Storage

Barry A. Warsaw barry@zope.com
Tue, 9 Oct 2001 18:59:19 -0400


>>>>> "AD" == Andrew Dalke <adalke@mindspring.com> writes:

    AD>   We ran into the "Lock table is out of available locks"
    AD> problem mentioned in the README.  I couldn't figure out
    AD> what it meant to tweak the DB_CONFIG file so I grepped the
    AD> source code for "set_lk_max" and changed the value in
    AD> base.py from 10,000 to 100,000.

Are you using the CVS version of this stuff, the latest StandaloneZODB
release, or the latest bsddb3Storage release?

    AD>   This didn't change anything.  After a while I figured out
    AD> it was because there's duplicate functionality between
    AD> base.py (which has a 'envFromString') and BerkeleyBase.py
    AD> (which has a 'env_from_string').  I had been changing the
    AD> wrong function.

Yup.  base.py is used by Packless, but I want to unofficially
deprecate Packless.  I can't do that officially until Minimal has at
least the same functionality, but that hasn't been a high priority, so
for now there are 3 storages in bsddb3Storage.

This should be easy to trace though, if you're using Full (which I
recommend).  Full derives from BerkeleyBase.BerkeleyBase, which in
turn derives from ZODB.BaseStorage.BaseStorage.  To paraphrase John
Cleese, "base.py don't enter inna it."

That raises the question of where the right place is to tweak the
BerkeleyDB knobs, and it took me a little while to figure out the
right way.  The explanation below assumes the latest CVS snapshot of
bsddb3Storage.

Let's say you instantiate Full by passing "BDB" as the first
argument.  This will cause env_from_string() to create a subdirectory
relative to the current directory called "BDB", and BerkeleyDB will
by default store all its files in this subdirectory.
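The directory-creation side of that behavior can be sketched like so
(the function name here is mine, and the real env_from_string() also
opens a transactional DBEnv, which is omitted):

```python
import os
import tempfile

def env_from_string_sketch(envname):
    # Rough sketch of the directory handling in
    # BerkeleyBase.env_from_string: make sure the environment
    # directory exists so BerkeleyDB can keep its files there.
    try:
        os.mkdir(envname)
    except OSError:
        pass  # directory already exists; reuse it
    return os.path.abspath(envname)

# Passing a relative name like "BDB" yields a subdirectory of the
# current working directory.
os.chdir(tempfile.mkdtemp())
path = env_from_string_sketch("BDB")
```
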

Now, to control the various BerkeleyDB knobs, create a file inside BDB
called DB_CONFIG.  In this file you can add any set_* directives you
might need.  In your case, something like the following might be
useful:

    set_lk_max_locks 10000
    set_lk_max_objects 10000
    set_lk_max_lockers 3

(You're right that set_lk_max is deprecated.  I've done some moderate
updating to the bsddb3Storage/README to reflect this, but will do more
after my latest round of tuning experiments is complete -- see below.)

    AD>   Since it's confusing, you all might want to think about
    AD> reducing this little bit of duplication in the code.
  
See above for the (moderately bogus) rationale behind the duplication.
Full is the best supported, and for us, most important storage.
Minimal is what I plan on replacing Packless with.  Packless mostly
works, but isn't supported.

    AD>   Also, since I'm here, the Sleepycat docs say "set_lk_max" is
    AD> deprecated, and this is the function used in base.py and
    AD> hinted at in the README.

Right.  As a side note, BerkeleyBase.py doesn't call set_lk_max()
directly.  We've decided that we won't twist these BerkeleyDB knobs in
the bsddb3Storage code because then we'd have to expose ways of
letting you tune them, and BerkeleyDB already has a good way of
letting you do that.  No need to re-invent the wheel here.

    AD>   BTW, both of these are related to Barry's README comment
    AD> ] Thus, Packless ships with the default number of Berkeley locks
    AD> ] set to 10,000 (BAW: is this still the case, and what about
    AD> ] Full and Minimal?)

The answers are "yes" (using a deprecated API) and "not for
Full or Minimal".

Two BerkeleyDB resources you'll likely want to adjust are the number
of locks and the cache size.

I'm currently experimenting with a migration script that converts a
~6000 transaction FileStorage to a Full storage.  The FileStorage is a
snapshot of activity on zope.org.

Using BerkeleyDB 3.2.9, I found that I had to crank the number of
available locks up to 200,000 in order to get the migration to
complete.  That seemed like a ridiculous number, and I suspected lock
leakage.  This was supported by some behavior that the Subversion
project was seeing.

Fortunately, BerkeleyDB 3.3.11 seems to have cleared all that up for
us, and I believe the Subversion folks too.  So I highly recommend
upgrading to that version, and using PyBSDDB 3.3.x.  My migration
script now requires just under 20,000 locks, 3 lockers, and just over
15,000 lock objects.  That seems much more reasonable.

The second thing to notice is that cache size (i.e. set_cachesize in
DB_CONFIG) is going to greatly affect your overall performance.  In my
migration script, the default cachesize (256KB) caused the first 500
transactions to migrate in about 10 minutes.  Cranking that up to
128MB reduced this time to ~50 seconds.  But be careful!  Setting
the cachesize higher than your machine's available memory can have
a very bad effect on performance, especially for long-running
processes.
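For reference, a 128MB cache could be requested with a line like the
following in DB_CONFIG (set_cachesize takes the cache size as
gigabytes plus bytes, then the number of cache regions):

    set_cachesize 0 134217728 1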

Below are some useful URLs to read about performance tuning issues.
I'll be updating the README file based on my experience once I've
finished doing my local experiments.

Cheers,
-Barry

    http://www.sleepycat.com/docs/ref/am_conf/cachesize.html
    http://www.sleepycat.com/docs/ref/am_misc/tune.html
    http://www.sleepycat.com/docs/ref/transapp/tune.html
    http://www.sleepycat.com/docs/ref/transapp/throughput.html