[ZODB-Dev] Berkeley Transactions slow to commit?

Chris Withers chrisw@nipltd.com
Tue, 30 Oct 2001 14:32:46 +0000


Hi Barry,

Having had more than my fair share of fun'n'games with FileStorage, I'm now
experiencing different fun'n'games with BerkeleyStorage ;-)

The problems I face are these:

I need to index about 30,000 documents. I'm doing this using a Python script
(not a Script (Python) ;-) that imports Zope and hence uses custom_zodb.py to
open a Full Berkeley storage. I figured doing all 30,000 documents in one
transaction wasn't a good idea, so I was trying to do them in batches of 500.
After each batch I'd do a get_transaction().commit().
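
The batching approach described above can be sketched roughly as follows. This
is a minimal illustration, not the actual script: index_in_batches,
index_document and commit are hypothetical names, with commit standing in for
a call such as get_transaction().commit().

```python
def index_in_batches(documents, index_document, commit, batch_size=500):
    """Index documents, committing after every `batch_size` documents.

    `index_document` and `commit` are hypothetical callables supplied by
    the caller; in the script described here, `commit` would wrap
    get_transaction().commit().
    Returns the number of commits performed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    commits = 0
    pending = 0
    for doc in documents:
        index_document(doc)
        pending += 1
        if pending == batch_size:
            commit()
            commits += 1
            pending = 0
    if pending:
        commit()  # commit the final, partial batch
        commits += 1
    return commits
```

With 30,000 documents and a batch size of 500 this performs 60 commits; a
batch size of 50 performs 600.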

First problem: I kept running out of locks doing this, so I bumped the lock
settings up to:

set_lk_max_locks 1000000
set_lk_max_objects 100000
set_lk_max_lockers 100

...this stopped the error, but the python process chewed through 220MB of RAM.

...so I dropped it down to only 50 documents per batch and dropped the lock
settings down by a factor of 10.
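
To make the reduced settings concrete, here is a small sketch that derives
them from the values quoted above. It assumes all three settings were divided
by exactly 10, which is one reading of "a factor of 10"; scale_down is a
hypothetical helper name.

```python
# Original DB_CONFIG values quoted above.
original = {
    "set_lk_max_locks": 1000000,
    "set_lk_max_objects": 100000,
    "set_lk_max_lockers": 100,
}

def scale_down(settings, factor):
    """Return a copy of `settings` with each value integer-divided by `factor`."""
    return {name: value // factor for name, value in settings.items()}

# Assumption: every setting was reduced by the same factor of 10.
reduced = scale_down(original, 10)

# Render the result in the same "name value" form used in DB_CONFIG.
db_config = "\n".join(f"{name} {value}" for name, value in reduced.items())
print(db_config)
```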

Now I'm only using 100MB of memory but still:

- Indexing 50 documents takes, on average, 3 minutes
- Calling get_transaction().commit() takes, on average, 13-20 minutes(!!)
- app._p_jar.cacheMinimize(3) takes, on average, 20 seconds.
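
Per-phase timings like the ones above can be collected with a small wrapper
along these lines. This is a sketch only: timed is a hypothetical helper, and
it uses time.perf_counter from modern Python rather than anything available in
python2.1.

```python
import time

def timed(label, func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds).

    `timed` is a hypothetical helper; it prints the elapsed wall-clock
    time for the call under the given label.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed

# Hypothetical usage, mirroring the three phases listed above:
#   timed("index batch", index_batch, docs)
#   timed("commit", get_transaction().commit)
#   timed("cacheMinimize", app._p_jar.cacheMinimize, 3)
```

Wall-clock time is the relevant measure here, since a process that is mostly
waiting on disk or locks would show little CPU time.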

Here's a snapshot of the top of a top output:

load average: 1.20, 1.16, 1.11
47 processes: 45 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  0.0% user,  4.3% system,  6.1% nice, 89.5% idle
Mem:  899980K av, 896924K used,   3056K free,      0K shrd,   4480K buff
Swap: 2097136K av, 393452K used, 1703684K free                647220K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
13883 root      17   5  117M 117M 17944 R N     0  8.5 13.3 125:53 python2.1

Can you (or anyone else) enlighten me as to what's going on here?
Why is the commit taking so long? How can I speed it up? 

Also, in general, should you try to have a few big transactions or many small
transactions when using Berkeley DB? Does this vary depending on whether you use
Minimal or Full?

Oh, and while I remember, should I use Minimal or Full if I want a simple,
efficient, non-versioning storage?

cheers,

Chris