[ZODB-Dev] cache not minimized at transaction boundaries?

Tim Peters tim at zope.com
Thu Jan 26 14:21:26 EST 2006


[Chris Withers]
> This is with whatever ZODB ships with Zope 2.8.5...

Do:

    import ZODB
    print(ZODB.__version__)

to find out.

> I have a Stepper (zopectl run on steroids) job that deals with lots of
> big objects.

Can you quantify this?

> After processing each one, Stepper does a transaction.get().commit().

Note that "transaction.commit()" is a shortcut spelling of the same thing.

> I thought this was enough to keep the object cache at a sane size,

A commit does not do cacheMinimize().  It tries to reduce the memory cache to
the target number of objects specified for that cache, which is not at all
the same as cache minimization (the latter shoots for a target size of 0).
Whether that's "sane" or not depends on the product of:

    the cache's target number of objects

times:

    "the average" byte size of an object

ZODB has no say of its own about either of those.
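
As a rough worked example of that product, here is a minimal sketch.  The
400-object target is the default cache_size that shows up later in this
message; the average object sizes are made-up numbers for illustration only,
not measurements of anyone's data, and `estimated_cache_bytes` is a
hypothetical helper, not part of ZODB.

```python
def estimated_cache_bytes(target_objects, avg_object_bytes):
    """Rough size of the memory cache once it is down to its target count."""
    return target_objects * avg_object_bytes


# 400 is the default target; the per-object byte sizes are hypothetical.
for avg_bytes in (1000, 100 * 1000, 5 * 1000 * 1000):
    total = estimated_cache_bytes(400, avg_bytes)
    print("avg %9d bytes/object -> about %.1f MB cached"
          % (avg_bytes, total / 1e6))
```

With small objects the target count is harmless; with multi-megabyte objects
the same count can add up to gigabytes.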

> however the job kept bombing out with MemoryErrors, and sure enough it
> was using 2 or 3 gigs of memory when that happened.
>
> I fiddled about with the gc module and found that, sure enough, objects
> were being kept in memory. At a guess, I inserted something close to the
> following:
>
> obj._p_jar.db().cacheMinimize()
>
> ...after each 5,000 objects were processed (there are 60,000 objects in
> total)
>
> Lo and behold, memory usage became sane.
>
> Why is this step necessary? I thought transaction.get().commit() every so
> often was enough to sort out the cache...

See above.  For most people it works OK.  If `cn` is the Connection, then

    cn._cache.cache_size is the target number of non-ghost objects
    cn._cache.ringlen() is the current number of non-ghost objects

At a transaction boundary, the cache gc method that runs tries to make
ringlen() <= cache_size, and that's all.
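
The difference between the two targets can be modeled with a toy cache.  This
is only a sketch of the behavior described above, not ZODB's real
implementation: `ToyCache`, `load`, `incremental_gc` and `minimize` are
hypothetical names, and the real cache can only "try" (some objects may not
be evictable), which this model ignores.

```python
class ToyCache(object):
    """Toy stand-in for a connection's object cache (not ZODB's real code)."""

    def __init__(self, cache_size):
        self.cache_size = cache_size  # target number of non-ghost objects
        self._ring = []               # non-ghost objects, oldest first

    def load(self, obj):
        # Loading an object makes it a non-ghost.
        self._ring.append(obj)

    def ringlen(self):
        return len(self._ring)

    def _shrink_to(self, target):
        # Drop the oldest entries until at most `target` remain.
        excess = len(self._ring) - target
        if excess > 0:
            del self._ring[:excess]

    def incremental_gc(self):
        # Transaction boundary: aim for cache_size, and that's all.
        self._shrink_to(self.cache_size)

    def minimize(self):
        # Cache minimization: aim for a target of 0.
        self._shrink_to(0)


cache = ToyCache(cache_size=400)
for i in range(5000):
    cache.load(i)
cache.incremental_gc()
print(cache.ringlen())
cache.minimize()
print(cache.ringlen())
```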

For example, using all defaults:

>>> ZODB.__version__  # probably the version you're using
'3.4.2'

This loads a million-element OOBTree (the construction of which I won't show
here):

>>> len(t)
1000000

The number of non-ghost objects is then approximately 1e6/15 (the number of
leaf-node OOBuckets in that tree; there are more than that because of
non-leaf interior OOBTree nodes, but the leaf nodes account for the bulk of
it):

>>> cn._cache.cache_size, cn._cache.ringlen()
(400, 67067)

At a transaction boundary, a cache gc pass is run to try to reduce the
number of non-ghost objects to cache_size:

>>> transaction.commit()
>>> cn._cache.cache_size, cn._cache.ringlen()
(400, 400)

So it booted 67067 - 400 = 66667 non-ghost objects.
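
If the target-sized cache is still too big for your objects, the workaround
from the original question (minimize every 5,000 objects) could be sketched
like this.  `process_in_batches` is a hypothetical helper; it takes the
commit and minimize steps as callables, so what you pass in (for example
transaction.commit and obj._p_jar.db().cacheMinimize, as quoted above) is up
to you.

```python
def process_in_batches(objects, process, commit, minimize, batch_size=5000):
    """Process each object, commit after it, and minimize every batch_size.

    `process`, `commit` and `minimize` are callables supplied by the caller.
    Returns the number of objects processed.
    """
    count = 0
    for obj in objects:
        process(obj)
        commit()
        count += 1
        if count % batch_size == 0:
            # Periodically shoot for a cache target of 0.
            minimize()
    return count
```

Lowering the cache's target number of objects is the other knob; which one
is appropriate depends on how big "the average" object is.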


