[ZODB-Dev] Investigating a Zope reference leak... tracking object creation

Fri Jan 28 11:12:26 EST 2005

[Ben Last]
> I've been trying to track down what's either a memory or a reference leak
> in a Zope 2.7.3 (Python 2.3.4) system (two servers, production one
> running Zeo, development running Zope direct to ZODB).
>
> The symptoms are that after several days of normal running on the
> aggressively proxy-cached production site or a few hours on the
> development site under simulated load, the number of OFS.Image.Image
> references reaches very high levels, such as 17761.  In the cache at that
> time are just 4345 images.  There are only 1332 images on the whole site,
> so assuming (as I understand it) that each of the 5 Zope threads has its
> own cache,

That's correct.  On average, you've got about 4 references per cached Image
(17761/4345).  Offhand that doesn't seem notably high.

> there should be a maximum of (1332 * threads) = 6660 Images.
> The site never creates new Images, it just loads existing ones.
>
> In a bid to see where these references are being created, I installed
> LeakFinder, but that doesn't help me because Images are persistent
> objects, and thus their __init__ methods are not called (LeakFinder
> appears to patch just the __init__ and __del__ methods for tracking).
> Thus I've been on a quest to find a place to stick a patch that will show
> me, for a persistent object, from where a reference to it is being
> created.

Sorry, but that isn't possible.  Persistent objects are a kind of Python
object, and for any Python object A, creating a new reference is as simple
as:

>>> B = A   # now B holds a new reference to A

For example,

>>> from sys import getrefcount as r
>>> A = 9283749287
>>> r(A)
2
>>> B = A
>>> r(A)  # and the refcount went up by 1
3

There's nothing you can do to hook the code generated for "B = A" -- it's
just a handful of machine instructions.  That's the way Python works.

BTW, sys.getrefcount is what ZODB's DB.cacheExtremeDetail() uses to report
the number of references.

In Zope 2.8 / ZODB 3.3, the Persistent base class participates in Python's
cyclic garbage collection (it did not before then), and then you'll be able
to use gc.get_referrers(A) to get a list of all container objects that hold
a reference to A.  Jeremy Hylton and I found that somewhat helpful for
diagnosing leaks in ZODB's internals, but it requires ZODB 3.3, and for
complicated reasons it wasn't nearly as helpful as I hoped it would be (in a
nutshell, the leaks were due to refcounting errors in ZODB's C code; I'm
still hoping it will be more helpful for diagnosing higher-level reference
leaks).
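
As a rough sketch of the gc.get_referrers() approach, here it is applied to
plain Python objects rather than persistent ones.  The Image class and the
holder names are hypothetical stand-ins, not OFS.Image.Image; whether this
works for real persistent objects depends on the ZODB version, as noted
above.

```python
import gc


class Image(object):
    """Hypothetical stand-in for an object whose referrers we want."""


def referrer_types(obj):
    """Return the type names of the containers that refer to obj."""
    return [type(ref).__name__ for ref in gc.get_referrers(obj)]


img = Image()
holder_list = [img]            # one container holding a reference
holder_dict = {"key": img}     # another container holding a reference

found = gc.get_referrers(img)

# Compare by identity: the result can also include other containers,
# such as the namespace dict in which 'img' itself is bound.
in_list = any(ref is holder_list for ref in found)
in_dict = any(ref is holder_dict for ref in found)

print("list is a referrer:", in_list)
print("dict is a referrer:", in_dict)
print("referrer types:", referrer_types(img))
```

Only containers tracked by the cyclic garbage collector show up in the
result, so a reference held by a local variable or by C code that isn't
gc-aware won't be listed.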

> I've tried __setstate__, and overridden it for OFS.Image.Image to print a
> traceback to stdout.  Then I run the development Zope site with
> bin/runzope and watch.  This shows me Images being created in the
> expected places, but not in quantities that would explain those huge
> refcounts.

__setstate__ is used to "flesh out" a ghost object.  It really doesn't have
anything to do with creating new references.  Simply doing "B = A" will
create another reference to A regardless of whether A is a ghost or not; if
A is a ghost, "B = A" won't even unghostify it (and __setstate__ won't be
called).

> So... is there a better place than __setstate__ to identify where
> references to Images are being created?

No, and __setstate__ isn't a good place either.  It's not possible to trap
"refcount increased" for Python objects, unless you recompile Python with
new code for the expansion of the Py_INCREF macro to do so.  I've done that
in extreme cases in the past, but it's difficult and painful, in part
because Py_INCREF is executed at an extremely high dynamic rate, so adding
anything to it can grossly slow execution.

> I don't fully understand the way in which persistent objects are
> actually created

A ghost for A is created when a persistent container that contains A is
first loaded (assuming A isn't already in cache).  This apparent infinite
regress ends at the root object, and all persistent objects are ultimately
reached by loading containers starting from the root object.

> and populated.

Generally only when an attribute of a ghost A is referenced.  Then A's state
has to be materialized, in order to retrieve the attribute's value.

Neither of these has much of anything to do with creating new references,
though.

About the best you can do is look at sys.getrefcount(A) for an object A
you're interested in, at various points in the app, and do a kind of binary
search (moving the refcount instrumentation around) to find out where
unexpected references got created.  This is hard.
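
A minimal sketch of that kind of instrumentation follows.  RefcountProbe,
the Image stand-in, and the checkpoint labels are all hypothetical names
invented for this example.  The absolute counts include the probe's own
reference and getrefcount's temporary one, so only the differences between
checkpoints are meaningful.

```python
import sys


class RefcountProbe(object):
    """Record sys.getrefcount(obj) at labelled checkpoints."""

    def __init__(self, obj):
        self._obj = obj        # the probe itself holds one reference
        self.samples = []      # list of (label, refcount) pairs

    def check(self, label):
        count = sys.getrefcount(self._obj)
        self.samples.append((label, count))
        return count

    def deltas(self):
        """Return (label, change since the previous checkpoint) pairs."""
        pairs = zip(self.samples, self.samples[1:])
        return [(label, cur - prev) for (_, prev), (label, cur) in pairs]


class Image(object):
    """Hypothetical stand-in for the object under suspicion."""


img = Image()
probe = RefcountProbe(img)
probe.check("start")

leaked = []
leaked.append(img)             # simulated leak: the reference is kept
probe.check("after step 1")

tmp = img                      # temporary reference ...
del tmp                        # ... released again
probe.check("after step 2")

for label, change in probe.deltas():
    print(label, change)
```

A nonzero change between two checkpoints narrows the search to the code
that ran between them; moving the checkpoints closer together repeats the
bisection.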