[ZODB-Dev] ZEO client leaking memory?

Toby Dickenson tdickenson@geminidataloggers.com
Wed, 10 Oct 2001 10:26:48 +0100


On Tue, 09 Oct 2001 17:33:54 +0100, Chris Withers <chrisw@nipltd.com>
wrote:

>Toby Dickenson wrote:
>>
>> I've spent many weeks trying to understand how ZODB behaves in this
>> type of situation. The whole system behaviour when you need to touch
>> many objects in the database is one area where ZODB doesn't work
>> well out-of-the-box without some tuning.
>
>Hurm, where can I learn how to do this tuning?

Read the source, and meditate.

>> That would remove wasted disk space, but not wasted memory. If adding
>> a document wastes that much disk space then I suspect a better
>> solution is to improve the adding-a-document implementation.
>
>Well, it's just BTrees changing, so maybe Jim could explain more how
>they behave.

Ahhhh! Yes, BTrees do have a disk space overhead due to their
optimisation for incremental, not bulk indexing.
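
A minimal sketch of where that overhead comes from, if you want to see
it for yourself (the FileStorage path, the batch size and the dummy
posting data are all made up; this uses the ZODB 3 style API already
shown in this thread, with get_transaction() and FileStorage):

    # Committing after every insert rewrites the BTree buckets that
    # changed, so superseded bucket records pile up in the file until
    # a pack() discards them.
    import os
    from ZODB import DB
    from ZODB.FileStorage import FileStorage
    from BTrees.OOBTree import OOBTree

    storage = FileStorage('/tmp/btree_bloat.fs')   # throwaway demo file
    db = DB(storage)
    conn = db.open()
    root = conn.root()
    root['index'] = index = OOBTree()
    get_transaction().commit()

    for i in range(1000):
        index['doc%06d' % i] = 'a small posting for this document'
        get_transaction().commit()      # one commit per 'document'

    print('before pack: %d bytes' % os.path.getsize('/tmp/btree_bloat.fs'))
    db.pack()                           # reclaim superseded bucket records
    print('after pack:  %d bytes' % os.path.getsize('/tmp/btree_bloat.fs'))

The file shrinks sharply after the pack even though the live data is
identical; that difference is the incremental-indexing overhead.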

>In my test rigs, I found pack was the only thing which reduced the
>_RAM_ used, bizarre, I know.

yes.

>> That means to remove all objects from the cache associated with _p_jar
>> that have not been touched in three seconds. Is that what you
>> intended?
>
>Yup.

I've got no reason to think that this wouldn't do what was intended...
it should control memory usage (although, as I say, in a fairly
heavy-handed way).

>> I'm guessing on the numbers here, but I suspect adding a:
>>
>>    get_transaction().commit(1)
>>    self._p_jar.cacheGC()
>>
>> every 10 documents would be better.
>
>Interesting...
>
>> 1. What are your ZODB cache settings (size and time)
>
>Dunno, whatever they are when you do:
>import Zope
>...and there's a custom_zodb.py lying around with a ClientStorage
>specified in it...

The default ZODB cache parameters are:

* a size of 400 objects.

That means incremental garbage collection only seriously kicks in when
there are at least 400 objects in the cache. In a well-tuned system
under moderate memory pressure I would expect (in my experience) each
cache to reach an equilibrium size of two or three times that.

* a time of 60 seconds

That means that, when scanning the cache for old objects, it will
remove those that have not been accessed in 60 seconds. This time is
proportionally reduced as the cache size increases.

However, these parameters are not having any effect at the moment,
since you are not using Zope and are not calling incrGC manually.

If you do switch to using it then I think these values will be roughly
right for you. If you see cache sizes growing above 1200, reduce the
time value (it's not very sensitive; I would try 10s next).
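
For reference, this is roughly where those two knobs live if you are
opening the database yourself from a custom_zodb.py-style setup (the
ZEO address is made up, and cache_deactivate_after is the parameter
name I remember for the "time" value -- check the ZODB.DB signature in
your version):

    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB

    storage = ClientStorage(('zeo-server', 9999))  # illustrative address
    db = DB(storage,
            cache_size=400,              # target objects per connection cache
            cache_deactivate_after=60)   # the "time" value, in seconds
    conn = db.open()   # this connection's cache is what _p_jar manages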

>> _p_jar.cacheMinimize() is a fairly heavy-handed way of controlling
>> memory usage; adding a sprinkling of _p_jar.cacheGC() in code that
>> moves many objects into memory is a better way to deal with runaway
>> memory usage: cacheGC() will do "just enough" work when memory usage
>> grows, and very little work when memory usage is acceptable.
>
>Can you explain the differences?

Calling cacheMinimize moves _everything_ out of memory. That's
obviously a performance penalty when you want to access the same
objects again, so you only want to call it infrequently (if ever; I
don't think I have ever found a need for this outside of debugging).

The problem with calling it infrequently is that memory usage builds
up between calls. The numbers you give below should let us estimate
how big your bulge will be.

incrGC is designed to *maintain* a sensible pool of recently used
objects in memory, and not let that pool grow too large. It can only
do its job if it is called frequently.
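
To make that concrete, here is the shape of indexing loop I have in
mind (index_one_document, documents and the batch size of 10 are
placeholders for whatever your code actually does; 'self' is the
persistent object doing the indexing, as in the code quoted above):

    def index_documents(self, documents):
        count = 0
        for doc in documents:
            index_one_document(self, doc)    # your indexing code
            count = count + 1
            if count % 10 == 0:
                # Frequent and cheap: commit a subtransaction so changes
                # can be written out, then let the incremental GC trim
                # the cache back towards its target size. It does very
                # little work if the cache is already small.
                get_transaction().commit(1)
                self._p_jar.cacheGC()
        get_transaction().commit()

        # The heavy-handed alternative: ghost *everything*, so the next
        # access to any object has to go back to the ZEO server.
        # self._p_jar.cacheMinimize()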


>> 2. How many ZODB objects make up a 'document'
>the documents aren't stored in the ZODB, just indexed, using about
>4-10 BTrees, IIRC.
>
>> 3. How much memory is used by a 'document'
>
>How would I measure or work that out?

Being BTrees makes it hard to quantify, because not every document
will take up the same space (compared to, say, storing 30000 50k GIFs,
where each 'document' takes a little over 50k).

For the sake of this illustration I am assuming that one of these is a
full-text index of a typically 10k document, and the rest are indexes
of small properties (title, etc). A *very* rough estimate is that
these may take up 20k of RAM per document.

You said that you are calling the garbage collector every 5000
documents, so that makes 5000*20k = 100M of memory bloat.

Damn. That's not enough to explain all of what you are seeing.
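
Spelling that estimate out so you can plug in your own numbers (both
inputs are the guesses above, not measurements):

    docs_between_gc = 5000   # how often you call the garbage collector
    ram_per_doc_kb = 20      # *very* rough guess at cache growth per doc

    bloat_mb = docs_between_gc * ram_per_doc_kb / 1024.0
    print('bloat between GC calls: about %.0f MB' % bloat_mb)   # ~100M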


>> 5. Have you checked the size of the ZODB caches during this problem?
>How can I do that?
>> 6. Have you checked the reference count debugging page
>Can't do that, there's no HTTP process in this ZEO client ;-)

Have a look at the source for the relevant bits of the Zope management
pages. Finding out exactly where the memory is going will be a big
clue.
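
As a starting point, here is the sort of thing those pages do, written
as a function you can call from your ZEO client without HTTP (db(),
cacheSize() and cacheDetail() are the methods I remember the
management pages using; _p_jar itself is the connection):

    def report_cache(jar):
        # jar is a ZODB connection, i.e. some_object._p_jar
        db = jar.db()
        print('objects in the caches: %d' % db.cacheSize())
        # cacheDetail() gives (class name, instances in cache) pairs,
        # which usually shows which classes are piling up.
        detail = list(db.cacheDetail())
        detail.sort()
        for klass, count in detail:
            print('%6d  %s' % (count, klass))

Running it every few thousand documents, before and after a cacheGC()
call, should narrow things down quickly.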

>> 7. Have you any mounted databases?
>Nope...

OK, that eliminates the possibility you are packing the wrong one ;-)


Toby Dickenson
tdickenson@geminidataloggers.com