[ZODB-Dev] ZEO client leaking memory?

Chris Withers chrisw@nipltd.com
Tue, 09 Oct 2001 17:33:54 +0100


Toby Dickenson wrote:
> 
> I've spent many weeks trying to understand how ZODB behaves in this
> type of situation. The whole-system behaviour when you need to touch
> many objects in the database is one area where ZODB doesn't work well
> out-of-the-box without some tuning.

Hurm, where can I learn how to do this tuning?

> That would remove wasted disk space, but not wasted memory. If adding
> a document wastes that much disk space then I suspect a better
> solution is to improve the adding-a-document implementation.

Well, it's just BTrees changing, so maybe Jim could explain more about
how they behave.
In my test rigs, I found packing was the only thing that reduced the
_RAM_ used; bizarre, I know.
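
For the record, the pack here is nothing clever, just the usual call
from the client side; something like this, assuming `db` is the DB
instance wrapping the ClientStorage:

    import time

    # discard non-current object revisions older than "now"
    db.pack(time.time())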

> This is in Zope? You might want to make that a subtransaction commit,
> to keep with Zope's assumption that full transactions start and end on
> request boundaries:

Why? What's a request boundary in this context? This is just a Python
script opening a ZEO connection and indexing a bucketload of documents.

In what way would subtransactions behave differently?

> This may also remove your need to pack the database during the work.
> Any wasted disk space is wasted in temporary files (for the
> subtransaction data); only the final copy of each object gets written
> to the database in the full commit.

Hmmm... now that is interesting... I may have to give it a go...
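
Presumably something along these lines? (a sketch, with `documents` and
`catalog` as hypothetical stand-ins for whatever does the indexing):

    for doc in documents:
        catalog.index_document(doc)
        # subtransaction commit: changed objects are flushed to a
        # temporary file, not to the real storage
        get_transaction().commit(1)
    # the one real commit writes only the final state of each object
    get_transaction().commit()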

> That means to remove all objects from the cache associated with _p_jar
> that have not been touched in three seconds. Is that what you
> intended?

Yup.

> _p_jar.cacheMinimize() is a fairly heavy-handed way of controlling
> memory usage; adding a sprinkling of _p_jar.cacheGC() in code that
> moves many objects into memory is a better way to deal with runaway
> memory usage: cacheGC() will do "just enough" work when memory usage
> grows, and very little work when memory usage is acceptable.

Can you explain the differences?
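
My current reading, from a squint at the Connection source (so treat
this as an assumption, not gospel):

    # both live on the Connection, i.e. what you see as obj._p_jar
    conn.cacheMinimize(3)   # ghostify *everything* idle for >= 3
                            # seconds, however full the cache is
    conn.cacheGC()          # incremental: ghostify just enough to pull
                            # the cache back toward its target size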

> I'm guessing at the numbers here, but I suspect adding a:
> 
>    get_transaction().commit(1)
>    self._p_jar.cacheGC()
> 
> every 10 documents would be better.

Interesting...
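
So, spelling that out to check I've understood (names hypothetical):

    count = 0
    for doc in documents:
        catalog.index_document(doc)
        count = count + 1
        if count % 10 == 0:
            get_transaction().commit(1)    # subtransaction commit
            catalog._p_jar.cacheGC()       # trim the cache if needed
    get_transaction().commit()             # one real commit at the end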

> 1. What are your ZODB cache settings (size and time)?

Dunno, whatever they default to when you do:

    import Zope

...and there's a custom_zodb.py lying around with a ClientStorage
specified in it...
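
Though I guess in a standalone script the knobs are on the DB
constructor; if I'm reading ZODB/DB.py right (treat the numbers as
assumptions), something like:

    import ZODB
    from ZEO.ClientStorage import ClientStorage

    storage = ClientStorage(('localhost', 8100))   # address is a stand-in
    # cache_size is the target object count; cache_deactivate_after is
    # the "time" part, in seconds. These are the defaults as I
    # understand them.
    db = ZODB.DB(storage, cache_size=400, cache_deactivate_after=60)
    connection = db.open()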

> 2. How many ZODB objects make up a 'document'?

The documents aren't stored in the ZODB, just indexed, using about 4-10
BTrees, IIRC.

> 3. How much memory is used by a 'document'?

How would I measure or work that out?
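
The best I can think of is watching the process RSS around each add; a
rough sketch (Linux-specific, since it reads /proc):

    import os

    def process_rss_kb():
        # resident set size, in KB, from the kernel's status file
        for line in open('/proc/%d/status' % os.getpid()):
            if line.startswith('VmRSS'):
                return int(line.split()[1])

    before = process_rss_kb()
    catalog.index_document(doc)    # hypothetical indexing call
    print process_rss_kb() - before, 'KB for that document'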

> 4. How fast are documents being added (number per minute)?

As fast as they happen; it's just a for loop. Probably about one a
second, but this slows down a _lot_ when the machine runs out of
memory ;-)

> 5. Have you checked the size of the ZODB caches during this problem?

How can I do that?
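
Noting for later: I believe this is visible from the client itself;
names here are from my reading of ZODB, so double-check:

    # total non-ghost objects across the DB's open connections
    print db.cacheSize()
    # or per connection, via the pickle cache behind _p_jar
    print len(connection._cache), 'objects,', \
        connection._cache.cache_size, 'target size'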

> 6. Have you checked the reference count debugging page?

Can't do that; there's no HTTP process in this ZEO client ;-)
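
Though if I understand it right, that page is little more than
sys.getrefcount() applied to every class reachable from sys.modules, so
the same numbers could be pulled by hand (a sketch; my assumption about
how the page works):

    import sys, types

    counts = {}
    for mod in sys.modules.values():
        if mod is None:
            continue
        for name in dir(mod):
            obj = getattr(mod, name, None)
            if isinstance(obj, (types.ClassType, type)):
                key = '%s.%s' % (getattr(obj, '__module__', '?'), name)
                counts[key] = sys.getrefcount(obj)

    # print the twenty most-referenced classes
    items = [(n, k) for (k, n) in counts.items()]
    items.sort()
    items.reverse()
    for n, key in items[:20]:
        print n, key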

> 7. Have you any mounted databases?

Nope...

cheers,

Chris