[ZODB-Dev] ZEO client leaking memory?

Toby Dickenson tdickenson@geminidataloggers.com
Thu, 04 Oct 2001 11:16:13 +0100


On Thu, 04 Oct 2001 10:00:18 +0100, Chris Withers <chrisw@nipltd.com>
wrote:

>Hi,
>
>Just to let you guys know, I've noticed that my
>lets-index-30,000-documents ZEO client appears to be leaking memory.
>After doing about 10K docs, the ZEO _client_ process has sucked so
>much memory that the machine churns to a painfully slow stasis...
>
>This, of course, could be a myriad of different things. Anyone got
>any clues on how I can find out what's going on?

I've spent many weeks trying to understand how ZODB behaves in this
type of situation. The whole-system behaviour when you need to touch
many objects in the database is one area where ZODB doesn't work well
out of the box without some tuning.


Earlier, Chris wrote:

>app._p_jar.db()._storage.pack(time(), referencesf, wait=1)
>
>...every 5000.

That would remove wasted disk space, but not wasted memory. If adding
a document wastes that much disk space, then I suspect a better
solution is to improve the adding-a-document implementation.


>I'm still struggling to index 30,000 documents from a ZEO client into a
>FileStorage despite chucking in a:
>
>get_transaction().commit()
>
>...every 600 documents

This is in Zope? You might want to make that a subtransaction commit,
to keep in line with Zope's assumption that full transactions start
and end on request boundaries:

get_transaction().commit(1)

This may also remove your need to pack the database during the work.
Any wasted disk space is wasted in temporary files (holding the
subtransaction data); only the final copy of each object gets written
to the database in the full commit.
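
For concreteness, here is a minimal sketch of that pattern. The loop
itself, 'documents' and 'index_document' are stand-ins for whatever
your client script really does:

    count = 0
    for doc in documents:
        index_document(doc)               # hypothetical
        count = count + 1
        if count % 600 == 0:
            # subtransaction commit: object state is spooled to a
            # temporary file, nothing hits the FileStorage yet
            get_transaction().commit(1)

    # one full commit at the end writes only the final state of
    # each object to the storage
    get_transaction().commit()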

I suspect the disk space side of things is irrelevant to this memory
problem, however.


>app._p_jar.cacheMinimize(3)
>
>...every 600 documents

That tells ZODB to remove from the cache associated with _p_jar every
object that has not been touched in the last three seconds. Is that
what you intended?

_p_jar.cacheMinimize() is a fairly heavy-handed way of controlling
memory usage. Adding a sprinkling of _p_jar.cacheGC() in code that
moves many objects into memory is a better way to deal with runaway
memory usage: cacheGC() will do "just enough" work when memory usage
grows, and very little work when memory usage is acceptable.

The trick is finding how many calls to cacheGC are needed, because
that function isn't very good at working out how much work is "just
enough".

I'm guessing at the numbers here, but I suspect adding a:

   get_transaction().commit(1)
   self._p_jar.cacheGC()

every 10 documents would be better.
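
As a sketch, assuming the indexing happens in a method on a persistent
object (so self._p_jar is the right connection; adjust to app._p_jar
if yours is a plain script):

    def index_all(self, docs):
        count = 0
        for doc in docs:
            self._index_one(doc)              # hypothetical
            count = count + 1
            if count % 10 == 0:
                get_transaction().commit(1)   # subtransaction commit
                self._p_jar.cacheGC()         # let the cache shrink

If every 10 documents turns out to be too much or too little work,
that interval is the number to tune.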


Some other questions:

1. What are your ZODB cache settings (size and time)?
2. How many ZODB objects make up a 'document'?
3. How much memory is used by a 'document'?
4. How fast are documents being added (number per minute)?
5. Have you checked the size of the ZODB caches during this problem?
   (A way to do that from the client is sketched after this list.)
6. Have you checked the reference count debugging page
   during this problem?
7. Have you any mounted databases?
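
For questions 5 and 6, you can read much the same numbers that Zope's
Control Panel cache screens show directly from the client. A rough
sketch, assuming your ZODB version has the cacheSize()/cacheDetail()
methods those screens use:

    db = app._p_jar.db()
    print 'objects in all connection caches:', db.cacheSize()
    # per-class counts of objects currently held in the caches
    for klass, count in db.cacheDetail():
        print klass, count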


Toby Dickenson
tdickenson@geminidataloggers.com