[ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-bytes)

Jim Fulton jim at zope.com
Mon May 10 18:18:08 EDT 2010


On Mon, May 10, 2010 at 5:39 PM, Ryan Noon <rmnoon at gmail.com> wrote:
> First off, thanks everybody.  I'm implementing and testing the suggestions
> now.  When I said ZODB was more complicated than my solution I meant that
> the system was abstracting a lot more from me than my old code (because I
> wrote it and knew exactly how to make the cache enforce its limits!).
>
>> > The first thing to understand is that options like cache-size and
>> > cache-size bytes are suggestions, not limits. :)  In particular, they
>> > are only enforced:
>> >
>> > - at transaction boundaries,
>
> If it's already being called at transaction boundaries, how come memory
> usage doesn't go back down to the quota after the commit (which happens
> only every 25k documents)?

Because Python generally doesn't return memory back to the OS. :)

It's also possible you have a problem with one of your data
structures.  For example, if you have an array that grows effectively
without bound, the whole array has to be in memory, no matter how big
it gets.  Also, if the persistent object holding the array isn't seen
as changed, because appending mutates the array in place rather than
reassigning the attribute, then the size of the array won't be
reflected in the cache size.  (The size of objects in the cache is
estimated from their pickle sizes.)
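
A minimal sketch of that gotcha (the class and attribute names here
are hypothetical, not taken from your code):

    import persistent

    class Postings(persistent.Persistent):
        def __init__(self):
            self.wordset = []           # plain list, not a persistent type

        def add(self, wordid):
            # Mutating the list in place does NOT mark this object as
            # changed, so its growing size never reaches the cache
            # accounting (or the database, until something else flags it).
            self.wordset.append(wordid)
            self._p_changed = True      # flag the change by hand

Using persistent.list.PersistentList (or a BTrees set, see below)
avoids the manual _p_changed bookkeeping, because those types register
their own mutations.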

I assume you're using ZODB 3.9.5 or later. If not, there's a bug in
handling new objects that prevents cache suggestions from working
properly.
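
For reference, here's where those suggestions get set with the 3.9
API (the storage path and the numbers are made up):

    from ZODB.DB import DB
    from ZODB.FileStorage import FileStorage
    import transaction

    db = DB(FileStorage('Data.fs'),
            cache_size=5000,             # suggested objects per connection
            cache_size_bytes=200 << 20)  # suggested ~200MB of pickle size

    conn = db.open()
    # ... load and index documents ...
    transaction.commit()    # the cache is trimmed here, not continuously
    conn.cacheMinimize()    # or shed all ghostable objects explicitly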

If you don't need list semantics, and set semantics will do, you might
consider using a BTrees.LLBTree.TreeSet, which provides compact,
scalable persistent sets.  (If your word ids fit in signed 32-bit
ints, you could use the IIBTree variety, which is more compact.)
Given that the variable name is wordset, I assume you're dealing with
sets. :)
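
A sketch of what that might look like for the mapping in question
(I'm guessing at the shape of wordid_to_docset; LOBTree maps 64-bit
integer keys to arbitrary values):

    from BTrees.LOBTree import LOBTree    # 64-bit int keys -> objects
    from BTrees.LLBTree import LLTreeSet  # compact sets of 64-bit ints

    wordid_to_docset = LOBTree()

    def add_posting(wordid, docid):
        docset = wordid_to_docset.get(wordid)
        if docset is None:
            docset = wordid_to_docset[wordid] = LLTreeSet()
        docset.insert(docid)

TreeSets register their own changes, and each bucket is a separate
database record, so large sets can load and grow incrementally instead
of being pickled as one giant object.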

What is wordid_to_docset? You don't show its creation.

Jim

--
Jim Fulton
