[ZODB-Dev] Memory out of control when NOT changing objects

Tue Oct 12 15:14:45 EDT 2004

[Malcolm Cleaton]
> I'm having a memory problem with a script for updating objects in the
> ZODB.
>
> I'm running on Zope 2.7.3b1, and invoking the script using zopectl run.
>
> The loop looks like this, although some method names have been changed to
> protect the innocent:
>
> results = portal_catalog.searchResults(...)
> num = 0
> for brain in results:
> 	ob = brain.getObject()
> 	if needsChanging(ob):
> 		change(ob)
> 	# else
> 	# 	ob._p_deactivate()
>
> 	num += 1
> 	if num % 1000 == 0:
> 		get_transaction().commit()
>
> With the code as shown, memory usage is fine if the objects actually need
> changing. However, if they don't, it spirals out of control, straight for
> swap death; I suspect the script may eventually finish but I don't have
> the patience.
>
> With the two commented lines uncommented, so objects get explicitly
> deactivated when no longer needed, memory usage is fine.
>
> But, why is this necessary? I thought the only circumstance that would
> bring swap death to the ZODB was a super-sized transaction, full of
> changed objects.

Why would you think that?  In the absence of docs, I'm just curious about
where people get their ideas.

> However, in this case, loading the objects but not changing them appears
> to be unsafe.

An object is loaded from a storage via a Connection.  The Connection has an
in-memory cache (just called "cache" hereafter), holding on to every object
loaded through it.  When Toby replied that "objects do not get deactivated
mid-transaction", he was obliquely referring to the fact that a Connection
does reduce its cache to its target size at the end of a transaction
(whether via commit or abort).
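Here's a toy model of that much, using a made-up `ToyConnection` class (not ZODB's API).  Every object loaded through it stays in the cache, and the cache gets trimmed back to its target size only when the end-of-transaction hook runs.  Real ZODB turns trimmed objects into ghosts rather than forgetting them outright; the toy just drops the entry.

```python
from collections import OrderedDict


class ToyConnection:
    """Hypothetical stand-in for a ZODB Connection and its object cache."""

    def __init__(self, storage, target_size):
        self.storage = storage          # oid -> stored state (a plain dict here)
        self.target_size = target_size  # size the cache is trimmed back to
        self.cache = OrderedDict()      # oid -> loaded object, least recently used first

    def get(self, oid):
        """Load an object through this connection; it stays cached."""
        if oid in self.cache:
            self.cache.move_to_end(oid)                # mark as most recently used
        else:
            self.cache[oid] = dict(self.storage[oid])  # "unpickle" a copy
        return self.cache[oid]

    def end_of_transaction(self):
        """Trim the cache, evicting least recently used objects first."""
        while len(self.cache) > self.target_size:
            self.cache.popitem(last=False)


storage = {oid: {"value": oid} for oid in range(5000)}
conn = ToyConnection(storage, target_size=400)
for oid in range(5000):
    conn.get(oid)
size_before = len(conn.cache)  # nothing is trimmed mid-transaction
conn.end_of_transaction()
size_after = len(conn.cache)   # back down to the target size
print(size_before, size_after)  # 5000 400
```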

But your example starts a new transaction every 1000 iterations, and RAM
usage grows without bound anyway, so Toby's response didn't tell you the
whole story here.

The "hidden bit" is that commit is a method of a Transaction object, and
transactions don't know anything about connections.  In the ZODB 3.2 line,
transactions learn only about modified objects.  Transaction.commit() looks
at all the modified objects, and tells the Connection(s) those objects were
loaded from to commit new states for those objects.  Cache reduction in a
connection C happens, in effect, as a side effect of committing (or
aborting) *changes* to objects loaded via C.

So that's the underlying problem:  when no object loaded from a connection C
is modified, the current Transaction never "finds" any modified objects
loaded from C when you do a commit() (or abort()), so the current
transaction never tells C to reduce its cache.  At this point I have to
repeat that transactions in the 3.2 line have no direct knowledge of
connections -- transactions only know about modified objects.
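A sketch of that dispatch, again with made-up classes (`ToyConnection`, `ToyObject`, `ToyTransaction`) standing in for the real ones.  The transaction keeps only a list of modified objects.  Each object remembers the connection it came from, playing the role `_p_jar` plays for real persistent objects, and only those connections get asked to commit.  Cache trimming happens as a side effect of that request, so a loop that loads many objects but modifies none never trims anything.

```python
class ToyConnection:
    """Hypothetical connection: caches every object loaded through it."""

    def __init__(self, target_size):
        self.target_size = target_size
        self.cache = {}  # oid -> ToyObject; dicts keep load order

    def get(self, oid):
        if oid not in self.cache:
            self.cache[oid] = ToyObject(oid, self)
        return self.cache[oid]

    def commit_objects(self, objects):
        # A real connection would store new states here.  Trimming the
        # cache happens only as a side effect of being asked to commit.
        while len(self.cache) > self.target_size:
            del self.cache[next(iter(self.cache))]  # evict the oldest load


class ToyObject:
    def __init__(self, oid, jar):
        self.oid = oid
        self.jar = jar  # connection it was loaded from (cf. _p_jar)


class ToyTransaction:
    """Knows about modified objects only, never about connections."""

    def __init__(self):
        self.modified = []

    def register(self, obj):
        self.modified.append(obj)

    def commit(self):
        by_jar = {}
        for obj in self.modified:
            by_jar.setdefault(obj.jar, []).append(obj)
        for jar, objects in by_jar.items():  # only jars of modified objects
            jar.commit_objects(objects)
        self.modified = []


def run(modify):
    """Mimic the quoted loop; return how many objects stay cached."""
    conn, txn = ToyConnection(target_size=400), ToyTransaction()
    for num in range(1, 10001):
        ob = conn.get(num)
        if modify:
            txn.register(ob)
        if num % 1000 == 0:
            txn.commit()
    return len(conn.cache)


print(run(modify=True), run(modify=False))  # 400 10000
```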

> Is this a bug,

I don't think so.  The code appears to have worked this way forever, and
"worming around it" appears to be more-or-less common folklore knowledge.

> or am I misunderstanding the rules of the ZODB?

That depends on whether you believe the rules of ZODB exist independent of
its implementation <0.9 wink>.  Same thing goes for the workarounds.
Calling ob._p_deactivate(), as you did, works.  Calling the connection's
.sync() method from time to time should also work.  Likewise calling the
connection's .cacheGC() method from time to time, or closing and (re)opening
the connection.  Those I deduced from looking at the implementation.
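To compare two of those workarounds, here's one more toy model; every name in it is made up.  A "ghost" is an object whose state is `None`, and the cache's target counts only non-ghost objects.  `deactivate()` stands in for `ob._p_deactivate()`, and `cache_gc()` stands in for calling the connection's `cacheGC()` every 1000 iterations.  The `get_transaction().commit()` call is left out, because with nothing modified it never reaches the connection anyway.

```python
class ToyObject:
    """Hypothetical persistent object; state None means it is a ghost."""

    def __init__(self, oid):
        self.oid = oid
        self.state = None

    def activate(self):
        if self.state is None:
            self.state = {"payload": "x" * 100}  # stands in for loaded data

    def deactivate(self):
        # Analogue of ob._p_deactivate(): drop the state, keep the ghost.
        self.state = None


class ToyConnection:
    def __init__(self, target_size):
        self.target_size = target_size
        self.cache = {}  # oid -> ToyObject, in load order

    def get(self, oid):
        if oid not in self.cache:
            self.cache[oid] = ToyObject(oid)
        ob = self.cache[oid]
        ob.activate()
        return ob

    def active_count(self):
        return sum(ob.state is not None for ob in self.cache.values())

    def cache_gc(self):
        # Analogue of cacheGC(): ghostify the oldest non-ghost objects
        # until at most target_size of them remain active.
        excess = self.active_count() - self.target_size
        for ob in self.cache.values():
            if excess <= 0:
                break
            if ob.state is not None:
                ob.deactivate()
                excess -= 1


def scan(workaround):
    """Load 10000 objects, none of which need changing."""
    conn = ToyConnection(target_size=400)
    for num in range(1, 10001):
        ob = conn.get(num)
        if workaround == "deactivate":
            ob.deactivate()
        if num % 1000 == 0 and workaround == "cache_gc":
            conn.cache_gc()
    return conn.active_count()


print(scan(None), scan("deactivate"), scan("cache_gc"))  # 10000 0 400
```

Either workaround keeps the number of live objects bounded.  The per-object one is more aggressive and ends with nothing active.  The periodic one leaves the most recent objects (up to the target size) loaded, which helps if the loop happens to revisit them.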
