[ZODB-Dev] Re: ZODB Caching Questions

Toby Dickenson tdickenson@geminidataloggers.com
Fri, 22 Mar 2002 11:08:19 +0000


On Friday 22 March 2002 10:41 am, Martin Gfeller wrote:

>last October, I sent some questions about ZODB caching, but never got
>any answer. As it is still important to us, and I've noticed some recent
>traffic and work in this area, especially from Toby Dickenson,
>I'd like to ask you again:
>
>We're using a number of ZODB databases to store financial deal objects
>and assorted static data objects.
>
>A reference to each object in a database is kept in a 'root' object,
>which is a PersistentMapping.

>1. If objects are referenced from a root object, they can never
>   be deallocated (just deactivated), because at least one
>   reference is always kept. Is this so,

correct

>  and what are alternative
>   ways to do it?

This will be bad using the original cache implementation, because it doesn't 
distinguish between ghost (deactivated) and non-ghost objects. It will always 
see that the cache size is 100001 (100000 deal objects plus the root, maybe 
some more too). There's nothing it can do to reduce that number, so it will 
thrash. To help this you need to use a BTree instead of a PersistentMapping: 
a BTree spreads its entries across many separately persistent buckets, and a 
deactivated bucket drops its references to the objects it holds, so those 
objects can be deallocated. However, this may still not perform 
satisfactorily using the old cache.
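
A toy model of why buckets help, assuming (as with ZODB ghosts) that 
deactivating a persistent object discards its state and therefore the 
references that state held. `Deal`, `Bucket` and `make_buckets` are 
hypothetical stand-ins, not ZODB classes:

```python
import weakref


class Deal:
    """Hypothetical stand-in for a persistent deal object."""

    def __init__(self, key):
        self.key = key


class Bucket:
    """Toy model of a persistent BTree bucket: it references its values only
    while active; deactivating it (making it a ghost) drops that state."""

    def __init__(self, items):
        self.items = dict(items)
        self.ghost = False

    def deactivate(self):
        # A real ghost forgets its state and reloads it from storage on demand.
        self.items = None
        self.ghost = True


def make_buckets(deals, bucket_size):
    """Split the deals into fixed-size buckets, like the leaves of a BTree."""
    buckets = []
    for start in range(0, len(deals), bucket_size):
        chunk = deals[start:start + bucket_size]
        buckets.append(Bucket((d.key, d) for d in chunk))
    return buckets
```

If the only references to the deals live in the buckets, deactivating one 
bucket lets its deals be freed (checkable with `weakref.ref`), while a single 
always-active mapping would keep every deal reachable.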


Under my new cache your current implementation may 'just work'. The cache 
controls the number of non-ghost objects. Let's say you set the target size to 
be 500. It will keep the 500 most recently used objects activated, and 99500 
as ghosts. No thrashing.
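
The policy described above can be sketched as a least-recently-used set of 
active objects, where only non-ghosts count toward the target size. This is a 
hypothetical model (`GhostingCache`, `access` and `sweep` are illustrative 
names, not the real cache API), and it ignores details such as objects with 
uncommitted changes, which cannot be ghosted:

```python
from collections import OrderedDict


class GhostingCache:
    """Toy cache whose target size counts only non-ghost (active) objects."""

    def __init__(self, target_size):
        self.target_size = target_size
        self.active = OrderedDict()  # oid -> True, least to most recently used
        self.ghosts = set()

    def access(self, oid):
        # Touching an object activates it and marks it most recently used.
        self.ghosts.discard(oid)
        self.active[oid] = True
        self.active.move_to_end(oid)

    def sweep(self):
        # Ghostify least recently used objects until back at the target size.
        while len(self.active) > self.target_size:
            oid, _ = self.active.popitem(last=False)
            self.ghosts.add(oid)
```

With a target of 500, touching 100000 deals and then sweeping leaves the 500 
most recently used ones active and 99500 as ghosts; the sweep has a reachable 
goal, so it doesn't thrash.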

Ghosts are tiny, but their overhead is not zero. Maybe you need to consider a 
BTree anyway. I will investigate further if BTree+my cache is not a complete 
solution.

>2. If an object is a simple Python object, instead of being derived
>   from Persistent, cache control never seems to touch it.
>   Is this correct?

Yes. It will be persisted inside every persistent object that references it, 
and removed from memory when the last reference to it is lost.
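
ZODB serializes object state with `pickle`, so the effect can be seen with 
the standard library alone. This sketch models each persistent object's 
database record as an independent pickle (a simplifying assumption); the 
plain, non-persistent value is embedded in both records and comes back as two 
separate copies:

```python
import pickle

# A plain, non-persistent value (here just a list) that two persistent
# objects both refer to in memory.
shared = ["London", "GBP"]

# Model each persistent object's stored record as a separate pickle of its
# state; the plain value is written into both records.
record_a = pickle.dumps({"name": "deal A", "settlement": shared})
record_b = pickle.dumps({"name": "deal B", "settlement": shared})

# Loading the records back yields two distinct copies of the plain value.
state_a = pickle.loads(record_a)
state_b = pickle.loads(record_b)
print(state_a["settlement"] is state_b["settlement"])  # False
print(state_a["settlement"] == state_b["settlement"])  # True
```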

>3. The cache statistics cache_mean_deal and cache_mean_deac never
>   seem to show anything other than 0.0, even though tracing shows that
>   deactivations occur.

I don't think those stats ever recorded a useful metric.

One hack I found useful when cache-tuning my application is to change the 
cacheGC and _incrgc functions in Connection.py to record the cache sizes 
before and after each cache garbage-collection pass.

Under my new cache, the size afterwards will always be the target size. The 
difference (before - target) is the number of objects activated since the 
last one. Plotting this as a histogram is a good illustration of memory 
pressure.
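
The Connection.py patch itself isn't shown here; this is only a sketch of the 
bookkeeping. `SweepRecorder` and `activation_histogram` are hypothetical 
helpers, and wiring them into cacheGC/_incrgc is left as an assumption. The 
histogram relies on the cache returning to the target size after every pass:

```python
from collections import Counter


class SweepRecorder:
    """Wrap a cache-sweep callable and log the cache size around each call."""

    def __init__(self, sweep, count_active):
        self.sweep = sweep                # runs one garbage-collection pass
        self.count_active = count_active  # returns the current non-ghost count
        self.log = []                     # list of (before, after) pairs

    def __call__(self):
        before = self.count_active()
        self.sweep()
        after = self.count_active()
        self.log.append((before, after))


def activation_histogram(log, target, bin_width):
    """Bin (before - target): objects activated since the previous pass."""
    return Counter(((before - target) // bin_width) * bin_width
                   for before, _ in log)
```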

(Hmmm; I might tidy up this patch for Zope 2.6 if I have time)

>4. If we have PersistentMappings of up to 100'000 entries, indexed
>   by a (string,string) tuple, should we use a BTree instead?

See, you already knew the answer.
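
A BTree keeps its keys sorted, and (string, string) tuples are a reasonable 
key type because tuples of strings compare element by element. So all entries 
sharing a first element sit in one contiguous run. With the BTrees package a 
range could be fetched with a call like `OOBTree.keys(min, max)`; the sketch 
below stands in a sorted list and `bisect` for the tree so it runs with only 
the standard library. The (counterparty, deal id) key layout and `keys_for` 
are hypothetical:

```python
import bisect

# Hypothetical deal keys: (counterparty, deal id) pairs of strings.
keys = sorted([
    ("ACME", "D-0003"), ("BETA", "D-0001"), ("ACME", "D-0001"),
    ("ZETA", "D-0007"), ("BETA", "D-0002"),
])


def keys_for(sorted_keys, counterparty):
    """Return every key whose first element is `counterparty`.

    Tuples compare element by element, so matching keys form one contiguous
    run beginning at the insertion point of (counterparty, "")."""
    lo = bisect.bisect_left(sorted_keys, (counterparty, ""))
    hi = lo
    while hi < len(sorted_keys) and sorted_keys[hi][0] == counterparty:
        hi += 1
    return sorted_keys[lo:hi]
```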