[ZODB-Dev] Memcache or ZEO cache (Re: ZEO and relstporage performance)

Wed Oct 14 14:48:48 EDT 2009

Jim Fulton wrote:
> The most complicated logic in the ZEO cache, which would be just as
> complicated with another cache storage implementation and more
> complicated with a shared cache storage is making sure the cache
> doesn't have stale state.  I probably need to look at memcache again,
> but every time I look at it, it's not at all clear how to prevent
> reading stale data as current.  At some point, I should look at the
> approach you took in RelStorage.

Indeed, that's a hard enough problem that it's making me reconsider 
memcached.  I suspect I could adopt the ZEO cache in RelStorage.

RelStorage currently uses memcached in a very simple way: it puts (tid, 
state) in the cache for each oid.  When RelStorage reads from the cache, 
if it gets a state for a different transaction than the transaction ID 
last polled by the storage instance, RelStorage discards the state from 
the cache and falls back to the database.  That means most of the cache 
needs to be revalidated after every commit. :-(  The strategy should 
work well for databases that change rarely, but will only add overhead 
for databases that change often.  There is an attempt to improve the 
situation with backpointers, but I doubt they actually help.
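
To make the read path concrete, here is a minimal sketch rather than RelStorage's actual code. It assumes a plain dict stands in for memcached, that the tid stored with each state is the tid the writer had last polled, and that load_from_db is a hypothetical callable that returns the state for an oid:

```python
def cached_load(cache, oid, polled_tid, load_from_db):
    """Return the state for oid, trusting the cache only on a tid match.

    cache: dict mapping oid -> (tid, state); a stand-in for memcached.
    polled_tid: transaction ID last polled by this storage instance.
    load_from_db: hypothetical callable, oid -> state.
    """
    entry = cache.get(oid)
    if entry is not None:
        tid, state = entry
        if tid == polled_tid:
            return state  # cache entry matches our view of the database
        # Entry was written under a different transaction: discard it.
        cache.pop(oid, None)
    state = load_from_db(oid)
    cache[oid] = (polled_tid, state)
    return state
```

Under these assumptions, every commit changes polled_tid, so the first read of each oid after a commit misses the cache and goes back to the database.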

I'm tinkering with the idea that some transaction awareness could be 
added to memcached.  Perhaps memcached should hold a "current 
transaction" value and clients should pass their own "current 
transaction" value when they try to set data in the cache.  If a stale 
client tries to set data, memcached should ignore the attempt.
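
Roughly what I have in mind, as a toy in-process sketch. The class and method names are made up for illustration; nothing like this exists in memcached today, and how the cache learns about new transactions (advance() here) is an open question:

```python
class TxnAwareCache:
    """Toy stand-in for a memcached that tracks a 'current transaction'."""

    def __init__(self):
        self.current_tid = 0
        self._data = {}

    def advance(self, tid):
        # Hypothetical hook called when a commit is observed.
        # The current transaction never moves backwards.
        if tid > self.current_tid:
            self.current_tid = tid

    def set(self, key, value, client_tid):
        # Ignore writes from clients whose view is older than the cache's.
        if client_tid < self.current_tid:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)
```

A client that last polled transaction 4 would have its set() ignored once the cache has advanced to transaction 5. This sketch says nothing about what to do with entries already in the cache when the transaction advances.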

>> and memcached creates
>> opportunities for developers to be creative with caching strategies.
> 
> How so?

Well, memcached has a very simple interface, so developers should be 
able to craft their own memcached-like implementations.  They might add 
multi-level caching, for example.
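
For instance, a two-level cache could be sketched as below. This is only an illustration: both levels are assumed to be dict-like mappings (a real memcached client would need a thin adapter), and the class name is hypothetical:

```python
class TwoLevelCache:
    """Check a small local cache first, then a larger shared one."""

    def __init__(self, local, shared):
        self.local = local    # e.g. an in-process dict
        self.shared = shared  # e.g. an adapter around a memcached client

    def get(self, key):
        value = self.local.get(key)
        if value is None:
            value = self.shared.get(key)
            if value is not None:
                # Promote the entry so the next read stays local.
                self.local[key] = value
        return value

    def set(self, key, value):
        # Write through to both levels.
        self.local[key] = value
        self.shared[key] = value
```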

> The biggest problem with ZEO performance on the client side is that
> reads require round trips and that generally a client thread only
> knows to request one read at a time [1]_.  I plan to add an API for
> asynchronous reads.  In rare situations in which an application knows
> it's going to need more than one object, it can prefetch multiple
> objects at once.  (One can imagine iteration scenarios in which this
> would be easy to predict.)  An opportunity that this would provide
> would be to pre-fetch object revisions for objects that were in the
> ZODB cache and have just been invalidated.

When ZODB is unpickling the state of an object, it often has to pull in 
several objects, one at a time.  I wonder if it would be valuable to 
prefetch the objects that will be pulled in by the unpickling operation.
We could use the referencesf() function to get the list of OIDs.
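
A rough sketch of the idea, assuming a hypothetical prefetch callable that fetches a batch of OIDs in one round trip. The reference extractor is passed in as get_refs; in ZODB that could be ZODB.serialize.referencesf, which returns the OIDs referenced by a pickled record:

```python
def load_with_prefetch(oid, load, prefetch, get_refs):
    """Load one record, then ask the storage to prefetch what it references.

    load: callable, oid -> record bytes (one round trip).
    prefetch: hypothetical callable taking a list of oids to fetch as a batch.
    get_refs: callable, record bytes -> list of referenced oids.
    """
    data = load(oid)
    oids = get_refs(data)
    if oids:
        # One batched request instead of one round trip per reference.
        prefetch(oids)
    return data
```

Whether this pays off would depend on how many of the referenced objects the application actually touches after unpickling.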

Alternatively, I wonder if it would be valuable to store a list of 
referenced OIDs in every object.  We might put that list in another 
pickle, placing it before the "class" and "state" pickles that we 
currently store.
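
To illustrate the layout, here is a small sketch using only the standard pickle module. The function names and the sample values are hypothetical; the point is that a reader can decode the leading pickle without touching the class or state pickles:

```python
import io
import pickle


def pack_record(ref_oids, class_meta, state):
    """Concatenate three pickles: referenced OIDs, class, then state."""
    parts = (list(ref_oids), class_meta, state)
    return b"".join(pickle.dumps(part, protocol=2) for part in parts)


def read_refs(record):
    """Decode only the first pickle, which holds the referenced OIDs."""
    return pickle.load(io.BytesIO(record))


def read_all(record):
    """Decode all three pickles in order from the same stream."""
    f = io.BytesIO(record)
    return pickle.load(f), pickle.load(f), pickle.load(f)
```

The cost is a larger record and a format change for every existing reader, so this would need measuring before anyone commits to it.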

Shane