[ZODB-Dev] Memcache or ZEO cache (Re: ZEO and RelStorage performance)
Shane Hathaway
shane at hathawaymix.org
Wed Oct 14 14:48:48 EDT 2009
Jim Fulton wrote:
> The most complicated logic in the ZEO cache, which would be just as
> complicated with another cache storage implementation and more
> complicated with a shared cache storage, is making sure the cache
> doesn't have stale state. I probably need to look at memcache again,
> but every time I look at it, it's not at all clear how to prevent
> reading stale data as current. At some point, I should look at the
> approach you took in RelStorage.

Indeed, that's a hard enough problem that it's making me reconsider
memcached. I suspect I could adopt the ZEO cache in RelStorage.

RelStorage currently uses memcached in a very simple way: for each oid,
it puts (tid, state) in the cache. When RelStorage reads from the cache
and gets a state whose tid differs from the transaction ID last polled
by the storage instance, it discards the state from the cache and falls
back to the database. That means most of the cache needs to be
revalidated after every commit. :-( The strategy should work well for
databases that change rarely, but will only add overhead for databases
that change often. There is an attempt to improve the situation with
backpointers, but I doubt they actually help.
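
To make the cost concrete, here is a toy Python model of the strategy described above; it is not RelStorage's code. A plain dict stands in for memcached, `load_from_db` is a hypothetical database-fallback callback, and the sketch assumes the tid stored with each entry is the one the storage instance had polled when it stored the entry. Under that assumption a single new poll makes every existing entry unusable.

```python
from typing import Callable, Dict, Tuple


class PolledTidCache:
    """Toy model of a cache that only trusts entries for the last polled tid."""

    def __init__(self, load_from_db: Callable[[int], bytes]) -> None:
        # oid -> (tid, state); a dict stands in for the memcached server.
        self.cache: Dict[int, Tuple[int, bytes]] = {}
        self.load_from_db = load_from_db  # hypothetical database fallback
        self.polled_tid = 0  # transaction ID last polled by this instance
        self.db_loads = 0  # how often we had to fall back to the database

    def poll(self, new_tid: int) -> None:
        # Called when polling reveals that a new transaction was committed.
        self.polled_tid = new_tid

    def load(self, oid: int) -> bytes:
        entry = self.cache.get(oid)
        if entry is not None:
            tid, state = entry
            if tid == self.polled_tid:
                return state  # entry matches the current view: cache hit
            del self.cache[oid]  # stored under another tid: discard it
        self.db_loads += 1
        state = self.load_from_db(oid)
        self.cache[oid] = (self.polled_tid, state)
        return state


if __name__ == "__main__":
    db = {1: b"a", 2: b"b"}
    storage = PolledTidCache(db.__getitem__)
    storage.poll(100)
    for oid in (1, 2, 1):
        storage.load(oid)
    print(storage.db_loads)  # 2: the second load of oid 1 is a hit
    storage.poll(101)  # another client committed something
    for oid in (1, 2):
        storage.load(oid)
    print(storage.db_loads)  # 4: both entries were discarded
```

The model shows why a frequently changing database gains nothing: each commit forces one database load per object touched afterwards, plus the cache traffic.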

I'm tinkering with the idea that some transaction awareness could be
added to memcached. Perhaps memcached should hold a "current
transaction" value, and clients should pass their own "current
transaction" value when they try to set data in the cache. If a stale
client (one whose current transaction is older than memcached's) tries
to set data, memcached should ignore the attempt.
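
A minimal Python sketch of that rule, assuming integer transaction IDs that only increase. `TransactionAwareCache`, `advance()` and the `client_tid` parameter are hypothetical names, not part of memcached's protocol. The sketch covers only the rule stated above (rejecting sets from stale clients); it does not evict entries that were stored before the current transaction advanced.

```python
from typing import Any, Dict, Optional


class TransactionAwareCache:
    """Hypothetical memcached variant that tracks a "current transaction"."""

    def __init__(self) -> None:
        self.current_tid = 0
        self.data: Dict[str, Any] = {}

    def advance(self, tid: int) -> None:
        # A committing client announces a newer transaction ID.
        self.current_tid = max(self.current_tid, tid)

    def set(self, key: str, value: Any, client_tid: int) -> bool:
        # Ignore writes from clients whose view predates the server's.
        if client_tid < self.current_tid:
            return False
        self.data[key] = value
        return True

    def get(self, key: str) -> Optional[Any]:
        return self.data.get(key)


cache = TransactionAwareCache()
cache.advance(5)
cache.set("oid-1", b"new state", client_tid=5)  # accepted
cache.set("oid-1", b"old state", client_tid=4)  # ignored: stale client
print(cache.get("oid-1"))  # b'new state'
```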

>> and memcached creates
>> opportunities for developers to be creative with caching strategies.
>
> How so?

Well, memcached has a very simple interface, so developers should be
able to craft their own memcached-like implementations. They might add
multi-level caching, for example.
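
For example, a two-level cache can keep memcached's get/set shape while putting a small in-process LRU in front of a shared server. In this sketch `DictCache` stands in for a real memcached client, and all class names are hypothetical. The local tier reintroduces exactly the stale-data question raised earlier in this thread; nothing here solves it.

```python
from collections import OrderedDict
from typing import Dict, Optional, Protocol


class CacheClient(Protocol):
    """The memcached-like interface both tiers share (simplified)."""

    def get(self, key: str) -> Optional[bytes]: ...

    def set(self, key: str, value: bytes) -> None: ...


class DictCache:
    """Stand-in for a remote memcached server; counts remote gets."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.gets = 0

    def get(self, key: str) -> Optional[bytes]:
        self.gets += 1
        return self.data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.data[key] = value


class TwoLevelCache:
    """Small in-process LRU in front of a shared cache, same get/set API."""

    def __init__(self, remote: CacheClient, local_size: int = 1000) -> None:
        self.remote = remote
        self.local: "OrderedDict[str, bytes]" = OrderedDict()
        self.local_size = local_size

    def get(self, key: str) -> Optional[bytes]:
        if key in self.local:
            self.local.move_to_end(key)  # mark as most recently used
            return self.local[key]
        value = self.remote.get(key)
        if value is not None:
            self._remember(key, value)
        return value

    def set(self, key: str, value: bytes) -> None:
        self.remote.set(key, value)  # write through to the shared tier
        self._remember(key, value)

    def _remember(self, key: str, value: bytes) -> None:
        self.local[key] = value
        self.local.move_to_end(key)
        if len(self.local) > self.local_size:
            self.local.popitem(last=False)  # evict least recently used
```

Because `TwoLevelCache` exposes the same interface as the tier beneath it, code written against a memcached client would not need to change.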

> The biggest problem with ZEO performance on the client side is that
> reads require round trips and that generally a client thread only
> knows to request one read at a time [1]_. I plan to add an API for
> asynchronous reads. In rare situations in which an application knows
> it's going to need more than one object, it can prefetch multiple
> objects at once. (One can imagine iteration scenarios in which this
> would be easy to predict.) This would also make it possible to
> prefetch revisions of objects that were in the ZODB cache and have
> just been invalidated.
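
The round-trip savings can be modeled with a synchronous Python sketch in which every server call counts as one round trip. `FakeServer`, `load_many()`, `prefetch()` and `invalidate_and_refresh()` are hypothetical names, not the proposed ZEO API, and the sketch batches requests rather than issuing them asynchronously.

```python
from typing import Dict, Iterable, List


class FakeServer:
    """Stand-in storage server; every method call counts as one round trip."""

    def __init__(self, records: Dict[int, bytes]) -> None:
        self.records = records
        self.round_trips = 0

    def load(self, oid: int) -> bytes:
        self.round_trips += 1
        return self.records[oid]

    def load_many(self, oids: Iterable[int]) -> Dict[int, bytes]:
        self.round_trips += 1
        return {oid: self.records[oid] for oid in oids}


class Client:
    def __init__(self, server: FakeServer) -> None:
        self.server = server
        self.cache: Dict[int, bytes] = {}

    def load(self, oid: int) -> bytes:
        # One round trip per cache miss: the situation described above.
        if oid not in self.cache:
            self.cache[oid] = self.server.load(oid)
        return self.cache[oid]

    def prefetch(self, oids: Iterable[int]) -> None:
        # Fetch every missing oid in a single request.
        missing = [oid for oid in oids if oid not in self.cache]
        if missing:
            self.cache.update(self.server.load_many(missing))

    def invalidate_and_refresh(self, oids: List[int]) -> None:
        # Drop stale copies, then fetch the new revisions in one batch.
        for oid in oids:
            self.cache.pop(oid, None)
        self.prefetch(oids)


records = {oid: b"state" for oid in range(100)}
one_at_a_time = FakeServer(records)
client = Client(one_at_a_time)
for oid in range(100):
    client.load(oid)
print(one_at_a_time.round_trips)  # 100

batched = FakeServer(records)
client = Client(batched)
client.prefetch(range(100))
for oid in range(100):
    client.load(oid)
print(batched.round_trips)  # 1
```

`invalidate_and_refresh()` corresponds to the last idea in the quoted paragraph: refreshing just-invalidated objects in one request instead of one per object.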

When ZODB is unpickling the state of an object, it often has to pull in
several other objects, one at a time. I wonder if it would be valuable
to prefetch the objects that the unpickling operation will pull in. We
could use the referencesf() function to get the list of OIDs.
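
A stdlib-only sketch of that idea: the pickle module's `persistent_id`/`persistent_load` hooks collect the oids a state pickle refers to, and the missing ones are fetched in one batch before the real unpickling. In ZODB the function to use would be `referencesf()`; here `Ref`, `referenced_oids()`, `prefetch_references()` and the `fetch_many` callback are hypothetical stand-ins. Unlike a production version, this collector fully loads the pickle just to find the references.

```python
import io
import pickle
from typing import Callable, Dict, Iterable, List


class Ref:
    """Hypothetical stand-in for a reference to another persistent object."""

    def __init__(self, oid: int) -> None:
        self.oid = oid


class RecordPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Store Ref objects out of line as their oid; pickle others normally.
        return obj.oid if isinstance(obj, Ref) else None


class RefCollector(pickle.Unpickler):
    def __init__(self, data: bytes) -> None:
        super().__init__(io.BytesIO(data))
        self.oids: List[int] = []

    def persistent_load(self, pid):
        # Record the referenced oid instead of loading that object.
        self.oids.append(pid)
        return None


def dump_state(state) -> bytes:
    buf = io.BytesIO()
    RecordPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(state)
    return buf.getvalue()


def referenced_oids(data: bytes) -> List[int]:
    collector = RefCollector(data)
    collector.load()  # builds a throwaway copy of the state
    return list(dict.fromkeys(collector.oids))  # dedupe, keep order


def prefetch_references(
    data: bytes,
    fetch_many: Callable[[Iterable[int]], Dict[int, bytes]],
    cache: Dict[int, bytes],
) -> None:
    # Before unpickling for real, fetch all missing referenced records at once.
    missing = [oid for oid in referenced_oids(data) if oid not in cache]
    if missing:
        cache.update(fetch_many(missing))
```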

Alternatively, I wonder if it would be valuable to store a list of
referenced OIDs in every object record. We might put that list in
another pickle, placing it before the "class" and "state" pickles that
we currently store.
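
A minimal sketch of that layout using plain stdlib pickles written back to back, so that a reader needing only the references can stop after the first `pickle.load()`. This is a hypothetical format, not ZODB's current record layout: the (module, name) pair stands in for the class pickle, and the oid list is stored as plain integers.

```python
import io
import pickle
from typing import Any, List, Tuple


def write_record(refs: List[int], klass: Tuple[str, str], state: Any) -> bytes:
    """Hypothetical layout: references pickle, then class pickle, then state."""
    buf = io.BytesIO()
    for part in (refs, klass, state):
        pickle.dump(part, buf, protocol=pickle.HIGHEST_PROTOCOL)
    return buf.getvalue()


def read_refs(record: bytes) -> List[int]:
    # Decode only the first pickle; class and state bytes are never parsed.
    return pickle.load(io.BytesIO(record))


def read_record(record: bytes) -> Tuple[List[int], Tuple[str, str], Any]:
    stream = io.BytesIO(record)
    refs = pickle.load(stream)
    klass = pickle.load(stream)
    state = pickle.load(stream)
    return refs, klass, state


record = write_record([7, 8], ("myapp.folders", "Folder"), {"title": "Home"})
print(read_refs(record))  # [7, 8]
```

With the references stored up front, a prefetcher could read them without decoding the state pickle at all, at the cost of some duplication in every record.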
Shane