[ZODB-Dev] zeo.memcache
Shane Hathaway
shane at hathawaymix.org
Wed Oct 12 17:53:19 UTC 2011
On 10/09/2011 08:26 AM, Jim Fulton wrote:
> On Sat, Oct 8, 2011 at 4:34 PM, Shane Hathaway<shane at hathawaymix.org> wrote:
>> On 10/05/2011 11:40 AM, Pedro Ferreira wrote:
>>> Hello all,
>>>
>>> While doing some googling on ZEO + memcache I came across this:
>>>
>>> https://github.com/eleddy/zeo.memcache
>>>
>>> Has anybody ever tried it?
>>
>> Having implemented memcache integration for RelStorage, I now know what
>> it takes to make a decent connection between memcache and ZODB. The
>> code at the link above does not look sufficient to me.
>>
>> I could adapt the cache code in RelStorage for ZEO. I don't think it
>> would be very difficult. How many people would be interested in such a
>> thing?
>
> This would be of broad interest!
>
> Can you briefly describe the strategy? How do you arrange that
> the client sees a consistent view of the current tid for a given
> oid?
(Sorry for not replying sooner--I've been busy.)
As I see it, a cache of this type can take 2 basic approaches: it can
either store {oid: (state, tid)}, or it can store {(oid, tid): (state,
last_tid)}. The former approach is much simpler, but since memcache has
no transaction guarantees whatsoever, it would lead to consistency
errors. The latter approach makes it possible to avoid all consistency
errors even with memcache, but it requires interesting algorithms to
make efficient use of the cache. I chose the latter.
Given the choice to structure the cache as {(oid, tid): (state,
last_tid)}, a simple way to use the cache would be to get the last
committed tid from the database and use that tid for the lookup key.
This would be extremely efficient until the next commit, at which point
the entire cache would become irrelevant and would have to be rebuilt.
Therefore, most of the interesting parts of the cache code in RelStorage
are focused on simply choosing a good tid for the cache lookup operation.
It caches the following things in memcache:
1. A pair of checkpoints.
2. A state and last committed transaction ID for a given transaction ID
and object ID.
3. A commit counter.
The checkpoints are two arbitrary committed transaction IDs. Clients
can use any pair of committed transaction IDs as checkpoints (so it's OK
if the checkpoints disappear from the cache), but the cache is much more
efficient if all clients use the same checkpoints.
Each storage object holds a pair of "delta" mappings, where each delta
contains {oid: tid}. The deltas contain information about what objects
have changed since the checkpoints: delta0 lists the changes since
checkpoint0 and delta1 lists the changes between checkpoint1 and
checkpoint0. Within each transaction, the delta0 mapping must be updated
before reading from the database.
When retrieving an object, the cache tries to discover the object's
current tid by looking first in delta0. If it's there, then the cache
asks memcache for the object state at that exact tid. If not, the cache
asks memcache for the object state and tid at the current checkpoints.
It is not actually necessary to have 2 checkpoints. It could work with
more checkpoints or only 1 checkpoint, but if there were only 1, each
checkpoint shift would be equivalent to flushing the cache. With more
checkpoints, the cache would often query many keys for each read
operation. 2 checkpoints seems like a good balance.
I wrote more notes about the caching strategy here:
http://svn.zope.org/relstorage/trunk/notes/caching.txt
As I review all of this, I wonder at the moment why I chose to create
delta1. It seems like the system would work without it. I probably
added it because I thought it would improve cache efficiency, but today
I'd rather simplify as much as possible even at the cost of a little
theoretical efficiency.
The commit counter is not very related, but since I brought it up, I'll
explain it briefly: it serves as a way for clients to discover whether
the database has changed without actually reading anything from the
database. It is a counter rather than a transaction ID because that
choice avoids a race condition.
Shane
More information about the ZODB-Dev
mailing list