[ZODB-Dev] what's the latest on zodb/zeo+memcached?

Thu Jan 17 18:13:16 UTC 2013

On Jan 17, 2013, at 10:31 AM, Claudiu Saftoiu <csaftoiu at gmail.com> wrote:

> What I don't understand is that this doesn't seem to work in the long run. Just before writing this email I ran a view that required a simple query after not having restarted the server in a while, and it took a minute or two to complete. Running the view again, it took only a few seconds. So it seems something had been moved out of the cache, which makes no sense to me as the server has plenty of RAM and the cache size is plenty large. 

You can use the 'cacheDetail' method of the ZODB 'db' object to inspect your object ('connection') cache and see how many objects are in there. You are using a connection cache size of 500000, which means 500,000 ZODB objects per connection/thread. 'cacheDetail' will help you see which classes of objects are counting towards that 500,000.

I recently investigated what happens as the result of a catalog query used on part of the home page of a customer site that exhibits similar behavior. The query in question is for the '10 most recent published weblog articles'. Here is the 'cacheDetail' output after running it. You can get the 'db' object a number of ways depending on your framework. From any persistent object, you can get it via '._p_jar.db()'.

from pprint import pprint as pp
from operator import itemgetter
# 'db' is the ZODB DB object; show the 20 classes with the most cached objects
pp(sorted(db.cacheDetail(), key=itemgetter(1), reverse=True)[:20])
[('BTrees.IFBTree.IFSet', 79122),
 ('BTrees.IOBTree.IOBucket', 21516),
 ('BTrees.IFBTree.IFTreeSet', 3441),
 ('BTrees.OIBTree.OIBTree', 415),

Between IFSet and IOBucket alone, there are about 100,000 objects (79,122 + 21,516) going into our object/connection cache count (although another method, 'cacheSize()', says there are 83,758 items in the cache; I believe this is the non-ghost count). This is for just one query.

So look at methods like cacheSize(), cacheDetailSize(), cacheDetail(), and if you're feeling adventurous: cacheExtremeDetail(). They will let you know how the object/connection cache is actually being used.
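As a rough sketch of how those calls can be combined, the helper below returns the 'cacheSize()' total along with the classes that hold the most cached objects. The function names 'summarize_connection_caches' and 'db_for' are made up for illustration; the only ZODB calls assumed are 'cacheSize()', 'cacheDetail()' and '._p_jar.db()' as described above, with 'cacheDetail()' yielding (class name, count) pairs as in the output shown earlier.

```python
from operator import itemgetter


def summarize_connection_caches(db, top=20):
    """Return (cache_size, top_classes) for a ZODB 'db' object.

    Hypothetical helper: it relies only on db.cacheSize() and
    db.cacheDetail(), and assumes cacheDetail() yields
    (class name, count) pairs.
    """
    total = db.cacheSize()
    # Largest contributors to the object/connection cache first.
    detail = sorted(db.cacheDetail(), key=itemgetter(1), reverse=True)
    return total, detail[:top]


def db_for(obj):
    """Get the 'db' object from a persistent object.

    Assumes obj is attached to a connection; _p_jar is None for
    objects that have not yet been added to the database.
    """
    return obj._p_jar.db()
```

Something like 'total, top = summarize_connection_caches(db_for(some_object))' could then be run before and after a slow query to see which classes the query pulled into the cache.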

I think it's possible that, with multiple rather large BTree-based catalog indexes, some of the IFSets and IOBuckets that make up their internals can still get flushed out if they are not exercised by a frequently used query. We've seen the same behavior with a couple of our biggest customers.

It's also quite possible that those big old catalog indexes have individual IFSets and Buckets that are getting invalidated since they change state as object data gets re-indexed. The invalidation causes the ZEO client cache to need to request a new copy, and I presume this invalidates data in the connection/object cache as well. Once that happens, IO is required to transfer the data over the network and/or disk into memory.

> Further, after having preloaded the indices once, shouldn't it preload quite rapidly upon further server restarts, if it's all in the cache and the cache is persisted?

Again, there are two caches here and they are not really related. The "persistent cache" is for ZEO to keep local copies instead of having to constantly hit the network. The object or 'connection' cache is what is in memory being used by the application. It still requires IO operations to find all of the bytes from the persistent ZEO cache and move them into memory as objects. The connection/object cache does not get preserved between restarts. The client/persistent cache is not a memory dump. If you run the ZODB with just a local FileStorage file, there is no 'persistent cache' aside from the database file itself.
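To make the distinction concrete, here is a minimal sketch of where each cache is configured when opening a ZEO-backed database from Python. The function name, the server address, the cache name and the 200 MB figure are all placeholders, not values from this thread; only the 500000 object count comes from the discussion above. It assumes the 'client' and 'cache_size' arguments of ZEO's ClientStorage and the 'cache_size' argument of ZODB's DB.

```python
def open_zeo_db(address=("localhost", 8100),
                zeo_cache_name="app1",
                zeo_cache_bytes=200 * 1024 * 1024,
                objects_per_connection=500000):
    """Open a ZEO-backed DB, setting both caches explicitly.

    All default values except objects_per_connection are placeholders.
    """
    # Imported lazily so the sketch can be read without ZODB/ZEO installed.
    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB

    # ZEO client cache: raw object records on local disk, sized in bytes.
    # Naming it via 'client' is what makes it persist across restarts.
    storage = ClientStorage(address,
                            client=zeo_cache_name,
                            cache_size=zeo_cache_bytes)

    # Object/connection cache: live objects in memory, counted per
    # connection. It starts empty after every restart.
    return DB(storage, cache_size=objects_per_connection)
```

Even with a warm persistent ZEO cache, the first queries after a restart still have to read those records and turn them back into objects in the connection cache.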

Thanks,
Jeff Shell
jeff at bottlerocket.net
