[ZODB-Dev] what's the latest on zodb/zeo+memcached?

Claudiu Saftoiu csaftoiu at gmail.com
Thu Jan 17 17:31:52 UTC 2013


> > Okay, that makes sense. Would that be a server-side cache, or a
> > client-side cache?
>
> There are no server-side caches (other than the OS disk cache).

OK, that's what I gathered before; I was just checking.

> > I believe I've already succeeded in getting a client-side persistent
> > disk-based cache to work (my zodb_indexdb_uri is
> > "zeo://%(here)s/zeo_indexdb.sock?cache_size=2000MB&connection_cache_size=500000&connection_pool_size=5&var=zeocache&client=index"),
>
> This configuration syntax isn't part of ZODB.  I'm not familiar with
> the options there.


Ah yes, it's part of repoze.zodbconn -
http://docs.repoze.org/zodbconn/narr.html#zeo-uri-scheme . I looked into
this, and the following mappings from the URI syntax to the ZConfig XML
syntax hold:

  cache_size --> zodb/zeoclient/cache-size
  connection_cache_size --> zodb/cache-size
  connection_pool_size --> zodb/pool-size
  var --> zodb/zeoclient/var
  client --> zodb/zeoclient/client
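
For illustration, that mapping suggests the URI quoted above corresponds
to roughly this ZConfig equivalent (the socket path here is made up, and
I haven't verified this exact rendering):

    <zodb>
      cache-size 500000
      pool-size 5
      <zeoclient>
        server /path/to/zeo_indexdb.sock
        cache-size 2000MB
        var zeocache
        client index
      </zeoclient>
    </zodb>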


> > but this doesn't seem to be what you're referring to as that is
> > exactly the same size as the in-memory cache.
>
> I doubt it, but who knows?

I meant that I have only one cache-size option expressed in bytes, and the
cache file created on disk is exactly that size (or rather, it reserves all
of that space on disk immediately, even if it isn't all used).

----------------------

Here's a detailed description of the issues I'm having.

I wrote the following code to preload the indices:

    import time

    def preload_index_btree(index_name, index_type, btree):
        print "((Preloading '%s' %s index btree...))" % (index_name, index_type)
        start = time.time()
        count = 0
        for count, item in enumerate(btree.items(), 1):
            pass  # merely touching each item loads its bucket into the cache
        print "((Preloaded '%s' %s index btree (%d items in %.2fs)))" % (
            index_name, index_type, count, time.time() - start,
        )

    def preload_catalog(catalog):
        """Given a catalog, touch every persistent object we can find to
        force them to go into the cache."""
        start = time.time()
        num_indices = len(catalog.items())
        for i, (index_name, index) in enumerate(catalog.items()):
            print "((Preloading index %2d/%2d '%s'...))" % (
                i + 1, num_indices, index_name,
            )
            preload_index_btree(index_name, 'fwd', index._fwd_index)
            preload_index_btree(index_name, 'rev', index._rev_index)
        print "((Preloaded catalog! Took %.2fs))" % (time.time() - start)

And I run it on server start as follows (trimmed to the relevant parts; I
tried to keep the example simple, but it ended up needing a lot of pieces).
This runs in a thread:

    from util import zodb as Z
    from util import zodb_query as ZQ
    for i in xrange(3):
        connwrap = Z.ConnWrapper('index')
        print "((Preload #%d...))" % (i+1)
        with connwrap as index_root:
            ZQ.preload_catalog(index_root.index.catalog)
        connwrap.close()
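
To be precise about "runs in a thread": since ZODB connections shouldn't be
shared between threads, the loop above is the body of the background thread.
Roughly like this, with preload_all being a hypothetical wrapper around that
same loop:

    import threading

    def preload_all():
        # the xrange(3) loop shown above
        for i in xrange(3):
            connwrap = Z.ConnWrapper('index')
            print "((Preload #%d...))" % (i + 1)
            with connwrap as index_root:
                ZQ.preload_catalog(index_root.index.catalog)
            connwrap.close()

    preload_thread = threading.Thread(target=preload_all)
    preload_thread.setDaemon(True)  # don't keep the process alive just for this
    preload_thread.start()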

Z.ConnWrapper is something that uses my config to return connections such
that I only have one DB instance for the whole server process:

    class ConnWrapper(object):
        def __init__(self, db_name):
            # appconfig is our application-config module (imported elsewhere)
            global_config = appconfig.get_config()
            db_conf = global_config['dbs'][db_name]

            db = db_conf['db']
            self.appmaker = db_conf['appmaker']

            conn = db.open()
            self.conn = conn
            self.cur_t = None
        #...
        def get_approot(self):
            return self.appmaker(self.conn.root())
        def __enter__(self):
            """.begin() a transaction and return the app root"""
            if self.cur_t:
                raise ValueError("transaction already in progress")
            self.cur_t = self.conn.transaction_manager.begin()
            return self.get_approot()
        def __exit__(self, typ, value, tb):
            if typ is None:
                try:
                    self.cur_t.commit()
                except:
                    # reset state before re-raising the commit error
                    self.cur_t = None
                    raise
                self.cur_t = None
            else:
                # an exception escaped the with-block: abort the transaction
                self.cur_t.abort()
                self.cur_t = None

The relevant part of the global config setup is:

    from repoze.zodbconn.uri import db_from_uri
    from indexdb.models import appmaker as indexdb_appmaker
    #...
    zodb_indexdb_uri = global_config.get('zodb_indexdb_uri')
    index_db = db_from_uri(zodb_indexdb_uri)
    global_config['dbs'] = {
        'index': {
            'db': index_db,
            'appmaker': indexdb_appmaker,
        },
    }

`zodb_indexdb_uri` is in my .ini file as mentioned above:

    zodb_indexdb_uri = zeo://%(here)s/zeo_indexdb.sock?cache_size=3000MB&connection_cache_size=5000000&connection_pool_size=5&var=zeocache&client=index

The preloading seems to accomplish its purpose. When I restart the server,
it takes a while to run through all the indices the first time through, and
memory usage grows as this happens, e.g.:

    ((Preloading index  3/17 'account'...))
    ((Preloading 'account' fwd index btree...))
    ((Preloaded 'account' fwd index btree (37 items in 0.00s)))
    ((Preloading 'account' rev index btree...))
    ((Preloaded 'account' rev index btree (346786 items in 69.72s)))

And the subsequent attempts are quite rapid:

    ((Preloading index  3/17 'account'...))
    ((Preloading 'account' fwd index btree...))
    ((Preloaded 'account' fwd index btree (37 items in 0.00s)))
    ((Preloading 'account' rev index btree...))
    ((Preloaded 'account' rev index btree (346903 items in 0.08s)))
    ...
    ((Preloaded catalog! Took 1.58s)) #(for the entire catalog)

What I don't understand is why this doesn't seem to work in the long run.
Just before writing this email, I ran a view that required a simple query
after not having restarted the server in a while, and it took a minute or
two to complete. Running the view again, it took only a few seconds. So it
seems something had been evicted from the cache, which makes no sense to me,
as the server has plenty of RAM and the cache size is plenty large.
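
One way I could narrow this down would be to log how full the in-memory
(pickle) cache actually is over time; if I read the DB API right, something
like this reports it (untested sketch):

    def print_cache_stats(db):
        # total non-ghost (i.e. actually loaded) objects across connections
        print "total non-ghost objects:", db.cacheSize()
        # per-connection detail: 'ngsize' = non-ghosts, 'size' = all objects
        for detail in db.cacheDetailSize():
            print detail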

Further, after having preloaded the indices once, shouldn't the preload run
quite rapidly upon subsequent server restarts, if everything is in the cache
and the cache is persisted? After the above preload ran, I restarted the
server, and although the first few indices did indeed load more quickly:

    ((Preloading index  3/17 'account'...))
    ((Preloading 'account' fwd index btree...))
    ((Preloaded 'account' fwd index btree (37 items in 0.00s)))
    ((Preloading 'account' rev index btree...))
    ((Preloaded 'account' rev index btree (346905 items in 8.37s)))

some took just as long:

    ((Preloaded 'timestamp' fwd index btree (348333 items in 90.69s)))

and the whole catalog still took a good while:

    ((Preloaded catalog! Took 199.03s))

Granted, if it took 8-11 seconds per index instead of 60-90 seconds per
index, that would not be such a bad improvement. This is why I was
considering memcachedb, by the way - something that would keep working
between restarts of my server. Another option, I guess, would be to run a
whole separate server instance just for the caching, one that is always on,
to avoid these issues.

It seems like something isn't working with the caching, but I can't figure
out why... any pointers? Apparently I don't understand the cache mechanism
very well, since I did everything that, according to my understanding,
should have worked.

One potentially relevant thing: after a zeopack, the index database .fs file
is about 400 megabytes, so I figured a cache of 3000 megabytes should more
than cover it. Before a zeopack, though - I run one every 3 hours - the file
grows to 7.6 gigabytes. Shouldn't the relevant objects - the latest versions
of all the objects - be the ones in the cache, so that it doesn't matter
that the .fs file is 7.6gb when the actually-used part of it is only 400mb
or so? Another question: does zeopacking invalidate the cache? If so, that
would explain things, and I'd have to preload after every zeopack. If it's
not that, then I'm not sure what it could be.
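
If packing does turn out to invalidate the cache, the workaround would
presumably be to re-warm it after each pack; a rough sketch, where
PACK_INTERVAL and preload_all are placeholders for my actual setup:

    import time

    PACK_INTERVAL = 3 * 60 * 60  # I pack every 3 hours

    def pack_and_preload(db):
        while True:
            time.sleep(PACK_INTERVAL)
            db.pack(days=0)  # pack away all old revisions, like zeopack
            preload_all()    # then re-warm the caches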

Thanks,
- Claudiu