[Zope-dev] ZCatalog caching with memcached

Tres Seaver tseaver at palladion.com
Mon Oct 27 12:30:05 EDT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Roché Compaan wrote:
> On Sun, 2008-10-26 at 14:07 -0400, Tres Seaver wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Roché Compaan wrote:
>>> On Sat, 2008-10-25 at 09:20 +0200, Hedley Roos wrote:
>>>>> Have you measured the time needed for some "standard" ZCatalog queries
>>>>> used with a Plone site, including the communication overhead with
>>>>> memcached? Generally speaking, I think the ZCatalog is fast. Queries are
>>>>> known to be more expensive when they use a fulltext index, return large
>>>>> result sets, or are complex.
>>>>>
>>>> No I haven't. Roche Compaan has done extensive benchmarking using
>>>> funkload testing plain catalog vs module level cache vs memcached, but
>>>> the tests are more about page serving than catalog query time. I'll
>>>> ask him to comment more on that.
>>> I actually did some profiling as well and catalog searches were just too
>>> damn slow. The average execution time for searchResults was 100
>>> milliseconds and this is why I told Hedley we should do some caching at
>>> query level in the first place. I experimented with this idea a couple
>>> of years back but wasn't successful due to inexperience. I was trying to
>>> cache brains which obviously leads to persistency bugs. This time around
>>> it was obvious to me that we should cache the IISet result sets.
>>>
>>> I suspect specific indexes are just performing suboptimally and need to
>>> be improved. ExtendedPathIndex in Plone seems to be one of them.
>>>
>>> The effect on performance is really awesome, now we just need to fine
>>> tune the implementation.
>> Before (or while) we work on caching, can we try to improve the
>> underlying indexes, and the way that applications use them?  I'm pretty
>> sure that there is a lot of room for improvement:
>>
>>  - Plone uses too many indexes, and in particular, uses multiple text
>>    indexes.  Having extra indexes around "just in case" is a sure loss
>>    at write time, and may even be expensive at query time (depending on
>>    the query).
>>
>>  - Particular indexes have performance characteristics based on their
>>    designed purpose:  for instance, the stock FieldIndex implementation
>>    assumes that the number of documents indexed will be >> the number of
>>    discrete indexable values.  Using such an index in an application
>>    domain with a very large set of indexable values probably loses, and
>>    in ways which don't show up in early / small-scale testing.
>>
>>  - I'm pretty sure that we haven't yet found the best data structure for
>>    "hierarchy indexes" (e.g., the Plone EPI index, or the stock Zope2
>>    PathIndex, etc.).  Something like a 'trie' might be optimal for
>>    pure prefix searching of hierarchies.
>>
>>  - I am confident that the TopicIndex is underutilized:  it does *all*
>>    the work for a given query at write time, and can thus be blindingly
>>    fast at query time.
>>
>>  - Other special-purpose indexes (e.g., a "recent items" index) would
>>    be worth a look, especially for applications with large volumes of
>>    content.
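
On the 'trie' point in the list above, a minimal sketch of pure prefix
searching over paths may help. Everything in it is an assumption made for
illustration: the PathTrie class and its index / search methods are
hypothetical names, plain Python sets stand in for IISets, and this is not
how PathIndex or the Plone EPI index is implemented.

```python
class PathTrie:
    """Toy hierarchy index: one trie node per path segment (hypothetical)."""

    def __init__(self):
        self.children = {}    # path segment -> child PathTrie
        self.docids = set()   # ids of all documents at or below this node

    @staticmethod
    def _segments(path):
        # "/plone/news/" -> ["plone", "news"]; "/" -> []
        return [s for s in path.split("/") if s]

    def index(self, docid, path):
        """Record docid on every node along its path, root included."""
        node = self
        node.docids.add(docid)
        for segment in self._segments(path):
            node = node.children.setdefault(segment, PathTrie())
            node.docids.add(docid)

    def search(self, prefix):
        """Return ids of documents whose path starts with prefix."""
        node = self
        for segment in self._segments(prefix):
            node = node.children.get(segment)
            if node is None:
                return set()
        return set(node.docids)


trie = PathTrie()
trie.index(1, "/plone/news/item-a")
trie.index(2, "/plone/news/item-b")
trie.index(3, "/plone/events/party")
news_ids = trie.search("/plone/news")
```

Storing the ids on every ancestor node trades space and write time for a
lookup that costs one dictionary access per path segment. Whether that
trade is right for a real index would need measuring.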
> 
> I agree that one should look at improving performance without caching as
> well. But this is a lot harder and takes significantly more development
> and debugging time than introducing some form of caching. So I'm not
> convinced that it needs to happen in a certain order. If caching gives
> you lots of performance with little effort now, then why shouldn't you
> use it?

Because it introduces complexities and error cases, perhaps?  I'm
particularly worried that introducing caching within the framework will
make it harder to deal properly with such cases at the application
level:  the awesomeness / usefulness of DWIM is a step function.
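
One such error case can be sketched as follows. This is a minimal
illustration and not the memcached-based implementation discussed in this
thread: CachedSearch, invalidate and the generation counter are hypothetical
names, a dict stands in for memcached, and tuples of integer ids stand in
for IISet result sets.

```python
class CachedSearch:
    """Query-level cache around a search callable (hypothetical sketch)."""

    def __init__(self, search):
        self._search = search   # callable: query dict -> iterable of int ids
        self._cache = {}        # stand-in for memcached; never pruned here
        self._generation = 0    # bumped on every catalog write

    def _key(self, query):
        # The generation is part of the key, so bumping it makes every
        # older entry unreachable without deleting keys one by one.
        return (self._generation, repr(sorted(query.items())))

    def invalidate(self):
        """Must be called from every code path that changes the catalog."""
        self._generation += 1

    def __call__(self, query):
        key = self._key(query)
        if key not in self._cache:
            self._cache[key] = tuple(self._search(query))
        return self._cache[key]


docs = {1: "news", 2: "news", 3: "event"}
calls = []

def search(query):
    calls.append(query)
    return sorted(d for d, t in docs.items() if t == query["portal_type"])

cached = CachedSearch(search)
first = cached({"portal_type": "news"})
docs[4] = "news"                          # a write that forgets to invalidate
stale = cached({"portal_type": "news"})   # still the old result
cached.invalidate()
fresh = cached({"portal_type": "news"})
```

The cached lookup keeps returning the old result until invalidate() is
called, so correctness depends on every write path remembering to call it.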



Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJBeyN+gerLs4ltQ4RAoxvAKCaa0x9Q6wCfolSR98INb813g7mMACdGWx+
LF/M2LQgsE3A5tovfa8ywL8=
=PvxD
-----END PGP SIGNATURE-----

