ZCatalog enhancement wishes (was: Re: [Zope] Re: Zope Digest, Vol 6, Issue 37)

Dieter Maurer dieter at handshake.de
Sat Jan 24 05:18:22 EST 2004


Casey Duncan wrote at 2004-1-23 18:12 -0500:
> ...
>I'd be interested to know what the specific reasons are. I have plans
>about improving ZCatalog in various ways, and it's always interesting to
>hear other outside opinions and use cases that can be used to inform
>future improvements.

We had extremely bulky BTree buckets holding the metadata.
This caused huge transactions (a single workflow state change resulted
in a transaction of about 500 kB).
Of course, this was a configuration problem: "summary" and
"bobobase_modification_time" were part of the catalog's MetaData,
and my colleagues used "summary" extensively (each summary was
several kB in size) ...


Tim has already optimized the BTrees package a lot, but intersection
may still benefit from further optimization. I used code like this:

    found = intersect(tree, set)

where "tree" was an "OOBTree" and "set" usually had a single element
(but could have more, of course).
I found that this is often extremely slow -- much, much slower
than

    if len(set) == 1:
      key = set[0]
      if tree.has_key(key): found = set
      else: found = OOSet()
    else:
      found = intersect(tree, set)

In a fully optimized intersection, the difference should be very small.


Path index searches are slow. It helped (for us) to reverse
the order in which the intersections are done (lower-level path
components tend to be more specific, leading to smaller intermediate
intersection sets).
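
A rough sketch of the effect, using plain Python sets as stand-ins for
the per-component id sets of a path index. It reads "lower level" as
"deeper in the path". The index layout and the function "path_search"
are assumptions made for illustration only, not the actual PathIndex
code:

```python
# Hypothetical stand-in for a path index:
# (level, path component) -> set of document ids
index = {
    (0, "site"): {1, 2, 3, 4, 5, 6},
    (1, "docs"): {1, 2, 3, 4},
    (2, "manual"): {1, 2},
    (3, "chapter1"): {1},
}

def path_search(index, path, deepest_first=True):
    """Intersect the id sets of all path components.

    Returns (ids, sizes); "sizes" records the size of the
    intermediate result after each step.
    """
    comps = list(enumerate(path.strip("/").split("/")))
    if deepest_first:
        comps.reverse()          # start with the most specific component
    result = None
    sizes = []
    for key in comps:
        ids = index.get(key, set())
        result = set(ids) if result is None else result & ids
        sizes.append(len(result))
        if not result:           # nothing left to intersect
            break
    return result, sizes

path = "/site/docs/manual/chapter1"
ids_top, sizes_top = path_search(index, path, deepest_first=False)
ids_deep, sizes_deep = path_search(index, path, deepest_first=True)
```

Both orders give the same ids, but with this toy data the deepest-first
order never carries more than one id between steps, while the
top-down order starts with six.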


Colleagues suggested caching catalog results. I will implement that
soon (not for "ZCatalog" itself, however, but for our
"HaufeQuery", which is similar to your "CatalogQuery" but uses
query objects instead of query strings).
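
One possible shape for such a cache, as a minimal sketch only. The
class "CachingCatalog", its "search" callable and the tuple-based query
keys are hypothetical and are not taken from "HaufeQuery" or
"ZCatalog". It assumes that queries are hashable and that every
catalog modification can trigger "invalidate":

```python
class CachingCatalog:
    """Hypothetical wrapper that caches results until the catalog changes."""

    def __init__(self, search):
        self._search = search    # callable: query -> iterable of ids
        self._cache = {}
        self.hits = 0            # number of queries answered from the cache

    def query(self, query):
        # "query" must be hashable, e.g. a tuple of (index, value) pairs
        try:
            result = self._cache[query]
        except KeyError:
            result = self._cache[query] = tuple(self._search(query))
        else:
            self.hits += 1
        return result

    def invalidate(self):
        # to be called whenever an object is indexed, reindexed or unindexed
        self._cache.clear()


calls = []

def slow_search(query):
    # stand-in for the real (expensive) catalog search
    calls.append(query)
    return [1, 2, 3]

catalog = CachingCatalog(slow_search)
q = (("review_state", "published"),)
first = catalog.query(q)     # computed
second = catalog.query(q)    # served from the cache
catalog.invalidate()
third = catalog.query(q)     # computed again after invalidation
```

Dropping the whole cache on every change is the crudest strategy; with
frequent writes a finer-grained invalidation would probably be needed.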


"ZCatalog" should have an easy way to freely use "and", "or" and "not"
to combine subqueries to indexes -- similar to your "CatalogQuery"
(or our "HaufeQuery").
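
To make the idea concrete, here is a small sketch of combinable query
objects. All class names ("Eq", "And", "Or", "Not") and the dict-based
indexes are assumptions for illustration; this is neither the
"CatalogQuery" nor the "HaufeQuery" API:

```python
class Query:
    """Base class: lets subqueries be combined with &, | and ~."""
    def __and__(self, other):
        return And(self, other)
    def __or__(self, other):
        return Or(self, other)
    def __invert__(self):
        return Not(self)

class Eq(Query):
    """Subquery against a single index: index == value."""
    def __init__(self, index, value):
        self.index, self.value = index, value
    def eval(self, indexes, all_ids):
        return set(indexes[self.index].get(self.value, ()))

class And(Query):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self, indexes, all_ids):
        return self.left.eval(indexes, all_ids) & self.right.eval(indexes, all_ids)

class Or(Query):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self, indexes, all_ids):
        return self.left.eval(indexes, all_ids) | self.right.eval(indexes, all_ids)

class Not(Query):
    def __init__(self, query):
        self.query = query
    def eval(self, indexes, all_ids):
        # "not" needs the set of all cataloged ids to complement against
        return all_ids - self.query.eval(indexes, all_ids)

# Hypothetical indexes: index name -> value -> set of document ids
indexes = {
    "portal_type": {"Document": {1, 2, 3}, "Image": {4}},
    "review_state": {"published": {1, 4}, "private": {2, 3}},
}
all_ids = {1, 2, 3, 4}

q = (Eq("portal_type", "Document") | Eq("portal_type", "Image")) \
    & ~Eq("review_state", "private")
result = q.eval(indexes, all_ids)
```

Here "result" contains the ids 1 and 4: everything that is a Document
or an Image and is not private.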

-- 
Dieter


