[Zope3-Users] zc.catalog's FilterExtent (with hurry.query)

Gary Poster gary at zope.com
Sun Oct 28 15:14:10 EDT 2007


On Oct 27, 2007, at 9:40 AM, Jesper Petersen wrote:

> Hey!
> I'm trying to understand if my idea of how to use the FilterExtent in
> zc.catalog (1.1.1) is correct (and efficient). I'm also using
> hurry.query (0.9.3). My current understanding of extents is: they can
> be used to perform a search on a subset of a catalog. For example,
> "give me all objects where attr1 is 'foo' but only for intids 5,6,7
> and 10"
>
>
> Short version:
> I have an extent of a large catalog. How do I make a search within
> this extent?

Hi Jesper.

Extents have a primary use case in the zc.catalog package of defining  
the extent of a catalog--a set of indexes.  This is more efficient  
both in terms of programmer time and computer time than filtering out  
objects per-index.  It also allows asking indexes questions that would  
otherwise be impossible, e.g., "what objects do *not* match this  
particular search?", and a couple of others.

I'm not sure hurry.query leverages all aspects of extents, or of the  
indexes that know how to deal with them.  I seem to recall that it  
didn't, but I could be wrong, and it was a while ago.

So, the primary use case is different than yours.

Extents can be used in the way that you describe--intersecting against  
a larger search of a larger catalog.  What you described is a  
reasonable first cut, and a reasonable use of extents.

Depending on your use cases and the time available, you may want to  
explore optimizations.  I wouldn't be surprised if you eventually wanted  
to roll your own catalog to do the set operations in the ways that  
make the most sense for your application.  A few quick thoughts:

- If your common extents are really as small as in your examples, one  
thing to realize is that the time for an intersection in BTree code  
pretty much always is determined by the size of the smaller set.   
Therefore, given three sets that need to be intersected (say, your  
extent and the result of the search of two indexes) of relative sizes  
Small, Medium, and Large, you want to intersect in this way:  
intersect(intersect(Small, Medium), Large).  See  
http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto  
for timeit fun, if you like.

- there are two primary costs of a big catalog, IMO/IME: write time  
and load time.  If necessary for your app, consider ways to try to  
keep smaller catalogs (e.g., does the value of some information  
diminish over time?  Does it make sense to have separate catalogs,  
divided across some boundary or boundaries?); and consider ways to  
keep the catalog in memory (in the object cache).

- if you typically only need the first X of a result set, doing  
something like Dieter Maurer's incremental search Zope 2 code would be  
interesting to research and might be appreciated by the community if  
it worked out well.
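
A rough sketch of the intersection-ordering point from the first item above.  It uses plain Python sets as stand-ins for the BTrees sets a real catalog would hold, and the helper name intersect_smallest_first and the sample data are made up for illustration, so treat it as a sketch rather than catalog code:

```python
from functools import reduce


def intersect_smallest_first(*sets):
    """Intersect any number of sets, always starting from the smallest.

    Sorting by size means each intermediate result is no larger than
    the smallest input, so later intersections stay cheap.
    """
    if not sets:
        return set()
    ordered = sorted(sets, key=len)
    # Fold left: ((Small & Medium) & Large) ...
    return reduce(lambda acc, s: acc & s, ordered[1:], set(ordered[0]))


# Illustrative data only: a tiny extent and two index results.
extent = {5, 6, 7, 10}               # Small
index_a = set(range(0, 1000, 5))     # Medium
index_b = set(range(0, 100000))      # Large

# The argument order does not matter; the helper reorders by size.
result = intersect_smallest_first(index_b, extent, index_a)
print(sorted(result))
```

With real BTrees sets the `&` would be replaced by the intersection function of whichever BTrees family your indexes use; which family that is depends on your setup.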

Finally IMO/IME, only pursue these sometimes risky optimizations if  
they are really necessary and if you have some pretty concrete  
research or knowledge (your own or others') to back up your plan.  If I  
were you I'd just start out with the "do a search and then intersect  
with the extent" approach you mentioned, and only worry about it more  
when your app needs it.
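
For what it's worth, the "search and then intersect with the extent" approach amounts to something like the sketch below.  The toy index (a dict mapping attribute values to sets of intids) and the function name search_within_extent are hypothetical stand-ins, not the zc.catalog or hurry.query API:

```python
# Hypothetical toy index: attribute value -> set of intids that have it.
attr1_index = {
    "foo": {1, 5, 6, 42, 99},
    "bar": {2, 7, 10},
}


def search_within_extent(index, value, extent):
    """Run the ordinary search, then keep only intids inside the extent."""
    matches = index.get(value, set())
    return matches & extent


# "All objects where attr1 is 'foo' but only for intids 5, 6, 7 and 10"
extent = {5, 6, 7, 10}
found = search_within_extent(attr1_index, "foo", extent)
print(sorted(found))
```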

HTH

Gary

