Hi Gary,<div>Thanks for your comprehensive answer. Yes, my extents aren't really as</div><div>small as in my examples. Holding off on optimizations seems like a reasonable idea;</div><div>I'm not sure they are even needed, at least not within a year
</div><div>or so :)</div><div><br class="webkit-block-placeholder"></div><div>Cheers<br><br><div><span class="gmail_quote">On 10/28/07, <b class="gmail_sendername">Gary Poster</b> <<a href="mailto:gary@zope.com">gary@zope.com
</a>> wrote:</span><blockquote class="gmail_quote" style="margin:0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Jesper.<br><br>Extents have a primary use case in the zc.catalog package of defining<br>
the extent of a catalog--a set of indexes. This is more efficient<br>both in terms of programmer time and computer time than filtering out<br>objects per-index. It also allows asking indexes questions that would<br>otherwise be impossible,
e.g., "what objects do *not* match this<br>particular search?", and a couple of others.<br><br>I'm not sure hurry.query leverages all aspects of extents, and indexes<br>that know how to deal with them. I seem to recall that it didn't, but
<br>I could be wrong, and it was a while ago.<br><br>So, the primary use case is different from yours.<br><br>Extents can be used in the way that you describe--intersecting against<br>a larger search of a larger catalog. What you described is a
<br>reasonable first cut, and a reasonable use of extents.<br><br>Depending on your use cases and the time available, you may want to<br>explore optimizations. I wouldn't be surprised if you eventually wanted<br>to roll your own catalog to do the set operations in the ways that
<br>make the most sense for your application. A few quick thoughts:<br><br>- If your common extents are really as small as in your examples, one<br>thing to realize is that the time for an intersection in BTree code<br>is pretty much always determined by the size of the smaller set.
<br>Therefore, given three sets that need to be intersected (say, your<br>extent and the result of the search of two indexes) of relative sizes<br>Small, Medium, and Large, you want to intersect in this way:<br>intersect(intersect(Small, Medium), Large). See
<a href="http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto">http://svn.zope.org/zc.relation/trunk/src/zc/relation/timeit/manual_intersection.py?view=auto</a><br> for timeit fun, if you like.
<br><br>- There are two primary costs of a big catalog, IMO/IME: write time<br>and load time. If necessary for your app, consider ways to try to<br>keep smaller catalogs (e.g., does the value of some information<br>diminish over time? Does it make sense to have separate catalogs,
<br>divided across some boundary or boundaries?); and consider ways to<br>keep the catalog in memory (in the object cache).<br><br>- If you typically only need the first X of a result set, doing<br>something like Dieter Maurer's incremental search code for Zope 2 would be
<br>interesting to research and might be appreciated by the community if<br>it worked out well.<br><br>Finally, IMO/IME, only pursue these sometimes risky optimizations if<br>they are really necessary and if you have some pretty concrete
<br>research or knowledge (your own or others') to back up your plan. If I<br>were you, I'd just start out with the "do a search and then intersect<br>with the extent" approach you mentioned, and only worry about it more
<br>when your app needs it.<br><br>HTH<br><br>Gary<br></blockquote></div><br> </div>
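<div>For reference, here is a minimal sketch of the smallest-first intersection order described in the quoted message above. It uses plain Python sets as a stand-in for BTrees sets, and <code>intersect_smallest_first</code> is a hypothetical helper name, not part of zc.catalog or BTrees. Treat it as an illustration of the ordering only, not as a measured optimization:</div>
<pre>
```python
def intersect_smallest_first(*sets):
    """Intersect sets in ascending order of size (hypothetical helper).

    Plain Python sets stand in for BTrees sets here; this is an assumption
    made for illustration, not the zc.catalog implementation.
    """
    if not sets:
        return set()
    # Sort so the smallest set comes first: intersect(intersect(S, M), L).
    ordered = sorted(sets, key=len)
    result = set(ordered[0])
    for other in ordered[1:]:
        if not result:
            # Nothing left to match, so the remaining sets can be skipped.
            break
        result = result.intersection(other)
    return result


# Illustrative data: an extent (small) and two index results (medium, large).
small = set(range(0, 100, 10))
medium = set(range(0, 10000, 5))
large = set(range(100000))

matches = intersect_smallest_first(large, small, medium)
```
</pre>
<div>The sets can be passed in any order; the helper only changes the order in which the work is done, not the result.</div>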