[Zope] Intersection/Union of ZCatalog result sets

Johan Carlsson johanc at easypublisher.com
Fri Sep 24 08:43:16 EDT 2004


Jonathan Hobbs wrote:

>I am doing this to try to squeeze out some performance improvements from a
>ZCTextIndex. We have a zcatalog with about 1 million documents that we are
>full-text indexing and it no longer fits into memory (therefore requiring
>many disk i/o's during retrieval which is seriously degrading performance).
>
>Our zcatalog currently has 5 indexes: 4 minor indexes and one major index
>(the main ZCTextIndex).  I am attempting to split the zcatalog into two
>separate zcatalogs: one containing the 4 minor indexes and one containing
>the ZCTextIndex.  The hope is that the zcatalog containing only the
>ZCTextIndex will be smaller and will again fit into memory.
>  
>
Why would it be smaller?
You still need to load the indexes when you do a search, right?
Or do you intend to index different objects in different catalogs?
In that case, couldn't you use the idxs argument of
ZCatalog::catalog_object(self, obj, uid=None, idxs=None,
update_metadata=1) to update only the indexes you name?
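
To make the idxs idea concrete, here is a rough sketch. SketchCatalog and Doc
are made-up stand-ins, not the real ZCatalog classes; only the argument list
of catalog_object is taken from the signature quoted above, and the sketch
assumes that idxs=None means "all indexes".

```python
# Stand-in for illustration only; this is NOT the real ZCatalog class.
class SketchCatalog:
    def __init__(self, index_names):
        # index name -> {uid: indexed value}
        self.indexes = {name: {} for name in index_names}

    def catalog_object(self, obj, uid=None, idxs=None, update_metadata=1):
        # Same argument list as the quoted signature.
        # Assumption: idxs=None means "update every index".
        # update_metadata is accepted but ignored in this sketch.
        names = list(self.indexes) if idxs is None else idxs
        for name in names:
            value = getattr(obj, name, None)
            if value is not None:
                self.indexes[name][uid] = value


class Doc:
    """Hypothetical document with one cheap field and one full-text field."""
    def __init__(self, title, body):
        self.title = title
        self.body = body


catalog = SketchCatalog(["title", "body"])
doc = Doc("Release notes", "long full text ...")

# Index only the cheap field; the full-text index is left untouched.
catalog.catalog_object(doc, uid="/docs/1", idxs=["title"])
```

With something like this, one catalog could hold all five indexes while each
object is only fed to the indexes that make sense for it.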

>The only difficulty is in combining the results from searches of two
>separate zcatalogs in an efficient manner.  My best guess at this point is
>that I will have to patch the 'search' routine in ZCTextIndex to stop it
>from 'Lazifying' the result sets, so that I can join/intersect the result
>sets based on OIDs (instead of RIDs - which should be doable as the result
>sets prior to 'lazifying' are xxBTrees and the BTrees product comes with
>methods for join/intersection). I can then 'Lazify' the final result set and
>return it.  At least that's the theory!
>  
>
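The join described above could look roughly like the sketch below. It uses
plain Python dicts and sets instead of BTrees, so it shows only the idea and
not the performance. The rid-to-uid mappings and the sample data are
assumptions for illustration; a real catalog keeps its own mapping, and each
catalog assigns its RIDs independently, which is why the shared key is needed.

```python
def uids_for(rids, rid_to_uid):
    """Translate one catalog's private RIDs into keys shared by both catalogs."""
    return {rid_to_uid[rid] for rid in rids}


def intersect_results(rids_a, rid_to_uid_a, rids_b, rid_to_uid_b):
    """Objects that matched in both catalogs."""
    return uids_for(rids_a, rid_to_uid_a) & uids_for(rids_b, rid_to_uid_b)


def union_results(rids_a, rid_to_uid_a, rids_b, rid_to_uid_b):
    """Objects that matched in either catalog."""
    return uids_for(rids_a, rid_to_uid_a) | uids_for(rids_b, rid_to_uid_b)


# Hypothetical data: the two catalogs number the same objects differently.
minor_rid_to_uid = {1: "/a", 2: "/b", 3: "/c"}
text_rid_to_uid = {10: "/b", 11: "/c", 12: "/d"}

minor_hits = [1, 2]    # RIDs returned by the catalog with the 4 minor indexes
text_hits = [10, 11]   # RIDs returned by the ZCTextIndex catalog

both = intersect_results(minor_hits, minor_rid_to_uid,
                         text_hits, text_rid_to_uid)
either = union_results(minor_hits, minor_rid_to_uid,
                       text_hits, text_rid_to_uid)
```

The translation step is the extra cost of having two catalogs; it goes away
only if both catalogs use the same integer key for the same object.
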
Maybe write a version of ZCatalog (or rather Catalog) that uses OIDs as RIDs?
The only problem is that OIDs are 64-bit, while BTrees.IIBTree.IISet et al.
use 32-bit ints. So you would need an IISet variant that takes longs.
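
A quick way to see the size mismatch, using only the standard library. The
sketch assumes an OID is an 8-byte big-endian string; the helper names and
the two sample OIDs are made up for illustration.

```python
import struct

INT32_MAX = 2**31 - 1
INT32_MIN = -2**31


def oid_as_int(oid):
    """Unpack an 8-byte big-endian OID into a Python integer."""
    (value,) = struct.unpack(">Q", oid)
    return value


def fits_in_int32(value):
    """True if the value could be stored as a signed 32-bit key."""
    return INT32_MIN <= value <= INT32_MAX


small_oid = b"\x00" * 7 + b"\x2a"                # 42
large_oid = b"\x00\x00\x00\x01\x00\x00\x00\x00"  # 2**32

small_ok = fits_in_int32(oid_as_int(small_oid))  # fits
large_ok = fits_in_int32(oid_as_int(large_oid))  # does not fit
```

Small OIDs would happen to work as 32-bit keys, but nothing guarantees that
every OID in a storage stays in that range.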





