[Zope3-Users] Re: getting random results out of a catalogs field index

Jürgen Kartnaller juergen at kartnaller.at
Sat May 5 15:52:02 EDT 2007

Dominique Lederer wrote:
> Christian Theune wrote:
>> On Saturday, 05.05.2007, at 17:42 +0200, Dominique Lederer wrote:
>>> hi
>>> i would like to retrieve a number of *random* entries out of a catalog's field index.
>>> i tried it by first getting the length of the catalog index and then accessing a
>>> randomized list-index, but this is very slow, because of the large number of
>>> entries in the index.
>>> do you know any better solution?
>> I'm kind of guessing here. 
>> You say you are:
>> - querying the catalog
>> - accessing a random index from the result set
>> - noticing that this is slow
>> Does this only happen if the index is very large, e.g. you're retrieving
>> an element from the end of the result set?
>> I don't know exactly how the result sets are organized, but this
>> behaviour would imply that loading a later element triggers something
>> like loading the earlier elements too. I can't really imagine that.
>> I think the general problem that this is slow lies in the fact that
>> randomly selecting elements means 
>> a) you need access to the full list of things
>> b) applying a sort 
>> Sorting has a complexity of at least O(n log n) which becomes slow
>> enough for large sets that it's noticeable.
>> BTW: How large is large?
>> Christian
> hi, thanks for the reply, i just managed to improve the performance of my query
> significantly:
> what i wanted to do was:
> - retrieve the len() of the catalog index
> - retrieve a list() of the Resultset
> - accessing n random results and their objects
> to retrieve a random object i did:
> query = catalog.apply({'myIndex':(None,None)})
> length = len(query)
> index_intids = list(query)
> intid = index_intids[random.randint(0, length - 1)]
> object = getObject(intid)
> which was slow with 10000 items in the index (i had to wait 2-3 seconds for a
> view to render)
> after looking into the field index implementation i changed the above lines to:
> length = len(catalog['myIndex']._rev_index)

If you are using a FieldIndex, use:

length = catalog['myIndex'].documentCount()

The FieldIndex holds a counter with the number of entries in the _rev_index.
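
To illustrate why a stored counter is cheaper than counting keys, here is a
minimal sketch in plain Python. CountedIndex is a hypothetical stand-in, not
the real FieldIndex; it only assumes that the index updates its counter
whenever a document is added or removed.

```python
class CountedIndex:
    """Hypothetical stand-in for an index that tracks its own size."""

    def __init__(self):
        self._rev_index = {}   # docid -> indexed value
        self._num_docs = 0     # kept up to date on every change

    def index_doc(self, docid, value):
        # Only count the document the first time it is indexed
        if docid not in self._rev_index:
            self._num_docs += 1
        self._rev_index[docid] = value

    def unindex_doc(self, docid):
        # Removing an unknown docid leaves the counter untouched
        if docid in self._rev_index:
            del self._rev_index[docid]
            self._num_docs -= 1

    def documentCount(self):
        # O(1): returns the stored counter, no walk over the keys
        return self._num_docs


def count_by_iteration(mapping):
    # What a length without a stored counter amounts to: visit every key
    return sum(1 for _ in mapping)
```

Both ways give the same number; the difference is that documentCount() does
not touch the keys at all.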

> index_intids = list(catalog['myIndex']._rev_index.keys())
> which now works like a charm.
> i am not an expert with BTrees so i can't really say what the problem is/was.

len() on a BTree is slow because it has to iterate over all keys to
count them!

If possible, always avoid going through the catalog and use the index
directly; it is much faster!
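
For the original problem (n random entries), something along these lines
might do. This is only a sketch: pick_random_ids is a hypothetical helper,
rev_index stands for catalog['myIndex']._rev_index (a private attribute that
may change between releases), and a plain dict is used here so the example
runs on its own.

```python
import random


def pick_random_ids(rev_index, n, rng=random):
    """Return up to n distinct random keys (intids) from rev_index."""
    ids = list(rev_index.keys())   # one pass over the keys, no catalog query
    # Cap n so that asking for more ids than exist does not raise an error
    return rng.sample(ids, min(n, len(ids)))


# Plain dict standing in for catalog['myIndex']._rev_index (assumption)
rev_index = {intid: "value-%d" % intid for intid in range(10000)}
chosen = pick_random_ids(rev_index, 5)
```

Each id in chosen can then be passed to getObject(intid) as in the quoted
code above.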


