[Zope-dev] ZCatalog sort_on considered harmful

Toby Dickenson tdickenson@geminidataloggers.com
Wed, 9 Feb 2000 14:39:53 -0000


I have observed suprisingly poor ZCatalog performance when using the sort_on
parameter in some specific (but fairly common) circumstances. The problem is
due to the loop begining line 507 in Catalog.py, which traverses the whole
index of the sort key.

It is very efficient if the result set contains items with duplicate sort
key values. For example sort_on='author', since many objects may have the
same 'Author' property.

However it is not efficient if the result set does not contain duplicate
sort values, and it can be very inefficient if the result set only contains
a small subset of the values in the whole sort key index (for example,
sort_on='create_date'). The change below switches between two different
schemes, based on the relative sizes of the index and result set.

Any comments?

                # sort.  If there are scores, then this block is not
                # reached, therefor 'sort-on' does not happen in the
                # context of text index query.  This should probably
                # sort by relevance first, then the 'sort-on' attribute.
                if len(rs)>len(sort_index._index):
                    # We have more hits than _unique_ values in the sort-key
index.
                    # This means that many of the hits have the same value
sort-key.
                    # We can use the index to efficiently partition all the
hits into
                    # those groups, then sort the groups.
                    for k, intset in sort_index._index.items():
                        if type(rs) is IIBType:
                            intset=rs.intersection(intset)
                        else:
                            intset=intset.intersection(rs)
                        if intset: 
                            append((k,LazyMap(self.__getitem__, intset)))
                else:
                    # We have more unique values in the sort-key index than
                    # we do hits. I think the indexes are of no help here.
                    for r in rs:
 
append(sort_index._unindex[r],LazyMap(self.__getitem__,[r]))


(also added to the collector)


Toby Dickenson