[ZODB-Dev] Re: [Zope3-dev] Re: Community opinion about search+filter

Lennart Regebro regebro at gmail.com
Sun Mar 25 11:08:07 EDT 2007


On 3/25/07, Jim Fulton <jim at zope.com> wrote:
> On Mar 25, 2007, at 3:01 AM, Adam Groszer wrote:
> > MF> I think one of the main limitations of the current catalog (and
> > MF> hurry.query) is efficient support for sorting and batching the
> > query
> > MF> results. The Zope 3 catalog returns all matching results, which
> > can then
> > MF> be sorted and batched. This will stop being scalable for large
> > MF> collections. A relational database is able to do this
> > internally, and is
> > MF> potentially able to use optimizations there.
>
> What evidence to you have to support this assertion?  We did some
> literature search on this a few years ago and found no special trick
> to avoid sorting costs.
>
> I know of 2 approaches to reducing sort cost:
>
> 1. Sort your results based on the "primary key" and therefore, pick
> your primary key to match your sort results.  In terms of the Zope
> catalog framework, the primary keys are the document IDs, which are
> traditionally chosen randomly.  You can pick your primary keys based
> on a desired sort order instead. A variation on this theme is to use
> multiple sets of document ids,  storing multiple sets of ids in each
> index.  Of course, this approach doesn't help with something like
> relevance ranks.
>
> 2. Use an N-best algorithm.  If N is the size of the batch and M is
> the corpus size, then this is O(M*ln(N)) rather than O(M*ln(M)) which
> is a significant improvement if N << M, but still quite expensive.
>
> I don't think relational databases have any magic bullet to get
> around sorting costs.  Sorting is expensive.  In many ways, I think
> the sorting support in the catalog gave people a false sense of
> security.

I don't know if relational databases typically does this internally (I
don't think so). However, some search engines do it, like Lucene. And
supposedly also Dieters IncrementalSearch (haven't used it yet).

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64


More information about the ZODB-Dev mailing list