[ZODB-Dev] Indexing and dates/times

Tue Jul 13 10:19:50 EDT 2010

On Tue, Jul 13, 2010 at 7:51 AM, Pedro Ferreira
<jose.pedro.ferreira at cern.ch> wrote:
...
> Hmm... no, in fact the problem is that most of the time I will have only one
> value per index entry.So, in a relational DB i would do something like:

As I mentioned in some text you didn't quote, you could use a strategy
of storing a scalar unless there are dups.  If dups are very rare,
this *might* be a win.

...

> I'm sorry, I have used the wrong word. I meant that a
> range query will normally involve the union of a higher number of sets as
> the granularity gets smaller and smaller. If there is only one item per
> index entry, the union operation will take longer... I assumed that the more
> BTree entries we have, the more buckets we will have to fetch from the DB,
> for a given range query. But I am probably wrong...

The BTrees package has a pretty efficient strategy for merging
multiple sets.

>
>>> So, I was wondering if there is some data structure I can use for this,
>>> as
>>> this seems to be a pretty common use case.
>>>
>>
>> That's why the various indexing(/catalog) schemes already support it.
>>
>
> So, if I need an index where items can be queried by date, and range queries
> can be performed efficiently, an IOBTree will do the job?

An IOBTree could be the *basis* of a solution.

> As I mention
> above, my only concern is the number of sets that will have to be joined.

Depending on how rare dups are, I'd either use a BTree who's values
are scalers or sets. I might start with just using sets as the values.

Typically the strategy is to assign each object you want to index an
integer id.  Then your index is of the form:

  {key -> {docids}}

Then you can do a lookup with something like:

  result = BTrees.IOBTree.multiunion(index.values(min,max))

For more code examples, you should look at the zope.index and
zope.catalog packages.

Jim

--
Jim Fulton