[ZODB-Dev] Indexing and dates/times

Jim Fulton jim at zope.com
Tue Jul 13 05:44:37 EDT 2010


On Tue, Jul 13, 2010 at 4:35 AM, Pedro Ferreira
<jose.pedro.ferreira at cern.ch> wrote:
> Hello,
>>>
>>> I am currently trying to devise a way to index and retrieve some
>>> millions of objects according to their modification date/time. One of
>>> the problems I'm facing is that of index "granularity": I'd like to
>>> provide "to the second" granularity,
>>>
>>
>> will there ever be more than one item with the same key?
>>
>
> Exactly, that's the problem.

Typically, to model something like this, you'd have a BTree whose
values are sets.  If single items are common and you were willing to
work a bit harder, you could have BTrees whose values could be either a
set or a scalar.
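A minimal sketch of the "BTree whose values are sets" model. To keep it runnable with only the standard library, a sorted key list plus a dict stands in for the BTree; in ZODB the same role would typically be played by something like an `IOBTree` whose values are `IITreeSet` instances. The class name `SetValuedIndex` is hypothetical.

```python
import bisect


class SetValuedIndex:
    """Stand-in for a BTree mapping key -> set of object ids (hypothetical)."""

    def __init__(self):
        self._keys = []   # sorted list of distinct keys
        self._sets = {}   # key -> set of object ids

    def add(self, key, oid):
        bucket = self._sets.get(key)
        if bucket is None:
            # First item for this key: register the key in sorted order.
            bisect.insort(self._keys, key)
            bucket = self._sets[key] = set()
        bucket.add(oid)

    def remove(self, key, oid):
        bucket = self._sets[key]
        bucket.discard(oid)
        if not bucket:
            # Drop empty sets so the key disappears from the index.
            del self._sets[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def get(self, key):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._sets.get(key, ()))
```

Two objects modified in the same second simply share one key, and the key goes away once its set is empty.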

>>> but for that I need some structure
>>> that lets me do that. So, the options I see are:
>>>  - A timestamp-based
>>>
>>
>> What do you mean by "timestamp"
>>
>
> Well, it could be a UNIX timestamp.

It could be lots of things. I was asking what you meant.

If you used a Unix timestamp, you could use one of the Ix flavors of BTree
(IIBTree, IOBTree, and so on).
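As a sketch of turning a modification time into an integer key, assuming timezone-aware `datetime` values: the "I" flavors of BTree store signed 32-bit integer keys, so seconds since 1970 only fit until January 2038, while the "L" flavors (for example `LOBTree`) take 64-bit keys. The helper name `to_int_key` is hypothetical.

```python
import calendar
from datetime import datetime, timezone

# Signed 32-bit range, which is what the "I" key types of BTrees hold.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def to_int_key(dt):
    """Convert an aware datetime to whole seconds since the Unix epoch (UTC).

    Sub-second precision is dropped, giving "to the second" granularity.
    """
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    # utctimetuple() normalizes to UTC; timegm() treats the tuple as UTC.
    seconds = calendar.timegm(dt.utctimetuple())
    if not INT32_MIN <= seconds <= INT32_MAX:
        raise OverflowError("timestamp does not fit in a 32-bit key")
    return seconds
```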


>>>
>>> BTree index - looks highly inefficient, as there
>>> will be many entries with only one element (probably almost all of
>>> them),
>>>
>>
>> I have no idea what you mean by this.
>>
>
> That's the problem you've already mentioned above.

So, the issue is that you have multiple items with the same
key. This is simply handled by using sets as values in a BTree.
There are existing index implementations that do this.

>
> So, in a relational DB I would do something like:
>
> SELECT * FROM table WHERE timestamp >= X AND timestamp <= Y
>
> Since I cannot do this with ZODB,

I don't know what "this" is. Range searches? SQL? BTrees and various
index implementations based on them support range searches. Of
course, ZODB doesn't support SQL.
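The SQL query above maps onto a range scan over sorted keys. This sketch uses `bisect` on a sorted list as a stand-in for a BTree and collects the union of the id sets for every key in the inclusive range [lo, hi]; with real BTrees, `tree.values(min, max)` iterates the same inclusive range. The function and parameter names are hypothetical.

```python
import bisect


def range_search(keys, sets_by_key, lo, hi):
    """Return the union of the id sets whose keys fall in [lo, hi].

    keys: sorted list of distinct integer timestamps
    sets_by_key: dict mapping each timestamp to a set of object ids
    """
    start = bisect.bisect_left(keys, lo)   # index of first key >= lo
    stop = bisect.bisect_right(keys, hi)   # one past the last key <= hi
    result = set()
    for key in keys[start:stop]:
        result |= sets_by_key[key]
    return result
```

This is the equivalent of `timestamp >= X AND timestamp <= Y`: both bounds are included.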

> I'd have to have a BTree, indexed by
> timestamp... however, as you said, if I want "to the second" granularity, I
> will rarely have two items with the same key (which makes it pretty
> useless).

I don't know why it is useless, but it is easily handled.
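One way to handle the "almost every key has a single item" case is the scalar-or-set variant mentioned earlier: store the lone id directly and promote it to a set only when a second item arrives for the same key. A stdlib-only sketch, with the hypothetical class name `CompactIndex` and a sorted list standing in for the BTree:

```python
import bisect


class CompactIndex:
    """Map key -> a single oid or a set of oids (hypothetical sketch)."""

    def __init__(self):
        self._keys = []    # sorted list of distinct keys
        self._values = {}  # key -> oid, or set of oids once there are two+

    def add(self, key, oid):
        if key not in self._values:
            # Common case: first item for this key, store it as a scalar.
            bisect.insort(self._keys, key)
            self._values[key] = oid
            return
        current = self._values[key]
        if isinstance(current, set):
            current.add(oid)
        elif current != oid:
            # Second distinct item: promote the scalar to a set.
            self._values[key] = {current, oid}

    def get(self, key):
        if key not in self._values:
            return set()
        value = self._values[key]
        return set(value) if isinstance(value, set) else {value}
```

The extra work is the type check on every read and write; in exchange, keys with one item cost no set object at all.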

> So, I was wondering if there is some data structure I can use for this, as
> this seems to be a pretty common use case.

That's why the various indexing and catalog schemes already support it.

Jim

--
Jim Fulton

