[ZODB-Dev] [fsIndex] surprizing documentation -- inefficiency?

Mon Jan 21 11:35:43 EST 2008

On Jan 21, 2008, at 6:09 AM, Dieter Maurer wrote:

> "ZODB.fsIndex" tells us in its source code documentation that it  
> splits
> the 8 byte oid into a 6 byte prefix and a two byte suffix and
> represents the index by an "OOBTree(prefix -> fsBucket(suffix ->  
> position))"
>
> It explains that is uses "fsBucket" (instead of a full tree) because
> the "suffix -> position" would contain at most 256 entries.
>
> This explanation surprises me a bit: why should the bucket contain
> only 256 rather than 256 * 256 (= 64.000) entries?

You are right. The documentation is incorrect. I just fixed it on the  
trunk.

> If the assumption is wrong (i.e. the "fsBucket" can contain up to
> 64.000 entries), is the implementation inefficient (because of that)?

I don't think so.  This is an in-memory data structure, so there isn't  
a problem with a big data structure repeatedly being written to the  
file.  Also, because new keys are normally computed sequentially, they  
will be added at the end of the bucket, so we won't incur the cost of  
shifting lots of data around.

It's possible that this is more expensive than one might want during  
packing, when inserts are likely to be a bit more random.  It might be  
interesting to do a memory and cpu comparison using buckets and trees.

Jim

--
Jim Fulton
Zope Corporation