[ZODB-Dev] PersistentMapping good for large numbers of objects?

Thu Jul 17 11:52:17 EDT 2003

On Thursday 17 July 2003 08:36 am, Stefan H. Holek wrote:
> With regard to this notice in BTrees.Interfaces:
> 
> ###############################################################
> # IMPORTANT NOTE
> #
> # Getting the length of a BTree, TreeSet, or output of keys,
> # values, or items of same is expensive. If you need to get the
> # length, you need to maintain this separately.
> #
> # Eventually, I need to express this through the interfaces.
> #
> ################################################################
> 
> How slow would mybtree.keys() be in case of 20,000 objects? Slower than 
> with a plain dict or PersistentMapping? Should a separate list be kept, 
> probably, along the lines of BTrees.Length?

keys() for a btree returns a lazy iterator. In theory the size of the BTree 
should not affect the time taken by keys(). The time to iterate all the keys 
would of course increase with a larger BTree since each iteration must lookup 
the next key in sequence. Tim could probably tell you what the exact 
algorithmic complexity of key lookups is <wink>.

This is why len(BTree) is slow for large trees. In order to compute the len(), 
the BTree iterates and counts all of the keys. Since this often means loading 
a raft of objects from the disk (often more then the ZODB cache holds), it 
can be very expensive indeed.

Since not all applications need to keep track of the size of a BTree, the 
Length object is separate so that applications no needing to know the length 
do not incur the bookkeeping and conflict-resolution expense. So keeping a 
Length trades a bit of expense at write time for a big savings a read time. 
Often a worthy trade.

-Casey