[ZODB-Dev] Basic ZODB practices

Tue Mar 9 23:09:14 EST 2004

[Martijn Faassen]
> ...
> You'd think random ids could be *bad* for read performance though. If
> there is some sequentiality in the data you're putting in a BTree, and
> you're reading this data in its 'native' sequencing, with random ids
> you'd be waking up a lot of buckets, which hurts read performance.
> Better just wake up a single bucket with the sequential data in there,
> and then read the values. What're people's thoughts on this?

There's no substitute for measurement <0.5 wink>.
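In that spirit, here's a toy measurement of Martijn's point, as a sketch only.
It assumes a deliberately simplified model: all keys are sorted and cut into
fixed-size buckets, whereas real BTree buckets split dynamically and are often
partly empty.  It counts how many buckets you'd wake up reading back a batch of
objects that were created together, with sequential ids vs random ids.  The
function names and parameter values are made up for illustration.

```python
import random


def buckets_touched(ids, batch, bucket_size):
    """Count distinct buckets holding the keys in `batch`.

    Toy model (an assumption, not real BTree internals): all keys are
    sorted and cut into consecutive buckets of exactly `bucket_size` keys.
    """
    ordered = sorted(ids)
    bucket_of = {key: pos // bucket_size for pos, key in enumerate(ordered)}
    return len({bucket_of[key] for key in batch})


def simulate(n=10_000, batch_size=100, bucket_size=50, seed=42):
    rng = random.Random(seed)
    # Ids in creation order: objects created together get adjacent ids...
    sequential = list(range(n))
    # ...or ids drawn at random from a large space (duplicates dropped,
    # creation order preserved).
    random_ids = list(dict.fromkeys(rng.randrange(2**31) for _ in range(n)))
    # Read back the first `batch_size` objects created, in creation order.
    seq = buckets_touched(sequential, sequential[:batch_size], bucket_size)
    rnd = buckets_touched(random_ids, random_ids[:batch_size], bucket_size)
    return seq, rnd


if __name__ == "__main__":
    seq, rnd = simulate()
    print(f"sequential ids: {seq} buckets; random ids: {rnd} buckets")
```

With these made-up parameters the sequential batch lands in 2 buckets, while
the random batch is spread over dozens.  That only confirms the intuition
under the toy model; whether it matters for a real app depends on whether
reads actually follow creation order, which is exactly what needs measuring.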

> Perhaps related, I recall Tim Peters (I think, might've been someone
> else) once (long ago...) saying random ids don't really help, though I
> forget the exact context. Perhaps Tim remembers and can jump in, but I
> may remember this completely wrong anyway. :)

Google on

    site:mail.zope.org zodb-dev unique id

to find the last long thread about this.  Every app seems to have its own
special needs, but Casey's advice to study the Zope Catalog is as plausible as
any I've seen:

    http://mail.zope.org/pipermail/zodb-dev/2003-June/005267.html

That scheme mixes sequential and random ids, using randomization to pick a
starting point for passing out sequential ids.  If temporal clustering of id creation
is a predictor of future temporal clustering of access to the associated
objects, then it should be a big win to keep ids sequential (in, e.g., an
IOBTree keyed by id).  OTOH, if future object access patterns are unrelated
to temporal clustering of id creation, there seems no reason to avoid purely
random ids.
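A minimal sketch of that mixed scheme, assuming only what's described above:
pick a random starting point, hand out ids sequentially from there, and jump to
a new random point when the next id is already taken.  This is not the Zope
Catalog's actual code.  The class name, the signed 32-bit range, and the use of
a plain dict in place of an IOBTree are all assumptions for illustration.

```python
import random


class HybridIdAllocator:
    """Hand out ids sequentially from a randomly chosen starting point.

    Hypothetical helper: `table` is any mapping keyed by int id (a dict
    stands in for an IOBTree here).  The default [low, high] range is an
    assumed signed 32-bit key space.
    """

    def __init__(self, table, low=-2**31, high=2**31 - 1, rng=None):
        self.table = table
        self.low, self.high = low, high
        self.rng = rng or random.Random()
        self._next = None  # no starting point chosen yet

    def _random_start(self):
        return self.rng.randint(self.low, self.high)

    def new_id(self, value):
        if self._next is None:
            candidate = self._random_start()
        else:
            candidate = self._next
        # On a collision, or after running off the top of the range, pick a
        # fresh random starting point instead of scanning forward through a
        # possibly long run of occupied ids.
        while candidate in self.table or candidate > self.high:
            candidate = self._random_start()
        self.table[candidate] = value
        self._next = candidate + 1
        return candidate


if __name__ == "__main__":
    alloc = HybridIdAllocator({}, rng=random.Random(1))
    print([alloc.new_id(name) for name in "abcde"])
```

Objects created in a burst get adjacent ids (and so tend to share buckets),
while separate bursts start at unrelated points, so independent writers are
unlikely to keep colliding in the same region of the tree.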

FileStorage oids are deliberately not random, BTW (and the usually invisible
fsBTree type exploits that heavily to save space -- tens of thousands of
8-byte oids typically share a single 6-byte oid prefix in an fsBTree, and
that's possible only because FS oids are passed out sequentially).
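To see why sequential oids make that prefix sharing work, here's a small
illustration of the prefix/suffix split only, not fsBTree's actual layout.  It
assumes oids are 8-byte big-endian integers, so the high-order 6 bytes form the
shared prefix and the low-order 2 bytes the suffix; the helper names are
hypothetical.

```python
import random
import struct


def oid_bytes(n):
    """Pack an integer as an 8-byte big-endian oid (assumed encoding)."""
    return struct.pack(">Q", n)


def split_oid(oid):
    """Split an 8-byte oid into a 6-byte prefix and a 2-byte suffix."""
    return oid[:6], oid[6:]


def distinct_prefixes(oids):
    return len({split_oid(oid)[0] for oid in oids})


def compare(n=50_000, seed=0):
    rng = random.Random(seed)
    sequential = [oid_bytes(i) for i in range(1, n + 1)]
    scattered = [oid_bytes(rng.getrandbits(64)) for _ in range(n)]
    return distinct_prefixes(sequential), distinct_prefixes(scattered)


if __name__ == "__main__":
    seq, rnd = compare()
    print(f"sequential: {seq} prefix(es); random: {rnd} prefixes")
```

With a 2-byte suffix, up to 2**16 = 65,536 consecutive oids can share one
6-byte prefix, which matches the "tens of thousands" above.  Random 64-bit
oids almost never share a 48-bit prefix, so nearly every oid would need its
own prefix entry.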