[ZODB-Dev] [BTrees] considerable optimization potential in "intersection"

Tim Peters tim at zope.com
Wed Aug 4 23:17:16 EDT 2004


[Dieter Maurer]
> we observed that "intersection(IISet((i,)), IOBTreeInstance)" accesses
> much more data in the "IOBTreeInstance" (and is therefore *MUCH* slower,
> by several orders of magnitude when the tree has to be loaded from
> external storage) than "IOBTreeInstance.has_key(i)".

I believe that.  As per the last message, it takes time proportional to the
number of keys in the IOBTreeInstance that are <= i.  If i is very small,
it's fast.  If i >= max(IOBTreeInstance), it's maximally slow, and loads the
entire IOBTreeInstance.
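For concreteness, here are the two operations side by side (a minimal
sketch; the import reflects one plausible source for the intersection
function in Dieter's example, and t and i stand for the IOBTree and the
probe key):

    from BTrees.IIBTree import IISet, intersection

    # Via intersection: the merge scans t's keys from the smallest
    # up to i, loading every bucket along the way.
    result = intersection(IISet((i,)), t)

    # Via a search: a single O(log N) path from the root to one bucket.
    found = t.has_key(i)    # "i in t" in ZODB 3.3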

> As there is no theoretical reason why "intersection" should access more
> data than the "has_key" test, there is obviously unused optimization
> potential.

Right.  See the paper I referenced last time for mounds of untapped
optimization potential.

> Would you mind if I tried to optimize "intersection"?

I think it would be great if you wrote it in Python.  The kinds of
algorithms in the referenced paper do tons of low-level fiddling, so they
get a real advantage from being coded in C.  The only optimization I've
seen talked about *here* is "intersecting two containers of vastly
different sizes", common to your and Andreas's posts.  You can get
orders-of-magnitude speedup by writing that in Python (as Andreas reported
for his Python loop).

Writing this specific optimization in C instead would buy little extra
performance.  No matter which language it's written in, it will spend most
of its time doing searches, and Python "key in btree" (3.3;
btree.has_key(key) in 3.2) triggers the same C search code as the
long-winded way to do a search in C.  Writing it in C would also be several
thousand times more likely to introduce a subtle bug.  In contrast, it's
hard to write a buggy Python function for this simple optimization, and
Python makes it much easier for people to tune and experiment.
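
Here's a minimal sketch of that optimization, assuming integer-keyed
containers from BTrees.IIBTree; the function name and the IISet result
type are illustrative choices, not an existing API:

    from BTrees.IIBTree import IISet

    def small_in_large_intersection(small, large):
        # Probe the larger container once per key in the smaller one.
        # Each probe is a single O(log N) BTree search, so the loop
        # costs O(M * log N) total for container sizes M and N.
        result = IISet()
        for key in small:
            if key in large:            # large.has_key(key) in 3.2
                result.insert(key)
        return result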

And tuning is necessary, because at some point it's going to be faster to
use the builtin intersection.  If the containers have sizes M and N, the
"iteratively search for keys" method always takes O(M * log N) time, while
the builtin method takes worst-case O(M+N) time.  As M grows toward N,
M*log(N) exceeds M+N, so the builtin method eventually wins no matter how
cleverly the search loop is coded.
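
One way to pick between the two at runtime, as a sketch only: the
tuned_intersection name is hypothetical, the sizes are passed in because
len() on a BTree walks every bucket, and the crossover test below is a
placeholder whose constant factor has to be measured, not guessed:

    from math import log
    from BTrees.IIBTree import intersection

    def tuned_intersection(small, large, m, n):
        # m and n are the sizes of small and large.  Compare the
        # estimated probe cost, m * log2(n), against the estimated
        # merge cost, m + n, and dispatch accordingly.
        if m * log(max(n, 2), 2) < m + n:
            return small_in_large_intersection(small, large)
        return intersection(small, large)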

BTW, the algorithms in the paper *can* be very fast even when M and N are
both very large (they adapt to the data in the containers, not just to the
container sizes), but require multiple clever low-level search strategies.
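
To give the flavor of one such strategy, here is exponential ("galloping")
search, sketched over a plain sorted Python list rather than a BTree
(BTrees don't expose cheap positional indexing, so this is illustrative
only).  It locates a key in time logarithmic in the *distance* from the
starting index, which is what lets adaptive intersection skip long runs of
non-matching keys cheaply:

    from bisect import bisect_left

    def galloping_search(keys, target, start=0):
        # Leftmost index i >= start such that keys[i] >= target.
        n = len(keys)
        if start >= n or keys[start] >= target:
            return start
        # Gallop: double the probe distance until we pass target.
        step = 1
        while start + step < n and keys[start + step] < target:
            step *= 2
        # The answer is bracketed between the last two probes; finish
        # with a binary search there, for O(log distance) total cost.
        return bisect_left(keys, target, start + step // 2 + 1,
                           min(start + step, n))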
