[ZODB-Dev] IIBTree.multiunion and list comprehensions

Tim Peters tim at zope.com
Tue Dec 9 19:40:48 EST 2003


[Christian Robottom Reis]
> What size of BTree would be necessary for the difference to actually
> be noticeable?

It's not simple.  In general, the "multi" part of multiunion is more
important than the "union" part, and timings depend on the number of
distinct collections getting united, and the precise distribution of values
within each.  There is, for example, no advantage to calling multiunion
instead of union with only two collections, regardless of their sizes.

multiunion() always takes time proportional to the sum of the number of
elements in its inputs, *but* the constant of proportionality is relatively
high (it makes 6 passes over the data), and so an N log N or even N**2
method may be faster on particular inputs.

> My application has BTrees with tens of thousands of objects, but
> nothing above a hundred thousand objects. It's been quite fast with
> the exception of when we union large numbers of sets, which is slow
> on the first run (slowness which seems unavoidable given that the
> operation potentially touches a large number of buckets in the tree).

union'ing a large number of sets is what multiunion is best at, but nothing
can hide that large structures of any kind stress your box's memory caches,
or even your disk.

> I'm curious because I don't think I've ever heard of *very* large
> BTrees being discussed on the list before.

I still don't think I have <wink>.  multiunion isn't aiming at large
collections, it's aiming at 3 or more collections.

> I assume you (and other Zopers) have worked with some pretty gigantic
> ones; if you could throw up some numbers, I'm guessing it would do
> wonders for my blood pressure <wink>.

I've only seen gigantic BTrees in synthetic stress tests (but that I *have*
seen them there should give you some confidence that they worked <wink>).
Whenever Jeremy has done a dump of an .fs file from the field, we've usually
found that most BTrees in it are tiny (no more than a few dozen entries).  I
think BTrees with as many as 100,000 elements are very rare.  I've run
stress tests with 100 million element trees, and those kinds of tests end
when my disk starts smoking <0.7 wink>.




More information about the ZODB-Dev mailing list