[ZODB-Dev] BTrees questions

Jeremy Hylton jeremy@zope.com
Mon, 17 Dec 2001 16:35:49 -0500 (EST)


>>>>> "MF" == Martijn Faassen <faassen@vet.uu.nl> writes:

  MF> Jeremy Hylton wrote:
  >> I think you should be using the BTree rather than the Bucket.
  >> The BTree creates buckets to store elements as needed, but I
  >> don't think you're supposed to create them yourself.

  MF> I figured it was something like that, but it didn't say
  MF> anywhere. Can I create sets myself, btw?

I don't know anything about sets :-(.

  MF> I'm trying to use the ZODB to store and index large quantities
  MF> of XML data. I'm especially interested in fast retrieval of data
  MF> and I need some form of join algorithms. Here I figure the BTree
  MF> module intersection etc operators can come in handy -- are they
  MF> supposed to be fast?

I think that's the intent.  I don't know if specific operations are
faster than others.

  MF> What happens if I keep a BTree in-memory anyway? I mean, if I
  MF> just create a BTree as a local variable, does the entire
  MF> structure exist in memory then? And how efficient is this? :)

A BTree is a persistent object, and so is each of its buckets.  If you
have a large BTree and look up a specific key, you'll only touch a very
small number of nodes or buckets, so lookups are very memory efficient.

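If you do a cache minimize or full sweep, individual buckets can get
deactivated, i.e. their state is dropped from memory and reloaded from
the database the next time they're accessed.  That, again, helps
control memory usage.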

If you can use one of the integer BTrees (IOBTree, OIBTree, IIBTree),
you'll save memory because the int part is stored directly in the BTree
as a C int instead of using a PyObject * to a Python int object.

  >> I've heard that there are docs somewhere, but I'm not sure where.

  MF> There's Interfaces.py, but that's all I could find so far.

Ah, yes.  I think that's it.

Jeremy