[ZODB-Dev] BTrees and PersistentDict keys: bug or feature

Christian Reis kiko at async.com.br
Fri Aug 15 00:39:46 EDT 2003


On Thu, Aug 14, 2003 at 10:00:03PM -0400, Tim Peters wrote:
> The docs I pointed you at before strongly recommend using simple, immutable
> values, with well-defined __cmp__ methods, as BTree keys.  When you go
> beyond that, you're trying things nobody understands.  For example, I have
> no idea how a PersistentMapping will or won't compare itself against a
> Python dict, and I wager nobody else does either.

To explain a tiny bit further, I'm using an OOBTree that maps
dictionaries to objects. This is part of the indexing mechanism in
IndexedCatalog, which is a simple indexing and query mechanism for the
ZODB. This allows us to do queries like:

    "5 in mydict"

and so on as fast as BTree lookups can happen. It's a minor feature, but
it's used in a couple of places in a couple of projects, and ever since
I decided to clean up the handling of mutable attributes, I've been at
war against corrupted BTrees. I've now resorted to writing simple
wrappers around PersistentList and PersistentMapping that emit update
signals every time they are mutated (which trigger unindexing and
indexing of the values). I ran into this bug while I was debugging an
odd problem with keys of mixed types ending up in the BTree, and I
thought I would share it with the mailing list.

I've fixed this to make sure I only store dictionaries as keys. My
wrapper class makes sure that the dicts are never mutated while being
used as being keys, and so far check() has been reassuring.

> > [various examples of trying to use Python dicts and PersistentMapping
> >  objects as BTree keys, and of trying to use one of those types as
> >  a key when setting a value, then retrieving via the other type]
> > ...
> 
> You don't need BTrees to find confusion here.  For example, this is enough:
> 
> >>> d = {1:1, 2:2}
> >>> p = PersistentMapping(d)
> 
> >>> p == d
> 1
> >>> d == p
> 0
> >>>
> 
> I expect you'll see the same two results, and (of course) they're
> inconsistent.  (They're also inconsistent under the covers, then, when the
> BTree code invokes comparison operations.)

Yep, that's what I imagined, even without coming to the conclusion that
simple comparisons would work that way too.

> > The check() module indicates it works okay.
> 
> check.check() can only confirm that the BTree hasn't become internally
> insane.

I phrased this wrongly (was temporarily distracted); I meant precisely
that.

> Since mixed-type comparisons here are demonstrably inconsistent, you would
> have to rely on brand new releases of ZODB and/or of Python too(!) to
> straighten those out.  You would also have to convince someone it's worth
> their time to try to repair the inconsistencies in ZODB and/or in Python.
> That would be an even harder sell than you're imagining, because the
> implementation of comparisons right now is so bloody complicated that
> there's a very strong chance any change will break someone else's
> currently-working code.

Well, my message was actually meant in two ways: as a notice that
something was potentially wrong, and as a way to register this for
posterity (so that other people googling find this easily and don't
spend time chasing ghosts). I agree it's a corner case, of course, but
that doesn't prevent it from bugging me tonight.

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL



More information about the ZODB-Dev mailing list