[ZODB-Dev] odd KeyError in OOSets

Tim Peters tim_one at email.msn.com
Fri Apr 18 20:41:42 EDT 2003


[Tim]
>> But the output confirms that you can't use instances of this class in a
>> persistent OOSet (see earlier mail) anyway:  the pattern of cmp results
>> confirms that the OOSet's elements aren't in sorted order
>> anymore (object > node0, and object < node6, but object doesn't lie
>> between node0 and node6).  That's typical when persistent objects get
>> new memory addresses upon loading them from a database and the default
>> __cmp__ is used to compare the instances.

[Nicholas Henke]
> That is a fun 'gotcha'. BTW ClubmaskObject(IndexedObject) and
> IndexedObject(Persistant), so either Persistant is supposed to provide
> __cmp__, or this _really_ needs to be documented somewhere, as any class
> that derives from Persistant needs to define __cmp__.

There's no requirement that persistent objects define __cmp__, the
requirement is that objects used specifically as members of persistent Sets
or TreeSets, or as keys in persistent Buckets or BTrees, provide a
consistent total ordering.  Objects like integers and strings already do.
Python classes don't, and really can't make up an ordering for you (that
requires knowledge about the *internals* of your object, just like string
comparison needs to know how to pick apart a string into characters in order
to compare strings consistently independent of accidents like memory
address; Python can't guess a similarly good comparison algorithm for
arbitrary objects it knows nothing about, so you have to write a __cmp__
that's good enough).

I'm tempted to agree that the docs are lacking here, but don't really know
where I'd find any docs to begin with <0.9 wink>.

> I am confused why cmp returns 0 above ( shows a match ),

Because the object addresses were identical.

> but the OOSet does not like it.  Is it complaining because it can't find
> the object, or what I am assuming what is happening, where the objects
> are out of order on restore, and that makes OOSet really unhappy?

Sets store objects in sorted order (sorted according to __cmp__), and use
binary search for lookups.  If the objects aren't actually in sorted order,
binary search may or may not find any particular object you're looking for.
For example, if we do a binary search for 5 in

   100. -6, 5, 1, -20

we'll find it, because 5 is the first object looked at (it's the object in
exactly at the middle, and the middle is the first place binary search
looks).  If we search for 100, we won't find it, because binary search will
say "OK, 100 is bigger than 5, so from now on I only have to look at the
last two elements", and 100 isn't there.  If we do a binary search for -6,
we may or may not find it, depending on the exact details of what the binary
search does after deciding -6 must be "to the left" of 5 (if it looks at 100
next, it won't find -6).

Of course binary search works great when the sequence really is in sorted
order, and that's why a consistent __cmp__ is crucial.  So that's the good
news:  if you can define a consistent __cmp__ for your objects, they'll work
fine in OOSets.




More information about the ZODB-Dev mailing list