[ZODB-Dev] OOBTree: Persistent Objects as keys

Thu, 07 Feb 2002 07:18:39 -0500

Toby Dickenson wrote:
> 
> On Wednesday 06 February 2002 12:27 pm, you wrote:
> 
> >The problem occurs when Python falls back to comparing addresses.
> >If we could raise an error when this happens, we could avoid building
> >meaningless BTrees. (Note that a similar problem occurs when hashes
> >based
> >on addresses of persistenct objects are uses in has tables.)
> 
> I think its bigger than that. Using __cmp__ comparison also has problems when
> mixing unicode string and 8-bit-wide strings.

Let's distinguish two cases:

1. Comparison raises an error, preventing a key from being used.

2. Comparison is inconsistent.

I consider case 1 to be a totally different class of problem than
case 2. I find case 1 to be perfectly acceptable. I've said it before, 
but I'll say it again. Keys in BTrees are ordered. If that doesn't work
for you, pick another data structure. ;)

So, wrt strings and unicode, I thionk that case 1 applies as long as you don't
change the default encoding.

> Even worse, Python reserves the right to change the order of the default
> cross-type comparison. If I remember correctly, the relative sort order of
> None and integers changed in Python 2.2. I wonder how many databases out
> there contain this mix of keys, and will break after upgrading to 2.2.

This arises from the fact that Python falls back to address comparison when
it can't find an application-defined (IOW meaningful) comparison. 
Comparing by address is acceptable for some applications, but is totally
inappropriate for persistent data.

> >Hm, maybe BTrees should use their own comparison algorithm
> >that never compares addresses. This would be a bit of a pain, since
> >given the complexity of Python's internal comparison logic. Perhaps
> >the Python logic could just be copied and the bit that falls back to
> >address comparison could be hacked out.
> 
> I like the C++ solution to this problem:
> 
> The comparison operators (>, >=, ==, etc) define the natural sort order.
> Cross-comparing objects for which there is no natural order (such as
> comparing strings to ints, or comparing a complex number to another complex
> number) will cause a failure at compile.
> 
> However, it is also possible to define a template function less<>
> which defines a total ordering, and is used by default by the set<> and map<>
> containers (which have the same requirements as our BTrees). There is no
> requirement that this ordering be a sensible one since it is only used by
> data structures, not humans.
> 
> Im not sure how this maps onto a Python solution for BTrees.....
> 
> The easiest solution would be to change no code, and attach a big warning to
> the documentation saying that BTree keys must have a stable strict order.
> This would require changes to broken applications: Zope would need a user
> interface for choosing the type of objects for ZCatalog FieldIndex. Thomas'
> application would need to use fatter keys which include a type code (example
> below). However, C++ experience shows that many developers will take a long
> time to fully understand this requirement placed on them.
> 
> Im not sure Jims solution (defining a BTree-specific comparison function) can
> work without breaking nice characteristics such as the equivalence of:
> * btree[1]
> * btree[1.0]
> * btree[MyCustomLongIntegerType(1)]

My suggestion works just fine for this, since 1, 1.0 and, 
presumably MyCustomLongIntegerType(1) can be meaningfully
compared. Basically, we just need a comparison routine that
uses application-defined comparisons, which allow some cross-
type comparisons, without falling back to address comparisons.

Just to make sure we're clear here, I'm suggesting a comparison
excactly like Python's built-in comparison without the address comparison
fallback. It would use whatever type-specific comparison was provided.

> Thomas: your application could use keys which are 2-tuples. The first element
> is a code (string, integer, whatever) defining object classifications. You
> must then ensure that all objects within each classification have a stable
> total order. For example, keys could be ("FooClassification",FooObject(x))

This is an interesting work-around, especially if the actual ordering
isn't meaningful to the application.  Then again, if the ordering
isn't meaningful, a hash table would probably be a better choice, assuming
that someone could come up with a useful persistent hash table implementation.

For example, suppose that keys could have a consistent immutable hash value, 
then one might use a IOBtree, with hashs as keys and hash buckets as values.

Jim

--
Jim Fulton           mailto:jim@zope.com       Python Powered!        
CTO                  (888) 344-4332            http://www.python.org  
Zope Corporation     http://www.zope.com       http://www.zope.org