[ZODB-Dev] [BTrees] Inconsistent equality checks

Tim Peters tim at zope.com
Sun Nov 9 22:02:19 EST 2003


[Dieter Maurer]
> My main purpose for this thread has been to warn "BTrees" users
> about the surprising differences between "XXSet" and "XXTreeSet"
> when one tries to implement equality checks for sets implemented
> by "XXSet/XXTreeSet".

I don't know why it's so surprising.  The operations these things support
are spelled out in BTrees/Interfaces.py:

class ISet(IKeySequence, ISetMutable):
    pass

class ITreeSet(IKeyed, ISetMutable):
    pass

The difference between IKeySequence and IKeyed is real; the extent to which
they're guaranteed to act the same is captured by ISetMutable, and by the
fact that IKeySequence derives from IKeyed.  Use of anything beyond what the
interfaces promise is relying on accidents, and will eventually bite.

>   The standard Python "equality" operator implements (for historical
>   reasons, as you have explained) a semantically useless check.

Usefulness is relative to your goal, and sometimes comparison by object
identity is exactly what you want, other times not.  For example, the set of
people who work on ZODB contains different people from time to time, but in
a dict mapping work groups to their critical tasks, identity equality is
exactly what's needed.  __cmp__ isn't mentioned in Interfaces.py, though, so
relying on the fact that OOTreeSets happen to compare by object identity
today would also be a mistake.
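
The work-group example can be sketched in plain Python.  This is only an
illustration of identity-based equality in general, not of BTrees itself:
the WorkGroup class, the member names and the task text are all hypothetical.

```python
class WorkGroup:
    """Hypothetical class. It defines neither __eq__ nor __hash__, so it
    inherits Python's default identity-based equality and hashing."""

    def __init__(self, name, members):
        self.name = name
        self.members = set(members)


zodb_group = WorkGroup("ZODB", {"alice", "bob"})
critical_tasks = {zodb_group: ["document comparison behavior"]}

# Membership changes over time...
zodb_group.members.discard("bob")
zodb_group.members.add("carol")

# ...but the group is still the same dict key, because lookup uses identity.
print(critical_tasks[zodb_group])

# A different object with identical contents is a different key.
lookalike = WorkGroup("ZODB", {"alice", "carol"})
print(lookalike in critical_tasks)
```

Comparing by contents here would make the dict entry unreachable (or
ambiguous) as soon as the membership changed, which is why identity is the
useful notion of equality for this particular job.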

>   Thus, who needs set equality must look for alternatives.
>
>   One may try to implement equality of sets by equality of their
>   "keys()". This works for "XXSet" but fails for "XXTreeSet".

That's again no surprise to anyone who studies the interfaces.  They are
consistent in a deeper sense:  neither defines what comparison will do.
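
If set equality is what's wanted, one option is to compare the keys
themselves rather than the sets or whatever keys() happens to return.  The
following is a minimal sketch, assuming only that both objects provide
keys() and that keys() yields the keys in the same (sorted) order for both;
check Interfaces.py for your release before relying on that.  The helper
name same_members and the StandInSet class are hypothetical; the stand-in is
there only so the example runs without BTrees installed, and it is not a
real BTrees class.

```python
def same_members(a, b):
    """Compare two set-like objects by their keys only.

    Assumes each object has keys() yielding its keys in a consistent
    (sorted) order; nothing else about the objects is relied upon.
    """
    return list(a.keys()) == list(b.keys())


class StandInSet:
    """Hypothetical stand-in, NOT a real BTrees class.

    keys() returns a fresh iterator object each time, so comparing two
    keys() results directly compares unrelated objects, not contents.
    """

    def __init__(self, items):
        self._items = sorted(items)

    def keys(self):
        return iter(self._items)


s1 = StandInSet([3, 1, 2])
s2 = StandInSet([1, 2, 3])

print(s1.keys() == s2.keys())   # compares two iterator objects
print(same_members(s1, s2))     # compares the keys themselves
```

Copying the keys into lists costs time and memory proportional to the size
of the sets, so this is a sketch of the idea rather than something to run on
very large TreeSets.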

>   Some "BTrees" related documentation (I do not care enough
>   to search it) suggests that "XXSet" and "XXTreeSet" are
>   both implementations of "Set" which can be used interchangeably --

They're both implementations of ISetMutable, and can be used interchangeably
(wrt semantics if not pragmatics) so long as you stick to the operations
defined by ISetMutable.  This isn't hidden <wink>.  It sounds like whatever
docs you're talking about simplified in order to focus on the important
part:

>   one for smaller, the other for larger sets.

That was almost certainly the point those docs were trying to drive home.
TreeSets scale, Sets do not.

>   However, one has to be very careful when one replaces an
>   "XXSet" by an "XXTreeSet" (because the set size became larger
>   than is good for "XXSet").

The pragmatics are indeed very different, and Interfaces.py is (unwisely,
IMO) silent about pragmatics.  I'd be more cautious, though, when replacing
an XXTreeSet by an XXSet (because then you're replacing a data structure
which is friendly to persistent databases and has good worst-case insertion
and deletion behavior by one that has poor average-case insertion and
deletion behavior and needs to be entirely loaded into memory whenever any
part of it changes).

But if you're replacing one with the other because they have different
comparison behavior, you lose:  comparison behavior is undefined in both
cases, so you're just exchanging one unsupported accident for another, and
both may change in the next release.



