[ZODB-Dev] more useful TreeSet

Tim Peters tim at zope.com
Mon Sep 22 21:48:00 EDT 2003


[John Belmonte]
> Shouldn't there be a wrapper for each TreeSet class to give it the
> same interface as a Set?  Namely I would like to iterate over a
> TreeSet (needs __getitem__) and have keys() return the actual keys as
> a list.

The operations supported by the BTree-based types are documented in
BTrees/Interfaces.py (in ZODB3).  As it says there,

    def keys(min=None, max=None):
        """Return an IReadSequence containing the keys in the collection

        The type of the IReadSequence is not specified. It could be a
        list or a tuple or some other type.

        If a min is specified, then output is constrained to
        items having keys greater than or equal to the given min.
        A min value of None is ignored.

        If a max is specified, then output is constrained to
        items having keys less than or equal to the given min.
        A max value of None is ignored.
        """

IOW, it's explicitly documented that the concrete result type of .keys()
isn't defined.  I don't expect this will ever change, since a TreeSet can be
enormous, and TreeSet.keys() has always-- and deliberately --returned a tiny
object implementing IReadSequence.  Materializing a list of keys can consume
unboundedly more memory.  If you know you don't care about the memory use,
and want a list, you can easily get one:

    as_a_list = list(someTreeSet_or_Set.keys())

list() is Python builtin, BTW.

The other difference is due to this (again from Interfaces.py):

class ISet(IKeySequence, ISetMutable):
    pass

class ITreeSet(IKeyed, ISetMutable):
    pass


Track those down, and you'll find that the IKeySequence interface derives
from the IKeyed interface, adding __getitem__:

class IKeySequence(IKeyed, ISized):

    def __getitem__(index):
        """Return the key in the given index position

        This allows iteration with for loops and use in functions,
        like map and list, that read sequences.
        """

Relief for *this* part is on the way eventually, but not yet in any version
of ZODB other than ZODB4.  So long as these types have to play nicely with
Python 2.1, making TreeSets directly iterable is too painful.  ZODB4
requires Python 2.2 (or later), and Python 2.2 introduced a new iteration
protocol that doesn't confuse iteration with __getitem__.  The BTree-derived
types in ZODB4 have already been extended to exploit that new Python
protocol, and you can iterate over all of them directly in ZODB4 (in
particular, "for x in some_TreeSet:" works fine in ZODB4).

Since ZODB3 3.2 will also require Python 2.2 +, if there's enough interest
these enhancements can be backported to that version of ZODB3 (although
that's not on my todo list today).

> Or in other words, shouldn't the fact that a TreeSet is implemented
> with multiple buckets just be an implementation detail?

If your view of the world is abstract enough, sure <wink>.  The difference
exists for solid pragmatic resaons:  TreeSets can grow to enormous size and
still be efficient (inserts and deletes take O(log(N)) time), but Sets are a
bad idea for large collections (inserts and deletes take O(N) time).
Set.keys() returns a list because that's most convenient for a small
collection; TreeSet.keys() returns a BTreeItems object because it may not
even be *possible* to materialize a full list of keys; if you pushed to make
them exactly the same, then because of that last point, they'd both end up
returning a BTreeItems object.




More information about the ZODB-Dev mailing list