[ZODB-Dev] Getting all OIDs from a storage.

Tim Peters tim.peters at gmail.com
Mon Apr 24 18:16:12 EDT 2006


[Christian Theune]
>> Hmm. Sorry, but could you point out where the API is defined? I might
>> not have looked hard enough. I only found internals to exploit. :(

[Jim Fulton]
> I wish I could.  I'm almost certain that Chris McDonough implemented
> one at PyCon 2005 and that Stephan Richter made use of this, but
> I can't find it.

He did, and it's described in NEWS.txt for ZODB 3.4a1:

"""
- Added a record iteration protocol to FileStorage.  You can use the
  record iterator to iterate over all current revisions of data
  pickles in the storage.

  In order to support calling via ZEO, we don't implement this as an
  actual iterator.  An example of using the record iterator protocol
  is as follows::

      storage = FileStorage('anexisting.fs')
      next_oid = None
      while True:
          oid, tid, data, next_oid = storage.record_iternext(next_oid)
          # do something with oid, tid and data
          if next_oid is None:
              break

  The behavior of the iteration protocol is now to iterate over all
  current records in the database in ascending oid order, although
  this is not a promise to do so in the future.
"""

I don't believe it was implemented for ZEO, or for anything else other
than FileStorage.
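
For the question in the subject line, the loop from NEWS.txt can be wrapped in a generator.  This is only a sketch: iter_current_records() and all_oids() are names made up here, not part of ZODB, and the only thing assumed about the storage is that it has record_iternext() behaving as in the NEWS.txt example (in particular, that it holds at least one record).

```python
def iter_current_records(storage):
    """Yield (oid, tid, data) for each current record in the storage.

    Hypothetical helper, not part of ZODB.  Works with any object that
    provides record_iternext() as described in NEWS.txt.
    """
    next_oid = None
    while True:
        oid, tid, data, next_oid = storage.record_iternext(next_oid)
        yield oid, tid, data
        if next_oid is None:
            break


def all_oids(storage):
    """Return a list of the oids seen during one pass over the storage."""
    return [oid for oid, tid, data in iter_current_records(storage)]


# Possible use against a real FileStorage (requires ZODB to be installed):
#
#     from ZODB.FileStorage import FileStorage
#     storage = FileStorage('anexisting.fs', read_only=True)
#     print(len(all_oids(storage)))
#     storage.close()
```

Because each step is a separate call, nothing is held locked between calls, and the list is just whatever the pass happened to see.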

[Dieter Maurer]
> Are you aware that such an API would pose interesting
> concurrency issues?

Yes :-)

> If you do not lock the storage during the iteration,
> it is not unlikely that the iterated-over set is
> modified during the iteration. This poses interesting
> semantic questions (what precisely is the returned list?).

A subset (possibly total) of all oids of current revisions in the
FileStorage across the time record_iternext() is being used.  For the
use Stephan Richter made of it (a one-shot .fs conversion script,
without possibility for concurrent addition of new objects), this fit
the bill.

Each time FileStorage.record_iternext(next_oid) is called, it returns
the record for the smallest oid in the database greater than or equal
to next_oid (when next_oid is not None), or for the smallest oid in the
database period (when next_oid is None).  That's pretty robust, since
new oids assigned by FileStorage are monotonically increasing.
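
A toy model may make the robustness argument easier to see.  This is not the FileStorage code: ToyIndex and walk() are invented for illustration, oids are plain ints, and the cursor is taken to be the oid to hand back next, so each step looks up the smallest oid not less than the cursor.

```python
import bisect


class ToyIndex:
    """Hypothetical stand-in for a storage's oid index (not ZODB code)."""

    def __init__(self, oids=()):
        self._oids = sorted(oids)

    def new_oid(self):
        # New oids only ever grow, as with FileStorage.
        oid = self._oids[-1] + 1 if self._oids else 0
        self._oids.append(oid)
        return oid

    def remove(self, oid):
        self._oids.remove(oid)

    def iternext(self, cursor=None):
        """Return (oid, next_cursor); next_cursor is None at the end."""
        i = 0 if cursor is None else bisect.bisect_left(self._oids, cursor)
        if i >= len(self._oids):
            raise ValueError("no oid at or after the cursor")
        oid = self._oids[i]
        nxt = self._oids[i + 1] if i + 1 < len(self._oids) else None
        return oid, nxt


def walk(index, on_visit=None):
    """Run the incremental protocol; on_visit may mutate the index."""
    seen = []
    cursor = None
    while True:
        oid, cursor = index.iternext(cursor)
        seen.append(oid)
        if on_visit is not None:
            on_visit(index, oid)
        if cursor is None:
            break
    return seen
```

An oid added partway through lands past the cursor, so the walk still reaches it, and an oid that vanishes between calls is simply stepped over.  An oid added after the last call has returned None is missed, which is why the result is only promised to be a subset.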

> On the other hand, you cannot lock the storage as the iteration is likely to
> be a very long running operation.

Right, and that's why an "incremental" approach was taken.

