[ZODB-Dev] Iterating over a Connection

Thu, 17 May 2001 16:29:29 -0400

Hi all --

in my endless curiosity about just what the heck lies buried in our ZODB
database, I've written a little script to hit every object in the
database and record how many objects of each type are seen.  Rather than
starting at the root and traversing the object graph, though, I decided
to just loop over all OIDs.  After running the script, there are some
things I think I don't understand about the Connection object and about
OIDs.  

The heart of the script is this:

    from struct import pack
    init_database()
    conn = get_connection()
    oid = 0L
    while 1:
        # An OID is an 8-byte big-endian string: pack the high and low
        # 32-bit halves of the integer.
        oid_s = pack(">LL", oid >> 32, oid & 0xffffffffL)
        try:
            object = conn[oid_s]
        except KeyError:
            print "%016x  *empty slot*" % oid
        else:
            print "%016x  %s" % (oid, `object`)
        oid += 1

(init_database() opens the storage and database; get_connection()
returns the ZODB.Connection.Connection object.)
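
A minimal sketch of the integer-to-OID conversion on its own, assuming
only that an OID is an 8-byte big-endian string. The function names are
made up for illustration. It also shows why the split has to be made at
bit 32: a split at bit 16 agrees for integers up to 0xffff and then
addresses different OIDs.

```python
import struct

def oid_to_bytes(oid):
    """Pack an integer as an 8-byte big-endian OID string."""
    high = (oid >> 32) & 0xffffffff   # upper 32 bits
    low = oid & 0xffffffff            # lower 32 bits
    return struct.pack(">LL", high, low)

def bytes_to_oid(oid_s):
    """Inverse of oid_to_bytes()."""
    high, low = struct.unpack(">LL", oid_s)
    return (high << 32) | low

def split_at_16(oid):
    """For contrast only: the same packing with the split at bit 16."""
    return struct.pack(">LL", (oid & 0xffff0000) >> 16, oid & 0x0000ffff)

# Round trip works for small and large integers.
assert bytes_to_oid(oid_to_bytes(0xd338e)) == 0xd338e
# The two packings agree up to 0xffff and diverge afterwards.
assert split_at_16(0xffff) == oid_to_bytes(0xffff)
assert split_at_16(0x10000) != oid_to_bytes(0x10000)
```

(I believe ZODB.utils provides a p64() helper that does this packing,
but the sketch above does not depend on it.)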

Obviously, this would loop forever.  But I'm not sure of the correct
semantics for when to stop.  If I don't catch KeyError, then the script
dies after visiting only 60 or 70 objects.  I guess the OIDs of deleted
objects are not reused, so attempting to access those objects by
dictionary lookup on the Connection object fails.  Fine, that makes
sense.  (But is my guess correct?)

So I fancied things up a bit:

    init_database()
    conn = get_connection()
    oid = 0L                                # next OID to try
    total_count = 0L                        # total objects seen
    expected_count = get_database().objectCount()
    print "expecting to see %d objects" % expected_count

    while 1:
        # pack the high and low 32-bit halves into an 8-byte OID string
        oid_s = pack(">LL", oid >> 32, oid & 0xffffffffL)
        try:
            object = conn[oid_s]
        except KeyError:
            print "%016x  *empty slot*" % oid
        else:
            print "%016x  %s" % (oid, `object`)
            total_count += 1
            if total_count >= expected_count:
                break
        oid += 1

My assumption here is that when objectCount() returns 127702, there are
127702 *valid* OIDs in the database -- i.e. OIDs that can be looked up in
the Connection object.  I think that assumption must be wrong: I let the
script run while I went to catch up on python-dev, and half an hour
later it had only got this far:

    00000000000d338d  *empty slot*
    00000000000d338e  *empty slot*

(0xd338e is 865166, meaning we tried to look up quite a lot more than
127702 OIDs in the Connection object.)

So my new assumption is this: objectCount() doesn't return the number of
"live" objects in the database, but instead the number of OIDs that have
ever had live objects attached to them.  Thus, I should stop when I have
*attempted* to load objectCount() objects, not when I have actually seen
that many objects.  I'm revising the script accordingly, but I would
like to know: is *this* assumption correct?  Or am I still groping for
the right way to iterate over a Connection object?
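
The revised stopping rule (stop after a fixed number of *attempted*
lookups, not after a fixed number of objects seen) might be sketched
like this. It is only an illustration: census() is a hypothetical name,
a plain dict stands in for the Connection, and whether objectCount() is
the right value for the bound is exactly the open question.

```python
import struct

def census(lookup, max_attempts):
    """Try OIDs 0 .. max_attempts-1; count found objects by type name."""
    counts = {}
    missing = 0
    for oid in range(max_attempts):
        oid_s = struct.pack(">LL", oid >> 32, oid & 0xffffffff)
        try:
            obj = lookup(oid_s)
        except KeyError:
            missing += 1          # no object stored under this OID
            continue
        name = type(obj).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts, missing

# Stand-in for the Connection: a dict keyed by 8-byte OID strings,
# with OIDs 2 and 4 left empty.
fake = {
    struct.pack(">LL", 0, 0): {},
    struct.pack(">LL", 0, 1): [],
    struct.pack(">LL", 0, 3): [1],
}
counts, missing = census(fake.__getitem__, 5)
```

The loop always terminates after max_attempts lookups, however many of
them turn out to be empty slots.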

Thanks --

        Greg