[ZODB-Dev] Odd output from checkbtrees.py

Tim Peters tim at zope.com
Tue Sep 2 17:08:23 EDT 2003


[Shane Hathaway]
>> DCWorkflow doesn't use BTrees.  My guess is that checkbtrees is
>> actually traversing a single object many times, making it a bug in
>> checkbtrees.

[Paul Winkler]
> Correct. And in fact, there were 2 issues to deal with:
> 1) obj.__dict__.items() sometimes picks things up
> through acquisition, as demonstrated by the problem with DCWorkflow
> transitions (and states, BTW).

Did you and Dieter resolve your disagreement about this?  Whether or not
it's possible, I don't think it should matter to solving the problem of
unbounded searching in checkbtrees.py.

> 2) it's possible for an object to contain a reference to itself,
> e.g. a LocalFS instance has a root attribute that (sometimes?
> always?) points to itself. So checkbtrees.py was infinitely
> descending into spam.root.root.root.root...

This problem seems central to me, and is worse than just objects directly
referring to themselves.  For example, this little script:

"""
import ZODB
from ZODB import FileStorage, DB
from BTrees.OOBTree import OOBTree

storage = FileStorage.FileStorage('BTree.fs')
db = DB(storage)
conn = db.open()
root = conn.root()

t = root['tree'] = OOBTree()
for i in range(20):
    t[str(i)] = i
get_transaction().commit()

u = OOBTree()
t['child'] = u
get_transaction().commit()

u['parent'] = t
get_transaction().commit()

db.close()
"""

builds a pair of BTrees (t and u), neither of which references itself, but
both of which reference the other.  This also tricks checkbtrees.py into
producing unbounded output, and is still a very simple form of what *can* go
wrong.  A full solution requires remembering every object reachable from the
root, never visiting an object a second time.
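A minimal sketch of that idea, independent of ZODB. Plain Python objects stand in for persistent ones, and a hypothetical `oid` attribute stands in for `_p_oid`. Like add_if_new_persistent() in the attached script, it records an oid at the moment the object is queued, so both the mutual reference between t and u and a LocalFS-style self-reference (spam.root is spam) are followed at most once:

```python
from collections import deque

class Node(object):
    """Hypothetical stand-in for a persistent object; `oid` plays the role of _p_oid."""
    def __init__(self, oid):
        self.oid = oid

def reachable(root):
    """Return oids reachable from root in breadth-first order, each visited once."""
    seen = {root.oid}          # oids already queued (cf. oids_seen in the script)
    order = []
    todo = deque([root])
    while todo:
        obj = todo.popleft()
        order.append(obj.oid)
        for value in vars(obj).values():
            # Follow only Node objects, and only the first time each oid turns up.
            if isinstance(value, Node) and value.oid not in seen:
                seen.add(value.oid)
                todo.append(value)
    return order

# Same shape as the script above: root -> t, with t and u referring to each other.
root, t, u = Node(0), Node(1), Node(2)
root.tree = t
t.child = u
u.parent = t
print(reachable(root))   # [0, 1, 2]

# A self-referencing object, like spam.root.root.root...
spam = Node(3)
spam.root = spam
print(reachable(spam))   # [3]
```

Without the `seen` check, both examples would keep re-queueing the same objects forever.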

The attached implements that, and also strengthens checkbtrees.py by calling
BTrees.check.check() on the btrees it finds (BTrees.check.check() didn't
exist at the time checkbtrees.py was written, and finds some kinds of BTree
corruption the BTree._check() method can't find).

I'd sure appreciate it if you gave it a try on your problematic .fs file!
If acquisition still seems to be creating some kind of problem, I'd like to
understand that better -- but nothing I saw up to this point made it clear
that it wasn't *just* inadequate cycle detection giving you problems.  The
main thrust of the attached is to do thorough cycle detection.
-------------- next part --------------
#! /usr/bin/env python
"""Check the consistency of BTrees in a Data.fs

usage: checkbtrees.py data.fs

Try to find all the BTrees in a Data.fs, call their _check() methods,
and run them through BTrees.check.check().
"""

from types import IntType

import ZODB
from ZODB.FileStorage import FileStorage
from BTrees.check import check

# Set of oids we've already visited.  Since the object structure is
# a general graph, this is needed to prevent unbounded paths in the
# presence of cycles.  It's also helpful in eliminating redundant
# checking when a BTree is pointed to by many objects.
oids_seen = {}

# Append (obj, path) to L if and only if obj is a persistent object
# and we haven't seen it before.
def add_if_new_persistent(L, obj, path):
    global oids_seen

    getattr(obj, '_', None) # unghostify
    if hasattr(obj, '_p_oid'):
        oid = obj._p_oid
        if not oids_seen.has_key(oid):
            L.append((obj, path))
            oids_seen[oid] = 1

def get_subobjects(obj):
    getattr(obj, '_', None) # unghostify
    sub = []
    try:
        attrs = obj.__dict__.items()
    except AttributeError:
        attrs = ()
    for pair in attrs:
        sub.append(pair)

    # what if it is a mapping?
    try:
        items = obj.items()
    except AttributeError:
        items = ()
    for k, v in items:
        if not isinstance(k, IntType):
            sub.append(("<key>", k))
        if not isinstance(v, IntType):
            sub.append(("[%s]" % repr(k), v))

    # what if it is a sequence?
    i = 0
    while 1:
        try:
            elt = obj[i]
        except:
            break
        sub.append(("[%d]" % i, elt))
        i += 1

    return sub

def main(fname):
    fs = FileStorage(fname, read_only=1)
    cn = ZODB.DB(fs).open()
    rt = cn.root()
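    # Breadth-first walk from the root.  add_if_new_persistent() records
    # each oid when it is queued, so no object is visited twice.
    todo = []
    add_if_new_persistent(todo, rt, '')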

    found = 0
    while todo:
        obj, path = todo.pop(0)
        found += 1
        if not path:
            print "<root>", repr(obj)
        else:
            print path, repr(obj)

        mod = str(obj.__class__.__module__)
        if mod.startswith("BTrees"):
            if hasattr(obj, "_check"):
                try:
                    obj._check()
                except AssertionError, msg:
                    print "*" * 60
                    print msg
                    print "*" * 60

                try:
                    check(obj)
                except AssertionError, msg:
                    print "*" * 60
                    print msg
                    print "*" * 60

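        # Every 100 objects, let the connection release cached object
        # state so memory use stays bounded on large databases.
        if found % 100 == 0:
            cn.cacheMinimize()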

        for k, v in get_subobjects(obj):
            if k.startswith('['):
                # getitem
                newpath = "%s%s" % (path, k)
            else:
                newpath = "%s.%s" % (path, k)
            add_if_new_persistent(todo, v, newpath)

    print "total", len(fs._index), "found", found

if __name__ == "__main__":
    import sys
    try:
        fname, = sys.argv[1:]
    except:
        print __doc__
        sys.exit(2)

    main(fname)
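The path strings the script prints follow a simple convention: attribute names are joined with a dot, while mapping keys are appended in brackets using repr(), and integer keys and values are skipped. A minimal sketch of that labeling for plain Python objects, mirroring get_subobjects() (minus its obj[i] sequence probing) and the path-building loop at the end of main(); the function names here are hypothetical:

```python
def subobjects(obj):
    """Return (label, value) pairs roughly as get_subobjects() builds them."""
    pairs = list(getattr(obj, "__dict__", {}).items())
    items = getattr(obj, "items", None)
    if callable(items):
        for k, v in items():
            if not isinstance(k, int):
                pairs.append(("<key>", k))
            if not isinstance(v, int):
                pairs.append(("[%s]" % repr(k), v))
    return pairs

def child_path(path, label):
    """Bracketed labels are item lookups; anything else is an attribute."""
    if label.startswith("["):
        return path + label
    return path + "." + label

class Holder(object):
    pass

h = Holder()
h.tree = {"child": "leaf"}
paths = []
for label, value in subobjects(h):
    p = child_path("", label)
    paths.append(p)
    for label2, _ in subobjects(value):
        paths.append(child_path(p, label2))
print(paths)   # ['.tree', '.tree.<key>', ".tree['child']"]
```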

