[ZODB-Dev] B-Tree Concurrency Issue (_OOBTree.pyd segfaults)

Tim Peters tim at zope.com
Fri Apr 15 14:18:27 EDT 2005


[Gfeller Martin]
> We're using ZOPE 2.7.3 with its default Python, ZEO, and ZODB versions
> under Windows 2000 Server SP3. This is a 2xXeon machine, but Python is
> bound to a single CPU.

So that's ZODB 3.2.4.  I don't believe any _relevant_ ZODB bugs were fixed
since then (ZODB 3.2.7 will be current in a few weeks).

> One of our (non-data.fs) ZODBs consists of an OOBTree with about 50,000
> well-ordered tuple keys and Persistence.Persistent object values.
>
> In production, we got repeatably, but so far not reproducibly, a memory
> access fault in _OOBTree.pyd+x4f93:

It's unclear how these two paragraphs are connected.  For example, do you
believe you're accessing the specific BTree you talked about when the memory
fault occurs (you never mention that BTree again, so it's hard to guess)?

Can you say something about how this tree is used?  For example, do you read
and write it from multiple ZEO clients?  Are you using ZEO at all?  Are you
(as you do below in code snippets) trying to access it from multiple threads
in a single process but _not_ letting each thread have its own Connection?

> ... [assembly dump snipped] ...

Sorry, machine-code dumps don't help me either.

> In order to narrow this down (while not speaking C), I try (on my single
> CPU machine) to load the root in a single thread as,
>
>    for x in conn.root().keys(): y=x.somedata

Is that _exactly_ the code you ran?  If not, could you post the exact code
you ran instead?

What does the root object have to do (if anything) with the BTree you talked
about above?

> while at the same time repeatedly checking the tree in a different thread
> but using the same connection (as Jim confirms in the mail cited below
> that this should be ok):

Sorry, without seeing every line of code, I can't assume it's OK.  Even if I
had every line, I'm still not sure I could say:  despite what Jim said
second-hand in that 4-year-old msg, the only model we develop for, or test
against, is a one-to-many mapping between threads and Connections (a thread
may use several Connections, but each Connection is used by only one thread),
not many-to-one.

When Jim said:

  Multiple threads *can* share a single connection, but
   if you do this, you'll need to:

     - Perform whatever locking is required to serialize
       access to the shared objects,

then the penalty for not doing "whatever locking is required" may indeed be
segfaults.  That's unfortunate, but that's how it is.
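As a rough illustration of what "serialize access" could look like, here is a
minimal sketch.  It is not code from this thread, and it uses no ZODB API:  a
plain dict stands in for the shared in-memory object, and the names shared,
shared_lock, reader and writer are all hypothetical.  Whether one coarse lock
like this is actually sufficient for objects loaded from a single Connection
is exactly the part that Jim's message leaves undefined.

```python
import threading

# Assumption: a plain dict stands in for the shared in-memory object.
# Nothing here touches ZODB; it only shows the locking discipline.
shared = {i: i * i for i in range(1000)}
shared_lock = threading.Lock()
errors = []

def reader(rounds):
    """Traverse the shared mapping, holding the lock for the whole pass."""
    for _ in range(rounds):
        with shared_lock:
            try:
                total = 0
                for key in shared:
                    total += shared[key]
            except RuntimeError as exc:
                # Raised if the dict changes size during iteration.
                errors.append(exc)

def writer(rounds):
    """Add and remove a key, taking the same lock for every mutation."""
    for n in range(rounds):
        with shared_lock:
            shared[1000 + n] = n
        with shared_lock:
            del shared[1000 + n]

threads = [
    threading.Thread(target=reader, args=(200,)),
    threading.Thread(target=writer, args=(200,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The point of the sketch is that the lock covers the entire traversal, not
each individual item access;  locking per item would still let the mapping
change between two steps of the loop.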


>    conn.root()._check()

The root object is a PersistentMapping, and doesn't have a ._check() method.
Like so:

>>> db = ZODB.DB(st)
>>> cn = db.open()
>>> cn.root()._check()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: _check

What did you really do?

> I repeatably get either a RunTime error 'the bucket being iterated
> changed size' in the for loop, OR a 'Bucket length < 1' assertion in the
> _check. After the loop finishes, the tree _check() is ok (it also passes
> all tests in BTrees.check.check()). The symptoms are the same, whether I
> run under ZEO or directly with FileStorage.

I'm still confused about what you're checking -- BTrees.check.check() should
also complain if you pass the root object:

>>> from BTrees.check import check
>>> check(rt)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\code\zodb3.2\BTrees\check.py", line 412, in check
    Checker(btree).check()
  File "C:\code\zodb3.2\BTrees\check.py", line 323, in check
    self.walk()
  File "C:\code\zodb3.2\BTrees\check.py", line 258, in walk
    kind, is_mapping = classify(obj)
  File "C:\code\zodb3.2\BTrees\check.py", line 76, in classify
    return _type2kind[type(obj)]
KeyError: <extension class Persistence.PersistentMapping at 00988020>

> I replaced conn.root().keys() by list(conn.root().keys()) and get the
> same behavior as above, i.e., either the RunTime error or the transient
> assertion failure.

For a PersistentMapping, those both return lists, so wrapping the call in
list() changes nothing.

> Reading the multi-threading ZODB discussions in
> http://mail.python.org/pipermail/python-list/2001-February/030675.html, I
> assume that the above behavior is incorrect, as there are no writes to
> any object, no commits and no conflict errors.

As above, that message didn't actually define anything about what "whatever
locking is required" means.  I can't define it myself, at least not without
a very long time staring at C code that wasn't designed to be used in this
way.

> Reading the discussion on the RunTime error in [ZODB-Dev] Re: BTrees q
> [Fwd: [Zope-dev] More Transience weirdness in 2.7.1b1]
> (http://mail.zope.org/pipermail/zodb-dev/2004-June/007459.html), I get
> the impression that the segfault and the symptoms described above might
> be related, perhaps the segfault being in an area where Tim's "required
> invariant for sane operation" is not being checked.

IIRC, the true cause of that was eventually determined in comment number
32(!) on collector issue 1350:

    http://www.zope.org/Collectors/Zope/1350

It was indeed pinned on multiple threads mucking with the same in-memory
persistent object simultaneously.

> Of course, the Python crash is what bothers us (as I said, it's a bank
> site using Quantax) - RunTime errors we can always try around... In that
> sense, any help would be enormously appreciated.

As above, it would help a lot to know what the BTree you started talking
about has to do with this, and to get a more realistic account of what you
tried to do (most places you showed code using the root object, the root
object can't actually be used in those ways).

It would also help if you clarified whether or not your actual app (as
opposed to your testing snippets) tries to use objects loaded from a single
Connection in more than one thread.  If it's true that your app does that,
you _may_ not get an answer here more satisfying than "don't do that".  But
so far, I really have no idea what your app _is_ doing.
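
For reference, the model we do develop and test against (no Connection
shared between threads) can be sketched roughly as below.  This is an
illustration under stated assumptions, not code from your app or from ZODB:
StubDB and StubConnection are hypothetical stand-ins so the snippet runs on
its own, and get_connection is a made-up helper name.  With a real database
object, the only call this relies on is db.open(), as in the session shown
earlier.

```python
import threading

class StubConnection:
    """Hypothetical stand-in for a connection; not the ZODB class."""
    def __init__(self, number):
        self.number = number

class StubDB:
    """Hypothetical stand-in for a database object with an open() method."""
    def __init__(self):
        self._lock = threading.Lock()
        self.opened = 0

    def open(self):
        with self._lock:
            self.opened += 1
            return StubConnection(self.opened)

# Each thread sees its own attributes on this object.
_local = threading.local()

def get_connection(db):
    """Return the calling thread's connection, opening one on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = _local.conn = db.open()
    return conn

seen = {}

def worker(db, name):
    first = get_connection(db)
    second = get_connection(db)   # same thread, so the same connection
    seen[name] = (first, second)

db = StubDB()
threads = [threading.Thread(target=worker, args=(db, "t%d" % i))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Objects loaded through one thread's connection are then never handed to
another thread, which is the situation the "don't do that" answer is about.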


