[ZODB-Dev] B-Tree Concurrency Issue (_OOBTree.pyd segfaults)

Gfeller Martin Martin.Gfeller at comit.ch
Fri Apr 15 09:07:35 EDT 2005


Dear all,

We're using Zope 2.7.3 with its default Python, ZEO, and ZODB versions under Windows 2000 Server SP3. The machine has two Xeon CPUs, but Python is bound to a single CPU.

One of our (non-data.fs) ZODBs consists of an OOBTree with about 50,000 well-ordered tuple keys and Persistence.Persistent object values.

In production we have repeatedly, though so far not reproducibly, hit a memory access fault at _OOBTree.pyd+x4f93:


eax=00000000 ebx=00000000 ecx=0bffb9c0 edx=00000000 esi=00000000 edi=1667dcb0
eip=01614f93 esp=099cd768 ebp=099cd78c iopl=0         nv up ei pl zr na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000246


function: <nosymbols>
        01614f78 8b55e4           mov     edx,[ebp+0xe4]         ss:0a44ad5e=????????
        01614f7b 8955e0           mov     [ebp+0xe0],edx         ss:0a44ad5e=????????
        01614f7e eb02             jmp     01623a82
        01614f80 eb02             jmp     01623a84
        01614f82 eba1             jmp     0161db25
        01614f84 8b45e4           mov     eax,[ebp+0xe4]         ss:0a44ad5e=????????
        01614f87 8945f0           mov     [ebp+0xf0],eax         ss:0a44ad5e=????????
        01614f8a 8b4d08           mov     ecx,[ebp+0x8]          ss:0a44ad5e=????????
        01614f8d 8b5134           mov     edx,[ecx+0x34]         ds:0ca78f92=????????
        01614f90 8b45f0           mov     eax,[ebp+0xf0]         ss:0a44ad5e=????????
FAULT ->01614f93 8b4cc204         mov     ecx,[edx+eax*8+0x4]    ds:00a7d5d3=????????
        01614f97 894dec           mov     [ebp+0xec],ecx         ss:0a44ad5e=????????
        01614f9a 33d2             xor     edx,edx
        01614f9c 837d1000         cmp   dword ptr [ebp+0x10],0x0 ss:0a44ad5e=????????
        01614fa0 0f95c2           setne   dl
        01614fa3 8b4510           mov     eax,[ebp+0x10]         ss:0a44ad5e=????????
        01614fa6 03c2             add     eax,edx
        01614fa8 894510           mov     [ebp+0x10],eax         ss:0a44ad5e=????????
        01614fab 8b4d08           mov     ecx,[ebp+0x8]          ss:0a44ad5e=????????
        01614fae 8b55ec           mov     edx,[ebp+0xec]         ss:0a44ad5e=????????
        01614fb1 8b4104           mov     eax,[ecx+0x4]          ds:0ca78f92=????????
        01614fb4 3b4204           cmp     eax,[edx+0x4]          ds:00a7d5d2=????????
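
For what it is worth, the address read by the faulting instruction can be worked out from the register dump. This is only a sketch, assuming the registers above were captured at the moment of the fault; the variable names are simply the register names. If that assumption holds, edx (loaded from [ecx+0x34] just before the fault) is zero, which would point to a NULL pointer rather than a wild one. I cannot reconcile this with the "ds:00a7d5d3" annotation printed by the debugger.

```python
# Register values copied from the dump above.
eax = 0x00000000
edx = 0x00000000

# FAULT -> mov ecx,[edx+eax*8+0x4]
# Effective address of the read, truncated to 32 bits.
effective_address = (edx + eax * 8 + 0x4) & 0xFFFFFFFF

print(hex(effective_address))
```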

To narrow this down (I don't speak C), I iterate over the root in one thread on my single-CPU machine:

   for x in conn.root().keys(): y=x.somedata

while at the same time repeatedly checking the tree in a different thread, using the same connection (Jim confirms in the mail cited below that this should be OK):

   conn.root()._check()

I repeatably get either a RuntimeError 'the bucket being iterated changed size' in the for loop, or a 'Bucket length < 1' assertion in _check(). After the loop finishes, the tree's _check() passes (the tree also passes all tests in BTrees.check.check()). The symptoms are the same whether I run under ZEO or directly with FileStorage.

I replaced conn.root().keys() with list(conn.root().keys()) and get the same behavior, i.e., either the RuntimeError or the transient assertion failure.
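
The two-thread experiment described above can be sketched as a small harness. This is a minimal illustration and not the exact script I ran: the helper name run_concurrently and its parameters are made up for this example, and a plain dict stands in for conn.root() so that the snippet runs without ZODB installed.

```python
import threading


def run_concurrently(reader, checker, rounds=100):
    """Call reader() once in one thread while checker() is called
    repeatedly (at most `rounds` times) in a second thread.

    Returns a list of (thread_name, exception) pairs, one per error raised.
    """
    errors = []
    errors_lock = threading.Lock()
    done = threading.Event()

    def record(name, exc):
        with errors_lock:
            errors.append((name, exc))

    def read():
        try:
            reader()
        except Exception as exc:
            record("reader", exc)
        finally:
            done.set()  # tell the checker thread to stop

    def check():
        for _ in range(rounds):
            if done.is_set():
                break
            try:
                checker()
            except Exception as exc:
                record("checker", exc)

    threads = [threading.Thread(target=read), threading.Thread(target=check)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


# Hypothetical stand-in for conn.root(): a dict with tuple keys.
tree = {(i, i + 1): object() for i in range(1000)}

errors = run_concurrently(
    reader=lambda: [key for key in tree.keys()],
    checker=lambda: len(tree),
)
print(errors)
```

Against the real tree, reader would be the for loop over conn.root().keys() and checker would be conn.root()._check; the errors collected would then be the RuntimeError and the assertion failure mentioned above.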

Reading the multi-threading ZODB discussion at http://mail.python.org/pipermail/python-list/2001-February/030675.html, I assume that the above behavior is incorrect, as there are no writes to any object, no commits, and no conflict errors.

Reading the discussion of the RuntimeError in [ZODB-Dev] Re: BTrees q [Fwd: [Zope-dev] More Transience weirdness in 2.7.1b1] (http://mail.zope.org/pipermail/zodb-dev/2004-June/007459.html), I get the impression that the segfault and the symptoms described above might be related, with the segfault perhaps occurring in an area where Tim's "required invariant for sane operation" is not being checked.

Of course, the Python crash is what bothers us (as I said, it's a bank site using Quantax); RuntimeErrors we can always work around...
In that sense, any help would be enormously appreciated.

Best regards,
Martin Gfeller

________________________


COMIT AG
Risk Management Systems
Pflanzschulstrasse 7 
CH-8004 Zürich 

Telefon	+41 44 298 92 84 

http://www.comit.ch 
http://www.quantax.com - Quantax Trading and Risk System

