[ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN

Pedro Ferreira jose.pedro.ferreira at cern.ch
Tue May 26 06:00:51 EDT 2009


Jim Fulton wrote:
>
> On May 25, 2009, at 9:23 AM, Pedro Ferreira wrote:
>
>> Dear Andreas, Marius,
>>
>>> This means that you're using ZEO, right?  Have you tried to use strace
>>> to see what it's doing?  Is it using any CPU time?
>>>
>>>
>> Yes, we're using ZEO.
>> It's doing a lot of lseek() and read() calls, i.e.:
>>
>> read(6, "eq\7cdatetime\ndatetime\nq\10(U\n\7\326\t\r\f"..., 4096) = 4096
>> lseek(6, 3156590592, SEEK_SET)          = 3156590592
>> lseek(6, 3156590592, SEEK_SET)          = 3156590592
>> lseek(6, 3156590592, SEEK_SET)          = 3156590592
>> lseek(6, 3156590592, SEEK_SET)          = 3156590592
>> read(6, "\n_authorGenq9(U\10\0\0\0\0\3\v\375\367q:h\5tQU\t"..., 4096)
>> = 4096
>> lseek(6, 3156594688, SEEK_SET)          = 3156594688
>> lseek(6, 3156594688, SEEK_SET)          = 3156594688
>> lseek(6, 3156594688, SEEK_SET)          = 3156594688
>> lseek(6, 3156594688, SEEK_SET)          = 3156594688
>> lseek(6, 3156594688, SEEK_SET)          = 3156594688
>> lseek(6, 3156594688, SEEK_SET)          = 3156594688
>>
>> And the allocated memory grows up to ~200 MB, data.fs.index.index_tmp is
>> created, and then the process seems to die and restart (different PID).
>
> What is in the ZEO server log when this happens?
>
2009-05-25T09:14:47 (30942) opening storage '1' using FileStorage
2009-05-25T09:15:05 Ignoring index for /opt2/indico/db/data.fs
2009-05-25T09:31:15 (31017) opening storage '1' using FileStorage
2009-05-25T09:31:35 Ignoring index for /opt2/indico/db/data.fs
2009-05-25T09:47:38 (31056) opening storage '1' using FileStorage
2009-05-25T09:47:57 Ignoring index for /opt2/indico/db/data.fs

> ...
>
>> We noticed there was a problem when a pack failed (yesterday, around
>> 12:00 CET):
>>
>> Traceback (most recent call last):
>>  File
>> "/opt/python24/lib/python2.4/site-packages/MaKaC/tools/packDB.py",
>> line 24, in ?
>>    DBMgr.getInstance().pack(days=1)
>>  File "/opt/python24/lib/python2.4/site-packages/MaKaC/common/db.py",
>> line 135, in pack
>>    self._storage.pack(days=days)
>>  File
>> "/opt/python24/lib/python2.4/site-packages/ZEO/ClientStorage.py",
>> line 865, in pack
>>    return self._server.pack(t, wait)
>>  File "/opt/python24/lib/python2.4/site-packages/ZEO/ServerStub.py",
>> line 161, in pack
>>    self.rpc.call('pack', t, wait)
>>  File
>> "/opt/python24/lib/python2.4/site-packages/ZEO/zrpc/connection.py",
>> line 536, in call
>>    raise inst # error raised by server
>> RuntimeError: maximum recursion depth exceeded
>
> What was in the ZEO server log when this happened?
>
2009-05-24T12:22:54 (28965) new connection ('137.138.128.213', 45138):
<ManagedServerConnection ('137.138.128.213', 45138)>
2009-05-24T12:22:54 (28965) new connection ('137.138.128.213', 45139):
<ManagedServerConnection ('137.138.128.213', 45139)>
2009-05-24T12:22:54 (28965) new connection ('137.138.128.212', 59721):
<ManagedServerConnection ('137.138.128.212', 59721)>
2009-05-24T12:22:54 (28965) new connection ('137.138.128.212', 59722):
<ManagedServerConnection ('137.138.128.212', 59722)>
2009-05-24T12:22:54 (137.138.128.213:45138) received handshake 'Z303'
2009-05-24T12:22:54 (28965) Error raised in delayed method
Traceback (most recent call last):
  File "/usr/lib64/python2.3/site-packages/ZEO/StorageServer.py", line
1009, in run
    result = self._method(*self._args)
  File "/usr/lib64/python2.3/site-packages/ZEO/StorageServer.py", line
345, in _pack_impl
    self.storage.pack(time, referencesf)
  File
"/usr/lib64/python2.3/site-packages/ZODB/FileStorage/FileStorage.py",
line 1372, in pack
    self._save_index()
  File
"/usr/lib64/python2.3/site-packages/ZODB/FileStorage/FileStorage.py",
line 252, in _save_index
    p.dump(info)
RuntimeError: maximum recursion depth exceeded
2009-05-24T12:22:54 (28965) new connection ('137.138.128.212', 59723):
<ManagedServerConnection ('137.138.128.212', 59723)>
2009-05-24T12:22:54 (137.138.128.213:45139) received handshake 'Z303'

> Also, how many objects are in your database?
>
Hmmm... I have no idea... is there an easy way of calculating that?
>> We were packing a 15GB (which normally results in a 6-7 GB) database.
>> So, we'll try increasing the recursion depth limit (maybe the process is
>> dying because it reaches the limit?).
>>
>> Silly question: is there any way to access (read-only) the data without
>> generating the index?
>
> Yes, if you open the file-storage read-only (in your ZEO conf, use
> "read-only true" in the filestorage section), an index file won't be
> written.  An in memory index will still be created.  I suspect the
> problem is occurring when writing the index to disk.
>
Good, this may be useful, if a problem like this ever happens again
(fingers crossed).

Thanks, once again,

Pedro

-- 

José Pedro Ferreira
(Software Developer, Indico Project)

IT-UDS-AVC
CERN
Geneva, Switzerland

Office 513-R-042
Tel. +41 22 76 77159







More information about the ZODB-Dev mailing list