[ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN
Pedro Ferreira
jose.pedro.ferreira at cern.ch
Tue May 26 06:00:51 EDT 2009
Jim Fulton wrote:
>
> On May 25, 2009, at 9:23 AM, Pedro Ferreira wrote:
>
>> Dear Andreas, Marius,
>>
>>> This means that you're using ZEO, right? Have you tried to use strace
>>> to see what it's doing? Is it using any CPU time?
>>>
>>>
>> Yes, we're using ZEO.
>> It's doing a lot of lseek() and read() calls, i.e.:
>>
>> read(6, "eq\7cdatetime\ndatetime\nq\10(U\n\7\326\t\r\f"..., 4096) = 4096
>> lseek(6, 3156590592, SEEK_SET) = 3156590592
>> lseek(6, 3156590592, SEEK_SET) = 3156590592
>> lseek(6, 3156590592, SEEK_SET) = 3156590592
>> lseek(6, 3156590592, SEEK_SET) = 3156590592
>> read(6, "\n_authorGenq9(U\10\0\0\0\0\3\v\375\367q:h\5tQU\t"..., 4096)
>> = 4096
>> lseek(6, 3156594688, SEEK_SET) = 3156594688
>> lseek(6, 3156594688, SEEK_SET) = 3156594688
>> lseek(6, 3156594688, SEEK_SET) = 3156594688
>> lseek(6, 3156594688, SEEK_SET) = 3156594688
>> lseek(6, 3156594688, SEEK_SET) = 3156594688
>> lseek(6, 3156594688, SEEK_SET) = 3156594688
>>
>> And the allocated memory grows up to ~200 MB, data.fs.index.index_tmp is
>> created, and then the process seems to die and restart (different PID).
>
> What is in the ZEO server log when this happens?
>
2009-05-25T09:14:47 (30942) opening storage '1' using FileStorage
2009-05-25T09:15:05 Ignoring index for /opt2/indico/db/data.fs
2009-05-25T09:31:15 (31017) opening storage '1' using FileStorage
2009-05-25T09:31:35 Ignoring index for /opt2/indico/db/data.fs
2009-05-25T09:47:38 (31056) opening storage '1' using FileStorage
2009-05-25T09:47:57 Ignoring index for /opt2/indico/db/data.fs
> ...
>
>> We noticed there was a problem when a pack failed (yesterday, around
>> 12:00 CET):
>>
>> Traceback (most recent call last):
>> File
>> "/opt/python24/lib/python2.4/site-packages/MaKaC/tools/packDB.py",
>> line 24, in ?
>> DBMgr.getInstance().pack(days=1)
>> File "/opt/python24/lib/python2.4/site-packages/MaKaC/common/db.py",
>> line 135, in pack
>> self._storage.pack(days=days)
>> File
>> "/opt/python24/lib/python2.4/site-packages/ZEO/ClientStorage.py",
>> line 865, in pack
>> return self._server.pack(t, wait)
>> File "/opt/python24/lib/python2.4/site-packages/ZEO/ServerStub.py",
>> line 161, in pack
>> self.rpc.call('pack', t, wait)
>> File
>> "/opt/python24/lib/python2.4/site-packages/ZEO/zrpc/connection.py",
>> line 536, in call
>> raise inst # error raised by server
>> RuntimeError: maximum recursion depth exceeded
>
> What was in the ZEO server log when this happened?
>
2009-05-24T12:22:54 (28965) new connection ('137.138.128.213', 45138):
<ManagedServerConnection ('137.138.128.213', 45138)>
2009-05-24T12:22:54 (28965) new connection ('137.138.128.213', 45139):
<ManagedServerConnection ('137.138.128.213', 45139)>
2009-05-24T12:22:54 (28965) new connection ('137.138.128.212', 59721):
<ManagedServerConnection ('137.138.128.212', 59721)>
2009-05-24T12:22:54 (28965) new connection ('137.138.128.212', 59722):
<ManagedServerConnection ('137.138.128.212', 59722)>
2009-05-24T12:22:54 (137.138.128.213:45138) received handshake 'Z303'
2009-05-24T12:22:54 (28965) Error raised in delayed method
Traceback (most recent call last):
File "/usr/lib64/python2.3/site-packages/ZEO/StorageServer.py", line
1009, in run
result = self._method(*self._args)
File "/usr/lib64/python2.3/site-packages/ZEO/StorageServer.py", line
345, in _pack_impl
self.storage.pack(time, referencesf)
File
"/usr/lib64/python2.3/site-packages/ZODB/FileStorage/FileStorage.py",
line 1372, in pack
self._save_index()
File
"/usr/lib64/python2.3/site-packages/ZODB/FileStorage/FileStorage.py",
line 252, in _save_index
p.dump(info)
RuntimeError: maximum recursion depth exceeded
2009-05-24T12:22:54 (28965) new connection ('137.138.128.212', 59723):
<ManagedServerConnection ('137.138.128.212', 59723)>
2009-05-24T12:22:54 (137.138.128.213:45139) received handshake 'Z303'
> Also, how many objects are in your database?
>
Hmmm... I have no idea... is there an easy way of calculating that?
>> We were packing a 15GB (which normally results in a 6-7 GB) database.
>> So, we'll try increasing the recursion depth limit (maybe the process is
>> dying because it reaches the limit?).
>>
>> Silly question: is there any way to access (read-only) the data without
>> generating the index?
>
> Yes, if you open the file-storage read-only (in your ZEO conf, use
> "read-only true" in the filestorage section), an index file won't be
> written. An in memory index will still be created. I suspect the
> problem is occurring when writing the index to disk.
>
Good, this may be useful, if a problem like this ever happens again
(fingers crossed).
Thanks, once again,
Pedro
--
José Pedro Ferreira
(Software Developer, Indico Project)
IT-UDS-AVC
CERN
Geneva, Switzerland
Office 513-R-042
Tel. +41 22 76 77159
More information about the ZODB-Dev
mailing list