[ZODB-Dev] URGENT: ZODB down - Important Software Application at CERN

Marius Gedminas marius at gedmin.as
Mon May 25 09:08:08 EDT 2009


On Mon, May 25, 2009 at 01:44:46PM +0200, Pedro Ferreira wrote:
> We've been using ZODB for the Indico Project (at CERN) since 2004 without
> any kind of problem. However, today our database went down and we can't
> find a way to recover it. This is a major issue, since we have ~4000
> users depending on this application, and we're simply not able to access
> the data in any way.

Ouch.

> Around 00:30 last night the database went down, and since then all
> connections have been refused.

This means that you're using ZEO, right?  Have you tried to use strace
to see what it's doing?  Is it using any CPU time?
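On Linux, one quick way to answer the CPU question is to sample the process's accumulated CPU time twice and compare. This is a minimal sketch under stated assumptions: it reads `/proc/<pid>/stat` (Linux only), and `cpu_seconds` and `is_spinning` are hypothetical helper names, not part of ZODB or ZEO. If the CPU time keeps growing, the process is busy (for example, still rebuilding the index); if it stays flat, the process is blocked, and `strace -p <pid>` should show which system call it is waiting in.

```python
import os
import time


def cpu_seconds(pid):
    """Return user+system CPU seconds consumed so far by `pid`.

    Linux-only assumption: parses /proc/<pid>/stat, where fields 14 and 15
    are utime and stime, measured in clock ticks.
    """
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so split after the last ')'. The remaining list starts at
    # field 3, which puts utime (field 14) at index 11 and stime at index 12.
    fields = data.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) / os.sysconf("SC_CLK_TCK")


def is_spinning(pid, interval=2.0):
    """Return True if `pid` consumed any CPU time during `interval` seconds."""
    before = cpu_seconds(pid)
    time.sleep(interval)
    return cpu_seconds(pid) > before
```

For instance, `is_spinning(12345)`, where 12345 stands in for the ZEO server's PID, returns `False` for a process that is stuck waiting rather than computing.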

> We tried to restart the database, but the
> script seems to hang while trying to create the index:
> 
> -rw-r--r--   1 root  root  6396734704 May 25 13:21 dataout.fs
> -rw-r--r--   1 root  root         173 May 25 12:21 dataout.fs.index
> -rw-r--r--   1 root  root   229755165 May 25 13:22 dataout.fs.index.index_tmp
> -rw-r--r--   1 root  root           7 May 25 12:21 dataout.fs.lock
> -rw-r--r--   1 root  root    70956364 May 25 13:21 dataout.fs.tmp
> 
> We tried running fsrecover, but it reports "0 bytes removed during
> recovery", and the result ends up being the same. We tried it on
> different machines, with no success. On one of them, after a while
> trying to create the index, a Python exception was thrown, saying
> "maximum recursion depth exceeded".

I'm not intimately familiar with the internals of ZODB.  If it's doing
object graph traversals recursively, and if your object graph is very
deep, you may mitigate this by calling, e.g.

  import sys
  sys.setrecursionlimit(2 * sys.getrecursionlimit())
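Raising the limit on its own can turn the "maximum recursion depth exceeded" error into a hard interpreter crash (a C stack overflow) if the thread running the code has a small stack. A common companion trick is to run the deep operation in a worker thread created with a larger stack via `threading.stack_size()`. This is a sketch for modern Python 3, not a ZODB API: `run_with_deeper_stack` is a hypothetical helper, the 64 MiB stack size is an arbitrary assumption, and what you pass in as `func` (for example, whatever code triggers the index rebuild) is up to you. Note that the recursion limit is interpreter-wide, so other threads also see the raised value while the worker runs.

```python
import sys
import threading


def run_with_deeper_stack(func, *args, recursion_limit=None,
                          stack_bytes=64 * 1024 * 1024):
    """Run func(*args) in a worker thread with a larger C stack and a
    temporarily raised recursion limit (hypothetical helper)."""
    if recursion_limit is None:
        recursion_limit = 2 * sys.getrecursionlimit()
    result = {}

    def worker():
        old_limit = sys.getrecursionlimit()
        sys.setrecursionlimit(recursion_limit)  # interpreter-wide setting
        try:
            result["value"] = func(*args)
        except BaseException as exc:  # re-raised in the calling thread below
            result["error"] = exc
        finally:
            sys.setrecursionlimit(old_limit)

    # stack_size() applies to threads created afterwards and returns the
    # previous setting (0 means "platform default"), so restore it right
    # after the worker thread has been started.
    old_stack = threading.stack_size(stack_bytes)
    try:
        thread = threading.Thread(target=worker)
        thread.start()
    finally:
        threading.stack_size(old_stack)
    thread.join()

    if "error" in result:
        raise result["error"]
    return result["value"]
```

The design keeps the deep recursion off the main thread, whose stack size is fixed by the OS when the process starts, and restores both settings afterwards so the rest of the program is unaffected.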

> We're using 3.4 in production, but we've tried with 3.9 and 3.9b0 as
> well, with no success.
> We're getting kind of desperate, since the same seems to happen with
> yesterday's backup, and trying to restore previous backups with repozo
> raises a CRC Error.
> Has anyone ever experienced this? Any clues on how to solve this problem?
> We'd really appreciate it if you could help us out, since this is becoming a
> big issue here at CERN (a lot of people's work depends on this).

Good luck!
Marius Gedminas
-- 
Linux: The Ultimate NT Service Pack