[ZODB-Dev] Horizon for highly-available ZODB storage?

Thu, 03 Jan 2002 12:12:43 -0800

Happy New Year, all!

I wanted to ask this list for input on what options are likely to be
available, within the next 6-7 months, for maintaining a highly available
ZEO storage server (ZSS).  My company is just getting started on a very
large, highly demanding project that will make heavy use of multiple ZODB
instances on a ZEO cluster with a 2-box ZSS strategy.  We already do this
for some less demanding projects, where we achieve high availability across
2 concurrently running ZSS nodes using the Linux-HA project's Heartbeat
clustering software.  We replicate our FileStorage Data.fs from ZSS #1 to
ZSS #2 with a simple daily file transfer (FTP) followed by a restart of the
ZSS process.  This is acceptable because that data is usually updated just
once a day.  Our upcoming projects, however, will involve bigger ODBs that
are updated far more often, so I would like input on which options are
available now and which new strategies are likely to become available later
this year.

Specifically, I am wondering about three options with regard to a 2-box
cluster that has a primary ZSS and a hot backup:

1 - Toby Dickenson's Replicated FileStorage (Available Now)
	http://www.zope.org/Members/htrd/ReplicatedFileStorage
2 - Standby Storage (Project Status?)
	http://www.zope.org/Wikis/ZODB/StandbyStorage
3 - DirectoryTreeStorage (Proposal) + InterMezzo FS (I'm dreaming, aren't I?)
	http://dev.zope.org/Wikis/DevSite/Proposals/DirectoryTreeStorage
	http://www.inter-mezzo.org/
	My hunch is that DirectoryTreeStorage could
	be designed with InterMezzo in mind for decent,
	simple 1-way replication... in theory, of course.

I particularly like the IDEA of the 3rd (and most vaporous) option, and I
have a feeling it could work, provided your clustering software restarts
the ZSS process.  Suppose a machine fault caused InterMezzo to replicate
only some files, leaving a few pickle files in a DirectoryTreeStorage
inconsistent.  The storage should still handle this on restart, at least if
I understand the implications of ChrisM's proposal correctly: "If it finds
evidence of a failed transaction, it will revert any files it needs to
within the directory to their pre-transaction state by using the data in
the log."

Of course, I'm grounded in the reality that I will eventually need to deploy
a solution in a production environment.  Still, I'm interested in hearing
some thoughts, and perhaps in sparking some discussion on this issue, since I
imagine I am not alone in needing a solution.

Thanks,
Sean

=========================
Sean Upton
Site Technology Supervisor
Development & Integration
SignOnSanDiego.com
The San Diego Union-Tribune
619.718.5241
sean.upton@uniontrib.com
=========================