[ZODB-Dev] Newbie ZODB Questions

Mon, 17 Sep 2001 10:34:54 -0700

Jennifer Flake wrote:
> 
> Hi There,
> 
> I have never touched Zope (so I apologize for my newbie questions), but
> my company is planning on using it in our next development project.  I
> would like to get information on the data storage and its maintenance
> requirements.  I hope that someone on the list can help me get answers
> to some questions about the care and feeding of ZODB.  I've been an
> Oracle DBA for the last few years, so most of my questions are concerns
> about storage, failure modes, loss of data, scalability, etc.

I'm an Oracle DBA as well, and have spent some time getting used to
the Zope mindset and the way the ZODB operates. There are several
features of the FileStorage that make it operate in a way that is
quite similar to an Oracle database in ARCHIVELOG mode.

> (1) In order to do a backup of the entire database, aside from Data.fs,
> what do I have to backup?

The ZODB is pretty much self-contained. Unlike Oracle, there are no
controlfiles to worry about. Your sysadmin is probably going to want
to preserve your startup and configuration files (sorta like pfiles)
but that's really just standard configuration management.

One qualification: see Steve Alexander's message about rebuilding a
corrupted ZODB from the index file:

	http://lists.zope.org/pipermail/zodb-dev/2001-September/001471.html

Given how much smaller the .index file is than the full Data.fs file,
it seems like cheap insurance to save the index file as well. I haven't
needed such a failsafe yet, but it strikes me as a Really Good Idea.
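
As a rough illustration of backing up both files together, here is a
minimal Python sketch. The function name and the backup directory are
my own inventions, not anything shipped with Zope, and it assumes
nothing is writing to the storage while the copy runs.

```python
import shutil
from pathlib import Path


def backup_filestorage(data_fs, backup_dir):
    """Copy Data.fs and, if present, its .index file into backup_dir.

    Hypothetical helper: assumes the storage is quiescent during the copy.
    Returns the list of files written.
    """
    data_fs = Path(data_fs)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # The index sits next to the data file as "<name>.index".
    for src in (data_fs, data_fs.with_name(data_fs.name + ".index")):
        if src.exists():
            dest = backup_dir / src.name
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(dest)
    return copied
```

If the .index file is missing, the sketch simply copies Data.fs alone.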

> (3) I did some reading and saw that the Data.fs is a file where
> transactions are appended.  Where does the rest of the object data exist
> and is that data in a platform independent format?  If not, is there a
> way to get the data out in platform-independent way?

Hmm. Probably useful to step back from the Oracle Way for a moment. In
Oracle, when a transaction updates a row somewhere, the changes to the
specific blocks are written to the log as redo information. The RDBMS
then knows how to replay the redo info to reconstruct the changed row
during recovery.

In contrast, in Zope when an object is changed within a transaction
and the transaction is committed, the entire object is appended to the
Data.fs file. I believe a "rollback" (transaction.abort) doesn't
actually touch the Data.fs file, because the transaction was not
committed, hence no objects were written. (ZODB-Gurus please help
out).

The object data itself is stored in "pickles", which are
platform-independent as far as I know.
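
To make the append-on-commit idea concrete, here is a toy model in
plain Python. It is emphatically not the real FileStorage record
format; the class, the length-prefixed layout, and the names are all
made up for illustration. It only shows whole pickled objects being
appended on commit, nothing being written on abort, and the newest
record winning when the log is replayed.

```python
import io
import pickle
import struct


class ToyAppendOnlyStore:
    """Toy model only; this is NOT the real FileStorage format."""

    def __init__(self, log):
        self.log = log      # binary file-like object positioned at its end
        self.pending = {}   # changes made in the current transaction

    def store(self, oid, obj):
        self.pending[oid] = obj  # buffered in memory, nothing written yet

    def abort(self):
        self.pending.clear()     # discard changes; the log is untouched

    def commit(self):
        for oid, obj in self.pending.items():
            data = pickle.dumps((oid, obj))
            # Length-prefixed record holding the whole object state.
            self.log.write(struct.pack(">I", len(data)) + data)
        self.pending.clear()


def latest_states(log_bytes):
    """Replay the log; later records supersede earlier ones."""
    states, pos = {}, 0
    while pos < len(log_bytes):
        (size,) = struct.unpack_from(">I", log_bytes, pos)
        oid, obj = pickle.loads(log_bytes[pos + 4:pos + 4 + size])
        states[oid] = obj
        pos += 4 + size
    return states


log = io.BytesIO()
store = ToyAppendOnlyStore(log)
store.store("doc", {"title": "v1"})
store.abort()                      # log is still empty here
store.store("doc", {"title": "v1"})
store.commit()                     # first full copy appended
store.store("doc", {"title": "v2"})
store.commit()                     # second full copy appended after it
current = latest_states(log.getvalue())
```

The old "v1" record stays in the log after the second commit, which is
the property that makes the ARCHIVELOG comparison feel apt.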

> (4) Is there any way to perform adhoc queries on ZODB?

Not really in the sense you would with Oracle. However, once you get
the hang of referencing objects in the ZODB from the Python command
line, you can do the same sorts of things. So far I've found it easier
to use the search interface or DTML to display objects than to use
Python. There's an IDE I've been trying out that may change me over to
a more run-time style, but so far it's all DTML, all the time.
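
For a flavor of what an "ad hoc query" from Python might look like,
here is a small sketch that walks a tree of containers and filters
with an arbitrary predicate. It uses plain dicts and a made-up Doc
class as stand-ins; with a real database the starting point would be
the root mapping of an open connection, and Zope folders expose
objectItems() rather than items(), so treat the traversal as an
assumption to adapt.

```python
class Doc:
    """Hypothetical stand-in for a persistent content object."""

    def __init__(self, title, author):
        self.title = title
        self.author = author


def find_objects(container, predicate, path=""):
    """Yield (path, obj) for every matching object in a tree of mappings."""
    for key, obj in container.items():
        here = f"{path}/{key}"
        if predicate(obj):
            yield here, obj
        # Descend into anything that behaves like a folder (has .items()).
        if hasattr(obj, "items"):
            yield from find_objects(obj, predicate, here)


# Stand-in object tree; the names and authors are invented.
root = {
    "site": {
        "a": Doc("Intro", "kl"),
        "sub": {"b": Doc("Backup notes", "jf")},
    },
    "c": Doc("Scratch", "kl"),
}

# Roughly: SELECT path FROM objects WHERE author = 'kl'
hits = [p for p, _ in find_objects(
    root, lambda o: getattr(o, "author", None) == "kl")]
```

This is a full scan of every object, so it is closer to a table scan
than to an indexed lookup.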

> (5) Is ZODB scalable? Would it be possible to store 10 gb of data
> without suffering performance issues?

There has been some discussion about the amount of memory needed to
index the data in Data.fs, so it might depend more on how many
objects the 10 GB actually represents than on how big they are. If I
were trying to manage a large group of complex blobs (medical imaging
data, for example) I would probably use one of the ExternalFile
products to store the blobs in the filesystem, and manage the metadata
within the ZODB. If I needed to manage 10 GB worth of 10 KB objects, I
would probably think carefully about partitioning the problem so it
could be supported by parallel servers rather than one monolithic
instance.
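
The difference between those two cases is easy to see with some quick
arithmetic. This sketch only counts objects; the 10 MB blob size is an
assumed figure for illustration, and I make no claim about how many
bytes of memory each index entry actually costs.

```python
GB = 1024 ** 3
MB = 1024 ** 2
KB = 1024


def object_count(total_bytes, avg_object_bytes):
    """Number of objects (and so index entries) for a given average size."""
    return total_bytes // avg_object_bytes


small = object_count(10 * GB, 10 * KB)  # 1048576 objects to index
large = object_count(10 * GB, 10 * MB)  # 1024 objects to index
```

Same 10 GB on disk, but roughly a thousand times more entries to keep
track of in the first case.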

> (6) Is RAID the only way to implement redundancy for ZODB?

If by "redundancy" you mean for the data only, then I would agree that
currently the options available are RAID-like, including clustered
storage like a NetApp. One thing to remember is that the other
redundancy available in Zope is the distribution and load-sharing
model of ZEO clients. Since each ZEO client maintains its own object
cache, losing your data server does not mean your service must be off
the air.

> (7) Is there a way to replicate data from one ZODB to another?

Not like Oracle Replication. But at the same time, the FileStorage has
simple enough behavior that it doesn't really need something as
complicated as asyncrep. We're experimenting with something more like
a Standby Database model, with a "live" Zope Storage Server and a
standby one kept up-to-date using rsync.
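
A minimal sketch of the rsync step, driven from Python, might look like
the following. The paths and the "standby" host name are placeholders,
and this is not our actual script. It also glosses over the fact that
a copy taken in the middle of a write may end in an incomplete
transaction.

```python
import subprocess


def rsync_command(data_fs, standby):
    """Build the rsync invocation: -a archive mode, -z compress in transit."""
    return ["rsync", "-az", data_fs, standby]


def sync_to_standby(data_fs, standby):
    """Run rsync once; raises CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(data_fs, standby), check=True)


# Hypothetical source path and destination, for illustration only.
cmd = rsync_command("/var/zope/var/Data.fs", "standby:/var/zope/var/")
```

Run from cron or a loop, sync_to_standby() would keep the standby copy
reasonably current without any Oracle-style replication machinery.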

> (8) Where do the application usernames/passwords physically exist in
> Zope?

They are stored as attributes of objects in the ZODB. I recommend the
EncryptedUserFolders patch. We've been using it in concert with the
CMF for a while now, with no noticeable problems.

> (11) If I am "reading" an object, do I continue to get a read-consistent
> view of it, even if someone else performs a write to it, while I am
> reading it?

There has been some discussion about reporting ReadConflictErrors, and
there are proposals for transaction isolation levels for the ZODB.
However, I am not sure how much of either is currently available.

> (12) If my server slams down from a power outage, will Zope be able to
> recover itself gracefully when I restart it?

For the most part, I believe this to be true. In addition, I believe
there are some utilities available to verify and modify a Data.fs
file, allowing for a couple of different recovery strategies.

k1