[ZODB-Dev] Hanging ZEO-client hangs all other ZEO-clients?

Tim Peters tim at zope.com
Thu Apr 14 15:23:28 EDT 2005


[Chris Withers]
>> Out of interest, why are you using DirectoryStorage?

[Dario Lopez-Kästen]
> I chose it for several reasons:

I don't want to talk you out of it, but since this is a general list I feel
compelled <wink> to respond to these points wrt current FileStorage.  You're
using a by-now very old Zope (2.6.2), and may not be aware of the info at:

    http://zope.org/Wikis/ZODB/FileStorageBackup

> 1) we are storing large amounts of binary files (PDF, Word, Matlab, Zip,
> tar-balls, etc) in this particular application (it's a student portal,
> course admin portal and an LMS). While we are not yet in the
> multigigabyte realm, we are storing archive copies of all the previous
> year's materials, which will eventually grow to be a lot of stuff.

If I understand correctly, DirectoryStorage and FileStorage both store this
stuff in giant pickles -- and then there's no cause for "large" total size
difference I'm aware of.  The storage comparison matrix at

    http://cvs.zope.org/ZODB3/Doc/storages.html?rev=1

says DirectoryStorage requires "Roughly 30% more [disk] space than Data.fs",
not less disk space.  Indeed, it's hard to imagine any non-compressing
scheme that could require less total disk space than FileStorage.

> 2) There is the issue of huge Data.fs files and making daily backups. We
> need to have incremental backups

See the link above:  repozo.py supports incremental Data.fs backup, taking
(using -Q) time roughly proportional to the increase in Data.fs size since
the most recent backup.  It goes fast!
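
To make "time roughly proportional to the increase in size" concrete, here is
a toy sketch of an append-only incremental backup.  It is not repozo's code:
the function name, the ".delta" file naming, and the assumption that the
source file only grows by appending between backups (i.e. no pack in between)
are all made up for illustration.

```python
import os


def incremental_backup(src_path, backup_dir):
    """Copy only the bytes appended to src_path since the last backup.

    Hypothetical helper, NOT part of repozo.  Assumes the source file
    has only grown by appending since the previous call.
    Returns the number of bytes copied.
    """
    os.makedirs(backup_dir, exist_ok=True)
    # Bytes already saved = total size of the existing increment files.
    saved = sum(
        os.path.getsize(os.path.join(backup_dir, name))
        for name in os.listdir(backup_dir)
    )
    remaining = os.path.getsize(src_path) - saved
    if remaining <= 0:
        return 0  # nothing new; a real tool would also detect a pack here

    # Name each increment by its starting offset so they sort in order.
    out_path = os.path.join(backup_dir, "%012d.delta" % saved)
    copied = 0
    with open(src_path, "rb") as src, open(out_path, "wb") as out:
        src.seek(saved)
        while remaining > 0:
            chunk = src.read(min(1 << 20, remaining))
            if not chunk:
                break
            out.write(chunk)
            copied += len(chunk)
            remaining -= len(chunk)
    return copied
```

Concatenating the increment files in name order reproduces the source file,
and each run touches only the new tail rather than the whole file.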

> 3) HA - while DirStor is not a HA-tool per se, it provides the necessary
> tools for building something that provides some aspects of HA, ie. the
> replication features, etc.

Unsure what "HA" means to you.  "High availability", perhaps?  ZRS is
available for FileStorage, but it's admittedly not free:

    http://www.zope.com/Products/ZRS.html
 
> 4) Maintenance. While I have not yet dared to pack the DB, the mere size
> of the database will make packing a non-trivial operation memorywise in
> FileStorage. DirStor does not have the same memory requirements when
> packing.

The size of the objects in the database has little to do with memory
consumed by a FileStorage pack; it's more the number of distinct object
revisions at work, since an in-memory object reachability graph is
constructed.  I'm not sure how DirectoryStorage could perform packing
without constructing a similar reachability graph (Toby?).
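
As a rough illustration of why memory tracks the number of objects rather
than their size, here is a minimal mark-phase sketch.  It is not the
FileStorage pack code; the `refs` mapping (oid -> referenced oids) and the
function name are assumptions for the example.

```python
def reachable_oids(refs, root=0):
    """Return the set of oids reachable from root.

    refs is a hypothetical in-memory mapping: oid -> list of oids that
    the object refers to.  Only oids are held in memory, never the
    object data, so the cost depends on how many objects and references
    exist, not on how big each pickle is.
    """
    seen = set()
    todo = [root]
    while todo:
        oid = todo.pop()
        if oid in seen:
            continue
        seen.add(oid)
        todo.extend(refs.get(oid, ()))
    return seen


# Object 3 refers to the root, but nothing refers to 3, so it is garbage.
refs = {0: [1, 2], 1: [2], 2: [], 3: [0]}
garbage = set(refs) - reachable_oids(refs)
```

A 1KB object and a 100MB object each cost one entry in `seen` here.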

The last time Jeremy and I watched a pack work on a 20GB Data.fs, on a very
slow Solaris box, we noticed that it was only taking 10-20% of the RAM, and
regretted the then-last round of packing changes, which favored reducing RAM
usage at the cost of increasing runtime.  That appears to have been a wrong
tradeoff for most modern boxes.

Then again, data storages are growing ever bigger too.  It's very nice that
DirectoryStorage's direct RAM consumption is independent of the number of
objects.

> 5) POSKeyErrors. We were getting quite a few of those, and that scared
> me. With DirStor, I do not see them as much as before.

Do you see _any_?

FWIW, several nasty causes (bugs in ZODB and Zope) for POSKeyErrors have
been fixed since Zope 2.6.2, and reports of POSKeyErrors from current
Zope/ZODB installations are conspicuous by absence.

Toby, I know (or think I know <wink>) that DirectoryStorage won't commit a
transaction containing dangling references.  I think that's great, and I'd
like (if possible) to introduce such a check at a higher level, so that all
storages would benefit.  Does DirectoryStorage do something beyond that
check specifically aimed at preventing POSKeyErrors?  
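
For readers unfamiliar with the idea, a commit-time dangling-reference check
could look roughly like the sketch below.  This is not DirectoryStorage's
implementation, and nothing like it is claimed to exist in ZODB as written;
the function name and the plain-dict representation of a transaction are
assumptions for illustration.

```python
def find_dangling(stored_oids, txn_records):
    """Report references in a pending transaction that point nowhere.

    stored_oids: oids already committed to the storage (hypothetical).
    txn_records: mapping of oid -> iterable of oids referenced by the
                 new record for that oid in this transaction.
    Returns a dict of oid -> sorted list of missing referenced oids;
    an empty dict means the transaction is safe to commit.
    """
    # A reference is fine if its target already exists or is being
    # written in this same transaction.
    known = set(stored_oids) | set(txn_records)
    dangling = {}
    for oid, refs in txn_records.items():
        missing = sorted(r for r in refs if r not in known)
        if missing:
            dangling[oid] = missing
    return dangling
```

A storage doing this would refuse the commit whenever the result is
non-empty, so a later load could not hit a missing oid from that cause.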

...


