[ZODB-Dev] FileStorage size/density survey

Greg Ward gward@mems-exchange.org
Tue, 17 Dec 2002 10:46:36 -0500



Hi all -- I'm toying with the idea of implementing a new index type for
FileStorage as an alternative to both the in-memory Python dict
currently used and the on-disk BTree planned for ZODB 3.2.  Right now
I'm drawing pictures and estimating memory use -- the nice thing about
implementing data structures in C is that you can make reasonable
guesses about memory use.  ;-)

AFAICT, the only variables that affect the size of the index are the
number of "slots" in a FileStorage (i.e. highest OID used + 1) and the
actual number of objects.  (Actually, the number of "slots" only matters
for a dead-simple direct-lookup array, which I'm only considering as a
hypothetical fast, simple, and memory-inefficient approach.)
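
To make the slots-versus-objects trade-off concrete, here is a rough
sketch of the arithmetic for the direct-lookup array.  It assumes each
slot holds one 8-byte file offset and that OIDs are 8-byte big-endian
strings; the entry size and all function names are illustrative
assumptions, not part of FileStorage.

```python
import struct

# Assumption: one 8-byte file offset per slot; empty slots cost the same.
ENTRY_BYTES = 8

def slots_from_max_oid(max_oid):
    """Number of slots implied by the highest OID (an 8-byte string)."""
    # Stand-in for ZODB.utils.u64: unpack a big-endian unsigned 64-bit int.
    return struct.unpack(">Q", max_oid)[0] + 1

def direct_array_bytes(num_slots, entry_bytes=ENTRY_BYTES):
    """Bytes needed for an array with one entry per OID slot."""
    return num_slots * entry_bytes

def density(num_objects, num_slots):
    """Fraction of slots actually occupied by an object."""
    return float(num_objects) / num_slots

def wasted_bytes(num_objects, num_slots, entry_bytes=ENTRY_BYTES):
    """Bytes spent on slots that hold no object."""
    return (num_slots - num_objects) * entry_bytes
```

Under these assumptions the array costs the same whether a slot is used
or not, so a low density translates directly into wasted memory.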

I'd like to find out what these numbers are for a variety of real world
ZODB databases.  So could you please run the attached script on any ZODB
.fs files you happen to have lying around and mail me the results?  I'll
summarize to the list if anyone else is interested.

Thanks --

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange                            http://www.mems-exchange.org

Attachment: zodb_density.py

#!/usr/bin/env python

# Open a FileStorage and compute its density: the number of OIDs in use
# divided by the number of OID "slots" (i.e. max OID + 1).

import os
import sys
from ZODB.FileStorage import FileStorage
from ZODB.utils import u64

def write(msg):
    # Write without a trailing newline and flush so progress shows at once.
    sys.stdout.write(msg)
    sys.stdout.flush()

args = sys.argv[1:]
if len(args) < 1:
    sys.exit("usage: %s fsname [...]" % os.path.basename(sys.argv[0]))

for fsname in args:
    write(fsname + ": ")
    fs = FileStorage(fsname)

    num_objects = len(fs)
    write("%d objects" % num_objects)

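    if num_objects == 0:
        # max() of an empty sequence raises ValueError; density is undefined.
        write(", no OIDs in use\n")
    else:
        # OIDs are 8-byte strings; u64 converts the largest to an integer.
        num_slots = u64(max(fs._index.keys())) + 1
        write(", %d slots, density=%g\n"
              % (num_slots, float(num_objects)/num_slots))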

    fs.close()
