[Zope3-dev] Pickle statistics

Guido van Rossum guido@python.org
Tue, 04 Feb 2003 16:25:07 -0500


Zope pickles contain lots of references to class names (which include
module names and are hence relatively long strings).  Since each full
class name is present once in each pickle that contains one or more
instances of that class, this costs a lot of space.

For Python 2.3, I'm developing a new (incompatible) pickling protocol
that will support compressing popular class names to 2 or 3 bytes, by
assigning a short integer to each class name.  Read PEP 307 for more
information (look for the section on the extension registry towards
the end of the PEP).  http://www.python.org/peps/pep-0307.html
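
A minimal sketch of the effect, written for a current Python 3 rather
than the 2.3 discussed here (copyreg is the Python 3 name of copy_reg).
The class collections.OrderedDict and the code 240 are stand-ins chosen
only for illustration; they are not codes assigned by the PEP. The
registration is removed again afterwards because the registry is global
to the process.

```python
import copyreg
import pickle
import pickletools
from collections import OrderedDict

# Illustrative values only: any importable class and any unused code would do.
MODULE, NAME, CODE = "collections", "OrderedDict", 240

def opcode_names(data):
    """Return the names of the opcodes in one pickle."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]

obj = OrderedDict(a=1)

# Without a registered code, protocol 2 spells out module and class name.
plain = pickle.dumps(obj, protocol=2, fix_imports=False)

copyreg.add_extension(MODULE, NAME, CODE)
try:
    # With a registered code, the class reference becomes a short EXT opcode.
    short = pickle.dumps(obj, protocol=2, fix_imports=False)
    # Unpickling needs the same registration on the reading side.
    restored = pickle.loads(short)
finally:
    copyreg.remove_extension(MODULE, NAME, CODE)

print("plain: %d bytes, with extension code: %d bytes" % (len(plain), len(short)))
```

The pickle written with the extension code can only be read where the
same code is registered, which is why the codes have to be agreed on in
advance.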

In order to tune this mechanism, I need statistics about which class
names occur most frequently in a typical Zope 3 storage. I've written
a small program that opens a filestorage, iterates over all the data
records, and collects the class names referenced by the pickles.  It
requires a current CVS Python 2.3.
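
The counting step can be tried without a Zope storage. The sketch below
assumes a current Python 3 and uses a few standard-library objects as
stand-in data; count_globals is a hypothetical helper name, not part of
the program attached to this message. Protocol 2 is requested explicitly
because newer protocols encode class references with a different opcode.

```python
import pickle
import pickletools
from collections import Counter, OrderedDict
from fractions import Fraction

def count_globals(pickles):
    """Count GLOBAL opcodes (class references) across several pickles."""
    counts = Counter()
    for data in pickles:
        for opcode, arg, pos in pickletools.genops(data):
            if opcode.name == "GLOBAL":
                # arg is "module name", separated by a single space
                counts[arg] += 1
    return counts

# Stand-in data: three independent pickles, as if read from three records.
samples = [
    pickle.dumps(OrderedDict(a=1), protocol=2, fix_imports=False),
    pickle.dumps([OrderedDict(), OrderedDict()], protocol=2, fix_imports=False),
    pickle.dumps(Fraction(1, 3), protocol=2, fix_imports=False),
]

counts = count_globals(samples)
for name, n in counts.most_common():
    print("%6d %s" % (n, name))
print("%6d total, %d unique" % (sum(counts.values()), len(counts)))
```

Note that a class is counted once per pickle that references it, however
many instances that pickle holds, because later references inside the
same pickle go through the memo.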

Here's sample output for a Zope 3 storage after some very limited
interaction (I created two empty objects):

    18 zodb.btrees.OOBTree OOBTree
    13 persistence.dict PersistentDict
     5 zope.app.services.service ServiceConfiguration
     5 zope.app.services.configuration ConfigurationRegistry
     5 zodb.btrees.IOBTree IOBTree
     4 zope.app.services.type PersistentTypeRegistry
     3 zope.app.services.service ServiceManager
     3 zope.app.content.zpt ZPTPage
     3 zope.app.content.folder RootFolder
     2 zope.app.services.package Packages
     2 zope.app.services.package Package
     2 zope.app.services.hub ObjectHub
     2 zope.app.services.event EventService
     2 zope.app.services.errorr ErrorReportingService
     2 zope.app.services.configurationmanager ConfigurationManager
     2 zope.app.content.file FileChunk
     2 zope.app.content.file File
     2 zodb.btrees.OIBTree OIBTree
    77 total, 18 unique

From this I conclude that zodb.btrees.OOBTree.OOBTree and
persistence.dict.PersistentDict are good candidates for an extension
code.
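
A rough estimate of what extension codes for these two classes would
save on the sample storage. It assumes each counted occurrence is a
GLOBAL opcode (one opcode byte, then the module and the class name, each
terminated by a newline) and that it would be replaced by the two-byte
form used for codes up to 255. Memo opcodes are ignored.

```python
# (count, module, class name) copied from the sample output
candidates = [
    (18, "zodb.btrees.OOBTree", "OOBTree"),
    (13, "persistence.dict", "PersistentDict"),
]

EXT1_SIZE = 2  # opcode byte + one-byte code (assumes codes <= 255)

def global_size(module, name):
    """Bytes used by a GLOBAL opcode: 'c' + module + '\\n' + name + '\\n'."""
    return 1 + len(module) + 1 + len(name) + 1

savings = {}
for count, module, name in candidates:
    per_occurrence = global_size(module, name) - EXT1_SIZE
    savings[module + "." + name] = count * per_occurrence
    print("%s.%s: %d bytes each, %d bytes over %d occurrences"
          % (module, name, per_occurrence, count * per_occurrence, count))
```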

I don't have a big Zope 3 application, but I know some people here do.
If you do, would you be so kind as to run this program over your
application and mail me the output?  That may help me decide which
extension codes to assign for Zope 3.  (The PEP explains why extension
codes need to be globally defined or at least standardized for one
application.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

import os
import sys
import time
import pickle
import pickletools
from StringIO import StringIO

if __name__ == "__main__":
    # Tweak sys.path
    here = os.path.abspath(os.path.dirname(sys.argv[0]))
    srcdir = os.path.abspath("src")
    sys.path = [srcdir, here] + filter(None, sys.path)

from zodb.storage.file import FileStorage
from zodb.utils import u64
from zodb.timestamp import TimeStamp

def main():
    if sys.argv[1:]:
        fn = sys.argv[1]
    else:
        fn = "Data.fs"
    fs = FileStorage(fn, read_only=True)
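    it = fs.iterator()
    names = {}  # maps "module classname" -> number of occurrences
    # Outer loop: one iteration per transaction in the storage.
    while True:
        try:
            txnrec = it.next()
        except IndexError:
            # the iterator signals the end of the storage with IndexError
            break
        # Inner loop: one iteration per data record in this transaction.
        while True:
            try:
                r = txnrec.next()
            except IndexError:
                break
            assert r.serial == txnrec.tid
            data = r.data
            dump(data, names)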
    vk = [(v, k) for k, v in names.iteritems()]
    vk.sort()
    vk.reverse()
    count = 0
    for v, k in vk:
        print "%6d %s" % (v, k)
        count += v
    print "%6d total, %d unique" % (count, len(vk))

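def dump(data, names):
    # Count the class names referenced by one data record.
    f = StringIO(data)
    # A data record holds two pickles back to back; genops() stops at
    # the end of the first one, so run it twice on the same file object.
    for i in 1, 2:
        for opcode, arg, pos in pickletools.genops(f):
            if opcode.code == pickle.GLOBAL:
                # arg is the string "module classname"
                names[arg] = names.get(arg, 0) + 1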

if __name__ == "__main__":
    main()