[ZODB-Dev] What makes the ZODB slow?

Tim Peters tim.peters at gmail.com
Mon Jun 26 17:28:51 EDT 2006


[Dieter Maurer]
>>> The newest pickle formats can also handle the class references
>>> a bit more efficiently -- at least when a single transaction
>>> modifies many objects of the same class.

[Chris Withers]
>> I know ZC was involved in the work to introduce these new pickle
>> formats, but are they actually used in ZODB yet?

[Dieter]
> I think the optimization I referred to (if you have several instances
> of the same class in a pickle, then they all share a single
> instance of the class name and reference this one) is already
> used.

I'm not sure what you have in mind there.  It's always been true that
pickle was _able_ to reuse common bits, but this is effectively
disabled in ZODB on a cross-persistent-object basis by (from a recent
serialize.py):

    def _dump(self, classmeta, state):
        # To reuse the existing cStringIO object, we must reset
        # the file position to 0 and truncate the file after the
        # new pickle is written.
        self._file.seek(0)
        self._p.clear_memo()
        self._p.dump(classmeta)
        self._p.dump(state)
        self._file.truncate()
        return self._file.getvalue()

The "self._p.clear_memo()" there makes the pickler forget everything
it's done, so that the pickle for a persistent object is
self-contained.
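
Here's a minimal sketch of that effect, not ZODB's actual code.  It
assumes Python 3 (io.BytesIO in place of cStringIO), uses
collections.OrderedDict as a stand-in for a persistent class, and the
helper names dump_records and load_record are made up for
illustration:

```python
import collections
import io
import pickle


def dump_records(states, clear_memo):
    """Pickle (class, state) pairs into one reused buffer, one record each."""
    buf = io.BytesIO()
    # fix_imports=False keeps Python 3 module names in the protocol 2 output.
    pickler = pickle.Pickler(buf, protocol=2, fix_imports=False)
    records = []
    for state in states:
        buf.seek(0)
        if clear_memo:
            pickler.clear_memo()
        pickler.dump(collections.OrderedDict)  # stand-in for classmeta
        pickler.dump(state)
        buf.truncate()
        records.append(buf.getvalue())
    return records


def load_record(record):
    """Read one record back with a fresh unpickler, as a storage would."""
    unpickler = pickle.Unpickler(io.BytesIO(record))
    return unpickler.load(), unpickler.load()


states = [{"n": i} for i in range(3)]
isolated = dump_records(states, clear_memo=True)
shared = dump_records(states, clear_memo=False)

name = b"OrderedDict"
print([name in r for r in isolated])  # is the class name in every record?
print([name in r for r in shared])    # or only in the first one?
print(load_record(isolated[2]))
```

With the memo cleared, every record spells out the class name and can
be loaded by itself.  Without it, later records only point back into
a memo that a fresh unpickler doesn't have, so they're smaller but
can't be loaded in isolation.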

For example, if you store an OOBTree whose internal state contains 100
OOBuckets, the string "BTrees._OOBucket" appears 100 times in the data
record, and the string "OOBTree" even more.  Jeremy once analyzed a
customer Data.fs and incidentally discovered that about half the space
was consumed by repetitions of such BTree-related strings; no idea
whether that's typical, although I wouldn't be surprised if it were.

An entirely new gimmick was introduced in pickle protocol 2, the
"extension registry" described in PEP 307:

    http://www.python.org/dev/peps/pep-0307/

That _allows_ an application to register "popular" module and class
string names that pickles can reference later via teensy 2- or 3-byte
(independent of string length) opcodes.  In effect, such strings are
stored in the _implementation_ of pickle instead of inside pickles.
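
A rough sketch of how that looks in practice, assuming Python 3
(where the registry module is spelled copyreg rather than copy_reg).
The code 240 is an arbitrary pick from the range PEP 307 sets aside
for private use, and collections.OrderedDict is just a convenient
class to register:

```python
import collections
import copyreg
import pickle

# Hypothetical assignment: 240 is an arbitrary code from PEP 307's
# private-use range, chosen only for this illustration.
CODE = 240
MODULE, NAME = "collections", "OrderedDict"

# Without a registry entry the pickle spells out module and class name.
plain = pickle.dumps(collections.OrderedDict, protocol=2)

copyreg.add_extension(MODULE, NAME, CODE)
try:
    # Protocol 2 now emits a short extension opcode instead of the names.
    short = pickle.dumps(collections.OrderedDict, protocol=2)
    # Protocol 1 has no extension opcodes, so the names are still written.
    old = pickle.dumps(collections.OrderedDict, protocol=1)
    # Loading needs the same registry entry to map the code back.
    restored = pickle.loads(short)
finally:
    copyreg.remove_extension(MODULE, NAME, CODE)

print(len(plain), len(short), len(old))
print(restored is collections.OrderedDict)
```

Note that the reader has to have the same code registered, which is
the price of moving the strings out of the pickle.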

AFAIK, nobody anywhere has used this yet, outside of Python's test
suite.  It was intended to be a simple, cheap approach to cutting
pickle bloat for apps motivated enough to set up the registry.  You'll
note that a quarter of the one-byte codes (128-191) are reserved for Zope :-)

> You mean an optimization to make the pickle size for some
> new style classes smaller. That's not yet used because it could
> make the storage exchange between different Python versions impossible
> (the older Python versions would not understand the new pickle protocol).

The need for protocol 2 is also why the extension registry (above)
can't be used so long as older Pythons are in the mix.

