[ZODB-Dev] Changing the pickle protocol?

Hanno Schlichting hanno at hannosch.eu
Wed Apr 28 11:43:42 EDT 2010


On Wed, Apr 28, 2010 at 5:11 PM, Jim Fulton <jim at zope.com> wrote:
> Do you know of specific benefits you expect from protocol 2? Any
> specific reasons you think it would be better in practice?

I have just noticed some ongoing work on pickles recently, for
example in the Python 2.7 "What's New":

- The pickle and cPickle modules now automatically intern the strings
used for attribute names, reducing memory usage of the objects
resulting from unpickling. (Contributed by Jake McGuire; issue 5084.)

- The cPickle module now special-cases dictionaries, nearly halving
the time required to pickle them. (Contributed by Collin Winter; issue
5670.)

Unless I've misread the code, these changes only apply to protocol 2.
And then there are the old claims in PEP 307 that pickling new-style
classes would be more efficient.
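A minimal sketch of what PEP 307 describes, assuming Python 3's pickle module (details may differ on Python 2.7's cPickle): under protocol 1 an instance of a new-style class is rebuilt through copyreg._reconstructor(cls, object, None), while protocol 2 writes the class followed by a single NEWOBJ opcode. The Record class is hypothetical.

```python
import pickle
import pickletools


class Record:
    """Hypothetical new-style class standing in for an application object."""

    def __init__(self, title, size):
        self.title = title
        self.size = size


obj = Record("front-page", 42)

# Protocol 1: the pickle calls copyreg._reconstructor(Record, object, None)
# and then applies the instance dict with BUILD.
p1 = pickle.dumps(obj, protocol=1)

# Protocol 2: the pickle pushes the class, an empty argument tuple and a
# single NEWOBJ opcode, then applies the instance dict with BUILD.
p2 = pickle.dumps(obj, protocol=2)

print(len(p1), len(p2))
pickletools.dis(p2)  # shows the GLOBAL ... NEWOBJ ... BUILD sequence
```

The saving here comes from not referencing copyreg._reconstructor and object in every instance pickle.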

Finally, Python 3 introduces pickle protocol version 3, which deals
explicitly with the new bytes type. There are more changes to the
pickle format in Python 3, so that's a separate project. But it
suggested to me that the pickle format isn't quite as "dead" as it
used to be.
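For illustration, assuming Python 3: protocol 2 has no opcode for bytes, so Python 3 pickles a bytes object under protocol 2 as a call to _codecs.encode() on a latin-1 string, whereas protocol 3 writes the raw bytes directly. The payload value is arbitrary.

```python
import pickle

payload = b"\x00\xffZODB"  # arbitrary bytes, including non-ASCII values

# Protocol 2 predates the bytes type, so Python 3 writes the value as a call
# to _codecs.encode("<latin-1 text>", "latin1") that recreates it on load.
p2 = pickle.dumps(payload, protocol=2)

# Protocol 3 has dedicated opcodes (SHORT_BINBYTES/BINBYTES) for raw bytes.
p3 = pickle.dumps(payload, protocol=3)

print(p2)
print(p3)
assert pickle.loads(p2) == pickle.loads(p3) == payload
```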

> I've avoided going to protocol 2 for two reasons:
>
> - It wasn't clear we'd get a benefit without deeper changes.
>  Those deeper changes might be of value, but only if we're
>  careful about how we make them.
>
>  In particular, we could replace class names in pickles
>  if we had a registry mapping ints to class names.
>  This could provide a number of benefits beyond
>  smaller pickles, but it needs some thought to get right.

Right. I'm not particularly interested in the pickle class registry.
Having a hard dependency between code filling the registry and the
actual data has all sorts of implications. I don't really want to go
there myself.
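For reference, protocol 2 already ships a mechanism close to the registry Jim describes: the extension registry in copyreg (copy_reg in Python 2), whose code table reserves 128-191 for Zope. Whether a ZODB registry would take this form is an open question; the sketch below only shows the stdlib mechanism, assuming Python 3, a hypothetical Point class, and code 240 from the private-use range. It also shows the hard dependency mentioned above: once the registration is removed, the compact pickle can no longer be loaded.

```python
import copyreg
import pickle


class Point:
    """Hypothetical class standing in for a frequently stored record type."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# 240-255 is the private-use range in copyreg's table; 240 is an arbitrary pick.
EXT_CODE = 240
MODULE, NAME = Point.__module__, Point.__qualname__

# Without a registration, protocol 2 references the class by "module\nname\n".
plain = pickle.dumps(Point(1, 2), protocol=2)

copyreg.add_extension(MODULE, NAME, EXT_CODE)
try:
    # With a registration, protocol 2 writes a 2-byte EXT1 opcode instead.
    compact = pickle.dumps(Point(1, 2), protocol=2)
    restored = pickle.loads(compact)
finally:
    # Unregister again so the example leaves no global state behind.
    copyreg.remove_extension(MODULE, NAME, EXT_CODE)

# The stored data now depends on the registration: loading fails without it.
try:
    pickle.loads(compact)
    unreadable = False
except (ValueError, pickle.UnpicklingError):
    unreadable = True

print(len(plain), len(compact), unreadable)
```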

> - I want zope.xmlpickle to work with ZODB database records and
>  it doesn't support protocol 2 yet.  This doesn't have to block
>  moving to protocol 2, but I really would like to have this work
>  if possible.

Ok. I know there are some tools that read the ZODB data on their own,
without actually using the APIs. I wouldn't want to break them if
there's no clear benefit.

> I'm skeptical that there would be enough benefit for protocol 2 without
> implementing a registry to take advantage of integer pickle codes.
>
> The other benefit of protocol 2 has to do with the way instance pickles are
> constructed and, for persistent objects, ZODB takes a very different
> approach anyway.
>
> I suggest doing some realistic experiments to look at the impact of the
> change.
>
> - Convert an interesting Zope 2 database from protocol 1 to protocol 2.
>  How does this affect database size?
>
> - Do some sort of write and read benchmarks using the 2 protocols to
>  see if there's a meaningful benefit.
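To make the point about persistent objects concrete, here is a rough sketch assuming a simplified version of ZODB's record layout, in which each record holds two consecutive pickles: a reference to the class, then the object state. The Page class and the write_record/read_record/opcode_names helpers are hypothetical; real ZODB records also handle persistent references and __getstate__. Because the loader, not the pickle, builds the instance, neither protocol emits REDUCE or NEWOBJ, so protocol 2's instance-construction improvements don't come into play.

```python
import io
import pickle
import pickletools


class Page:
    """Hypothetical application class standing in for a persistent object."""

    def __init__(self, title, body):
        self.title = title
        self.body = body


def write_record(obj, protocol):
    """Hypothetical helper: pickle the class, then the state, into one record."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol)
    pickler.dump(type(obj))     # first pickle: a reference to the class
    pickler.dump(obj.__dict__)  # second pickle: the state, a plain dict
    return buf.getvalue()


def read_record(data):
    """Hypothetical helper: the loader, not the pickle, builds the instance."""
    unpickler = pickle.Unpickler(io.BytesIO(data))
    cls = unpickler.load()
    state = unpickler.load()
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


def opcode_names(data):
    """List the opcode names of every pickle in the record, in order."""
    stream = io.BytesIO(data)
    names = []
    while stream.tell() < len(data):
        names.extend(op.name for op, _arg, _pos in pickletools.genops(stream))
    return names


page = Page("Home", "Hello, world")
for protocol in (1, 2):
    data = write_record(page, protocol)
    print(protocol, len(data), opcode_names(data))
```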

Ok, thanks. That gives me enough direction to work on some specific benchmarks.
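A starting point for such benchmarks might look like the following sketch, assuming Python 3's pickle and synthetic dictionary records (hypothetical data, not a real Zope 2 database). It reports total pickle size and best-of-five write and read times per protocol; for the real experiment the records would come from iterating over an existing database instead.

```python
import pickle
import timeit


def sample_records(n=1000):
    """Hypothetical synthetic records; a real run would read them from a database."""
    return [
        {
            "id": i,
            "title": "Document %d" % i,
            "tags": ["news", "front-page"],
            "body": "lorem ipsum " * 20,
            "meta": {"owner": "admin", "revision": i % 7},
        }
        for i in range(n)
    ]


def measure(records, protocol, repeat=5):
    """Return (total bytes, best write seconds, best read seconds) for one protocol."""
    blobs = [pickle.dumps(r, protocol) for r in records]
    size = sum(len(b) for b in blobs)
    write = min(timeit.repeat(
        lambda: [pickle.dumps(r, protocol) for r in records], number=1, repeat=repeat))
    read = min(timeit.repeat(
        lambda: [pickle.loads(b) for b in blobs], number=1, repeat=repeat))
    return size, write, read


records = sample_records()
results = {protocol: measure(records, protocol) for protocol in (1, 2)}
for protocol, (size, write, read) in results.items():
    print(f"protocol {protocol}: {size} bytes, "
          f"write {write * 1000:.2f} ms, read {read * 1000:.2f} ms")
```

Timings on a single machine are noisy; taking the minimum of several repeats is the usual way to reduce that noise.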

Hanno

