[ZODB-Dev] Changing the pickle protocol?

Jim Fulton jim at zope.com
Wed Apr 28 11:11:29 EDT 2010


On Wed, Apr 28, 2010 at 7:59 AM, Hanno Schlichting <hanno at hannosch.eu> wrote:
> Hi.
>
> The ZODB currently uses a hardcoded pickle protocol one. There's both
> the more efficient protocol two and in Python 3 protocol 3. Protocol
> two has seen various improvements in recent Python versions, triggered
> by its use in memcached.
>
> I'd be interested to work on changing the protocol. How should I approach this?

Do you know of specific benefits you expect from protocol 2? Any specific
reasons you think it would be better in practice?

I've avoided going to protocol 2 for two reasons:

- It wasn't clear we'd get a benefit without deeper changes.
  Those deeper changes might be of value, but only if we're
  careful about how we make them.

  In particular, we could replace class names in pickles with
  integer codes if we had a registry mapping ints to class names.
  This could provide a number of benefits beyond
  smaller pickles, but it needs some thought to get right.

- I want zope.xmlpickle to work with ZODB database records and
  it doesn't support protocol 2 yet.  This doesn't have to block
  moving to protocol 2, but I really would like to have this work
  if possible.
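
To make the registry idea in the first point concrete, here is a minimal
sketch using the extension registry that the standard library already
provides for protocol 2 (module `copyreg` in Python 3, `copy_reg` in
Python 2). It uses `fractions.Fraction` as a stand-in class and the code 240,
both of which are assumptions chosen for illustration; it says nothing about
how ZODB itself would manage such a registry.

```python
import copyreg
import pickle
from fractions import Fraction

value = Fraction(3, 4)

# Without a registry entry, the protocol 2 pickle spells out the
# module and class name ("fractions", "Fraction") as text.
without_registry = pickle.dumps(value, 2)

# Assumption: 240 is an arbitrary code taken from the range that
# PEP 307 sets aside for private use. A real registry would need
# codes that are assigned and managed carefully.
copyreg.add_extension("fractions", "Fraction", 240)

# With the entry in place, the class reference is written as a
# short EXT opcode followed by the integer code.
with_registry = pickle.dumps(value, 2)

# Loading only works where the same (module, name) -> code mapping
# has been registered.
restored = pickle.loads(with_registry)

print(len(without_registry), len(with_registry))
```

The sketch also shows why this needs thought to get right: a pickle written
with a code can only be read by a process that registers the same mapping.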


> I can see three general approaches:
>
> 1. Hardcode the version to 2 in all places, instead of one.
>
> Pros: Easy to do, backwards compatible with all supported Python versions
> Cons: Still inflexible
>
> 2. Make the protocol version configurable
>
> Pros: Give control to the user, one could change the protocol used for
> storages or persistent caches independently
> Cons: More overhead, different protocol versions could have different bugs
>
> 3. Make the format configurable
>
> Shane made a proposal in this direction at some point. This would
> abstract the persistent format and allow for different serialization
> formats. As part of this one could also have different Pickle/Protocol
> combinations.
>
> Pros: Lots of flexibility, it might be possible to access the data
> from different languages
> Cons: Even more overhead
>
>
> If I am to look into any of these options, which one should I look
> into? Option 1 is obviously the easiest and I made a branch for this
> at some point already. I'm not particularly interested in option 3
> myself, as I haven't had the use-case.

I'm skeptical that there would be enough benefit for protocol 2 without
implementing a registry to take advantage of integer pickle codes.

The other benefit of protocol 2 has to do with the way instance pickles are
constructed and, for persistent objects, ZODB takes a very different
approach anyway.
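
For ordinary (non-persistent) instances, the difference in how pickles are
constructed can be seen by listing the opcodes each protocol emits. The sketch
below uses `argparse.Namespace` purely as a convenient plain class from the
standard library; it illustrates the standard pickle module only, not ZODB's
own handling of persistent objects.

```python
import argparse
import pickle
import pickletools

def opcode_names(data):
    """Return the names of the pickle opcodes found in data, in order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]

# Assumption: a plain class with no custom __reduce__ stands in for
# an ordinary application object.
obj = argparse.Namespace(title="example", size=3)

ops_by_protocol = {
    protocol: opcode_names(pickle.dumps(obj, protocol))
    for protocol in (1, 2)
}

for protocol, names in ops_by_protocol.items():
    print(protocol, names)
```

Protocol 1 rebuilds the instance by calling a reconstructor function through
REDUCE, while protocol 2 uses the dedicated NEWOBJ opcode.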

I suggest doing some realistic experiments to look at the impact of the
change.

- Convert an interesting Zope 2 database from protocol 1 to protocol 2.
  How does this affect database size?

- Do some sort of write and read benchmarks using the 2 protocols to
  see if there's a meaningful benefit.

The experiments above don't need to include a class registry, since I don't
think you're proposing that.
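
A rough sketch of the kind of size and timing comparison meant here, using
only the standard library. The records produced by `make_records` are a
hypothetical stand-in; a meaningful experiment would use records from a real
database, so treat any numbers this prints as illustrative only.

```python
import pickle
import timeit

def make_records(n=1000):
    """Build hypothetical sample data standing in for database records."""
    return [
        {"id": i, "title": "Document %d" % i, "tags": ["a", "b"], "size": i * 1.5}
        for i in range(n)
    ]

def measure(records, protocol, repeat=5):
    """Return pickle size and best-of-N write/read times for one protocol."""
    data = pickle.dumps(records, protocol)
    write = min(timeit.repeat(lambda: pickle.dumps(records, protocol),
                              number=1, repeat=repeat))
    read = min(timeit.repeat(lambda: pickle.loads(data),
                             number=1, repeat=repeat))
    return {"size": len(data), "write_s": write, "read_s": read}

records = make_records()
results = {protocol: measure(records, protocol) for protocol in (1, 2)}

for protocol, result in results.items():
    print("protocol %d: %d bytes, write %.6fs, read %.6fs"
          % (protocol, result["size"], result["write_s"], result["read_s"]))
```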

BTW, I have almost no interest in a custom non-pickle protocol.

Jim

-- 
Jim Fulton

