[ZODB-Dev] Changing the pickle protocol?

Jim Fulton jim at zope.com
Wed Apr 28 12:02:47 EDT 2010


On Wed, Apr 28, 2010 at 11:43 AM, Hanno Schlichting <hanno at hannosch.eu> wrote:
> On Wed, Apr 28, 2010 at 5:11 PM, Jim Fulton <jim at zope.com> wrote:
>> Do you know of specific benefits you expect from protocol 2? Any
>> specific reasons you think it would be better in practice?
>
> I have just seen some ongoing work on pickles in recent times, for
> example from the Python 2.7 what's new:
>
> - The pickle and cPickle modules now automatically intern the strings
> used for attribute names, reducing memory usage of the objects
> resulting from unpickling. (Contributed by Jake McGuire; issue 5084.)

I can't see why this should be protocol specific.

> - The cPickle module now special-cases dictionaries, nearly halving
> the time required to pickle them. (Contributed by Collin Winter; issue
> 5670.)

That's odd. cPickle already special-cased dictionaries.

> Unless I've misread the code, these changes only apply to protocol
> two.

We should double check that. I'll take a closer look.

> And then there are the old claims of PEP 307 stating that pickling
> new-style classes would be more efficient.

Which doesn't apply to persistent objects, since they're handled differently.

> Finally Python 3 introduces pickle protocol version 3, which deals
> explicitly with the new bytes type. There's more changes in Python 3
> and the pickle format, so that's a separate project. But it suggested
> to me, that the pickle format isn't quite as "dead" anymore as it used
> to be.

I really think Python 3 is a separate topic.  It's likely that there
will be many things to confront. :(

>
>> I've avoided going to protocol 2 for two reasons:
>>
>> - It wasn't clear we'd get a benefit without deeper changes.
>>  Those deeper changes might be of value, but only if we're
>>  careful about how we make them.
>>
>>  In particular, we could replace class names in pickles
>>  if we had a registry mapping ints to class names.
>>  This could provide a number of benefits beyond
>>  smaller pickles, but it needs some thought to get right.
>
> Right. I'm not particularly interested in the pickle class registry.
> Having a hard dependency between code filling the registry and the
> actual data has all sorts of implications. I don't really want to go
> there myself.

It has some positive implications if you get it right:

- Smaller pickles
- Easier class renaming
- Potentially greater security

Getting it right is almost certainly a bigger project than
anyone wants to deal with right now.
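
For anyone who wants to see the "smaller pickles" point concretely, here is
a minimal sketch using the extension registry that protocol 2 already
supports through the standard copyreg module. It is not a proposal for how
ZODB would do this. The code 240 is a hypothetical choice from the
private-use range described in PEP 307, and OrderedDict merely stands in
for an application class.

```python
import copyreg
import pickle
from collections import OrderedDict

# Hypothetical code taken from the private-use range of PEP 307 (240-255).
EXT_CODE = 240
# The registry is keyed by (module name, class name).
KEY = ("collections", "OrderedDict")

sample = OrderedDict(a=1)

# Without a registry entry, the pickle spells out the module and class name.
by_name = pickle.dumps(sample, protocol=2)

copyreg.add_extension(KEY[0], KEY[1], EXT_CODE)
try:
    # With the entry in place, protocol 2 writes a short EXT opcode instead.
    by_code = pickle.dumps(sample, protocol=2)
    # Loading needs the same registry entry; this is the hard dependency
    # between code and data mentioned above.
    restored = pickle.loads(by_code)
finally:
    copyreg.remove_extension(KEY[0], KEY[1], EXT_CODE)

print("by name:", len(by_name), "bytes; by code:", len(by_code), "bytes")
```

Note that once the entry is removed, by_code can no longer be loaded, which
is exactly why the registry contents would have to be managed carefully.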

...

>> I suggest doing some realistic experiments to look at the impact of the
>> change.
>>
>> - Convert an interesting Zope 2 database from protocol 1 to protocol 2.
>>  How does this affect database size?
>>
>> - Do some sort of write and read benchmarks using the 2 protocols to
>>  see if there's a meaningful benefit.
>
> Ok, thanks. That gives me enough direction to work on some specific benchmarks.

Cool. BTW, you might want to (try :) to search the list archives. I
think someone did some experiments a while back.
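
A rough starting point for the size and read/write comparison suggested
above might look like the sketch below. It only exercises the pickle module
on made-up data (the records list is a hypothetical stand-in, not a real
Zope 2 database, and compare_protocols is a name invented here), so treat
any numbers it prints as a sanity check rather than a result.

```python
import pickle
import timeit


def compare_protocols(obj, protocols=(1, 2), number=20):
    """Return pickle size and dump/load timings for each protocol."""
    results = {}
    for proto in protocols:
        data = pickle.dumps(obj, protocol=proto)
        results[proto] = {
            "bytes": len(data),
            # Total seconds for `number` repetitions of each operation.
            "dump_seconds": timeit.timeit(
                lambda: pickle.dumps(obj, protocol=proto), number=number),
            "load_seconds": timeit.timeit(
                lambda: pickle.loads(data), number=number),
        }
    return results


# Hypothetical sample data standing in for real database records.
records = [
    {"id": n, "title": "document %d" % n, "tags": ["zodb", "tag%d" % (n % 7)]}
    for n in range(1000)
]

results = compare_protocols(records)
for proto in sorted(results):
    print(proto, results[proto])
```

A real test would run the same comparison against the records of an actual
storage, since persistent objects are pickled differently from plain data.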

Jim

-- 
Jim Fulton

