[ZODB-Dev] ZODB 3.3 and pickle protocol 2?

Tim Peters tim at zope.com
Tue Dec 7 19:59:05 EST 2004


[Eric Lambart]
> Hello, I just subscribed to the list after browsing the archives from the
> past six months.
>
> I am developing an OODB management system for a client, and we have
> decided that ZODB and ZEO (and IndexedCatalog--hi Christian) are probably
> the best platform upon which to build our software.

Cool!  Let us know how it goes.

> We haven't yet created anything but test data storages using ZODB 3.2.x,
> and I get the impression that life will be simpler if we start with 3.3
> --providing of course that we and/or Christian/Johan/et al can get
> IndexedCatalog tweaked to work with 3.3.
>
> My understanding is that once IC is divorced from ExtensionClass, the
> IndexedObjects can/will be new-style Python classes.

Have to defer to Christian et alia.

> This leads to my questions that are relevant to this list.
>
> I've read PEP 307 (several times!) and am very intrigued by the apparent
> disk storage savings to be gained by using the new pickle protocol (2). I
> can see that protocol 1 (which I know previously simply meant bin=True)
> is hard-coded throughout ZODB.

Yes.  And if you find a place that doesn't hard-code 1 now, it's a bug.

> So here's what I'm wondering: a) Are there any plans to move to
> protocol 2 for ZODB, or better yet, a non-hard-coded protocol value that
> can be chosen programmatically?

There's an internal wish-list item for the former (moving to proto 2).  It
got delayed because a relevant Python bug in proto 2 popped up, which wasn't
fixed until Python 2.3.4.  "Delayed" is still where it's at.

Nobody asked about programmatic selection of pickle protocol before.  It may
be a good idea, or it may be just another mysterious knob people get in
trouble over.

It may be necessary anyway, if we're *ever* to exploit proto 2.  All Pythons
in use today speak protos 0 and 1, but only 2.3.4+ speak proto 2.  That
severely constrains what can be done wrt mixing Python versions across ZEO
clients and servers.  For example, if we hardcoded 2 on a ZEO server, that
would make it impossible for a ZEO client using an older Python to talk to
that server.  Similarly for using proto 2 on a client (it could only talk to
a "modern" server).
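If protocol selection ever became a knob, one cautious policy is to pick the
highest protocol that *every* peer can read. The sketch below is written for
modern Python 3 and is purely illustrative: `choose_protocol`, `dump` and
`PROTO2_MIN_VERSION` are hypothetical names, not part of ZODB or ZEO, and the
peer versions are assumed to be known in advance.

```python
import pickle

# Hypothetical threshold: the first Python whose proto 2 support is trusted,
# per the bug fix in 2.3.4 mentioned above.
PROTO2_MIN_VERSION = (2, 3, 4)

def choose_protocol(peer_versions):
    """Return the highest pickle protocol every peer can safely read.

    peer_versions: iterable of (major, minor, micro) tuples, one for each
    ZEO client or server that may read the pickles (an assumed input).
    """
    if all(tuple(v) >= PROTO2_MIN_VERSION for v in peer_versions):
        return 2
    # Protocol 1 (the old bin=True format) is readable by every Python in use.
    return 1

def dump(obj, peer_versions):
    """Pickle obj with the protocol chosen for this set of peers."""
    return pickle.dumps(obj, protocol=choose_protocol(peer_versions))
```

A single older client in the mix drags everyone back to protocol 1, which is
exactly the mixing constraint described above. Protocol 2 pickles begin with
the `PROTO` opcode (`b"\x80\x02"`), which an unpickler that predates
protocol 2 cannot interpret.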

One caution:  because ZODB is probably the world's heaviest pickle user, and
ZODB isn't using proto 2, the possibility of more relevant proto 2 bugs
can't be dismissed.  IIRC, the proto 2 bug mentioned above was found almost
immediately after Jim Fulton simply *tried* using proto 2 in some Zope3
context.

> b) Is there any reason why using the newer pickle protocol would NOT make
> any significant difference (improvement!) in data storage using ZODB?

Well, it depends so much on your data.  For example, if you primarily store
binary data (say, .jpegs) in giant strings, proto 2 won't save anything.  If
you store instances of new-style classes, proto 2 may offer significant
savings.  The only way to know is to try it with actual data.
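A rough way to "try it with actual data" is to pickle representative objects
under both protocols and compare the byte counts. This sketch assumes modern
Python 3; `Point` and the random blob are made-up stand-ins for real
application data, and the numbers are plain pickle sizes, not what ZODB itself
would write to a storage.

```python
import os
import pickle

class Point(object):
    """A small new-style class standing in for typical application objects."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def sizes(obj, protocols=(1, 2)):
    """Map each protocol number to the length of obj's pickle."""
    return {p: len(pickle.dumps(obj, protocol=p)) for p in protocols}

points = [Point(i, i * 2) for i in range(1000)]
# Stands in for a .jpeg held in one big string. Under Python 3, protocols
# below 3 pickle bytes through a latin-1 round trip, so both protocols encode
# this blob the same way and differ only by a few header bytes.
blob = os.urandom(100_000)

print("new-style instances:", sizes(points))
print("binary blob:        ", sizes(blob))
```

For new-style instances, protocol 1 goes through `copyreg._reconstructor`
for every object, while protocol 2 uses the compact `NEWOBJ` opcode, so the
instance list shrinks; the blob's size is essentially unchanged.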

Since you've read PEP 307, you know about the "extension codes" in proto 2.
That alone *could* save significant storage, by replacing endless
repetitions of popular long module+class strings (like
"BTrees.IIBTree.IIBTree") in pickles with short binary codes.  But no
extension codes have been assigned yet, either for Zope use or in core
Python -- people just ran out of time to pursue that, even though it's a
simple way to get a pure win.
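To see what an extension code buys, the sketch below registers one with
`copyreg.add_extension` (the module was named `copy_reg` in Python 2) and
pickles many small records separately, roughly the way ZODB writes each
persistent object as its own record, so the class reference repeats in every
record instead of being memoized once. Written for modern Python 3; the class
name is a hypothetical stand-in for something like "BTrees.IIBTree.IIBTree",
and code 240 is taken from the 240-255 range PEP 307 sets aside for private
use, not from any real Zope assignment.

```python
import copyreg
import pickle

class VeryLongDescriptiveBTreeLikeClassName(object):
    """Hypothetical stand-in for a popular class with a long dotted name."""
    def __init__(self, value):
        self.value = value

EXT_CODE = 240  # assumed private-use code; never register this for real data

def total_size(records):
    """Sum the sizes of records pickled one at a time with protocol 2."""
    return sum(len(pickle.dumps(r, protocol=2)) for r in records)

cls = VeryLongDescriptiveBTreeLikeClassName
records = [cls(i) for i in range(100)]

before = total_size(records)  # each record spells out module + class name
copyreg.add_extension(cls.__module__, cls.__qualname__, EXT_CODE)
try:
    after = total_size(records)  # each record now uses the 2-byte EXT1 opcode
    # The unpickler maps code 240 back to the class through the registry.
    restored = pickle.loads(pickle.dumps(records[7], protocol=2))
finally:
    copyreg.remove_extension(cls.__module__, cls.__qualname__, EXT_CODE)

print("without extension code:", before)
print("with extension code:   ", after)
```

The catch is that every reader must register the same code for the same
class, which is why codes need to be assigned centrally (for Zope, or in core
Python) before they can be relied on in stored data.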

> c) Assuming there would be some advantage to these changes, if we were to
> make such changes to ZODB (which seem far easier at this point than
> making the changes to IC would be), are they likely to be accepted as
> patches to the ZODB codebase?

For legal and procedural reasons, Zope Corp generally cannot accept code
unless the submitter has signed a Zope Contributor Agreement:

    http://zope.org/DevHome/CVS/Contributor.pdf

If that's a problem for you, that's a problem for us.

Modulo that, I personally would love to improve ZODB/ZEO's pickling, and
would volunteer some amount of (nearly non-existent, alas) "spare time" to
help.  It's possible I could devote some work time to it too, but probably
not in the near future.  If the Contributor Agreement isn't a hang-up, I
could certainly make enough spare time to review patches (by taking it out
of the spare time I use now to review Python patches <wink>).


