[ZODB-Dev] Changing the pickle protocol?

Laurence Rowe l at lrowe.co.uk
Wed Apr 28 21:18:21 EDT 2010


I suspect that something like 90% of ZODB pickle data will be string
values, so the scope for reducing the space used by a ZODB through the
newer pickle protocol – and even the class registry – is limited.
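
As a rough check of that intuition, here is a minimal sketch that pickles a text-heavy mapping under protocols 1 and 2 and compares the sizes. The record contents and the helper name pickled_sizes are made up for illustration. The exact numbers depend on the Python version (this thread predates Python 3 support in ZODB), so treat it as an experiment to run, not as a measurement.

```python
import pickle

# Hypothetical record: a mapping whose payload is almost entirely text.
record = {
    "title": "Quarterly report",
    "description": "Summary of results " * 20,
    "body": "Lorem ipsum dolor sit amet " * 200,
}

def pickled_sizes(obj, protocols=(1, 2)):
    """Return a {protocol: size in bytes} mapping for the pickled object."""
    return {p: len(pickle.dumps(obj, protocol=p)) for p in protocols}

sizes = pickled_sizes(record)
# String payloads are stored verbatim in both protocols, so the
# difference should be tiny compared with the total size.
difference = abs(sizes[1] - sizes[2])
```

Printing sizes and difference for real application objects would show how much of a record is string payload that neither protocol can shrink.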

What would make a significant impact on data size is compression. With
lots of short strings it's probably best to use a preset dictionary
(which sadly does not seem to be exposed through the Python zlib
module). Text is usually very amenable to compression, and now that we
have blobs, most binary data will no longer be in the Data.fs.
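
For what it's worth, later Python releases (3.3 and up) do accept a preset dictionary via the zdict argument of zlib.compressobj and zlib.decompressobj. The following is a sketch under that assumption. The PRESET bytes and the sample record are invented for illustration, and a real dictionary would have to be built from substrings that actually recur in the stored data.

```python
import zlib

# Hypothetical preset dictionary: byte sequences expected to recur in records.
PRESET = (b'"portal_type": "Document", "title": "", "description": "", '
          b'"creator": "admin", "language": "en"')

def compress_with_dict(data, zdict=PRESET):
    """Deflate data, priming the compressor with a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()

def decompress_with_dict(blob, zdict=PRESET):
    """Inflate data; the same dictionary must be supplied again."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()

# A short record that shares most of its structure with the dictionary.
sample = (b'{"portal_type": "Document", "title": "Welcome", '
          b'"description": "Front page", "creator": "admin", "language": "en"}')

with_dict = compress_with_dict(sample)
without_dict = zlib.compress(sample, 9)
```

Comparing len(with_dict) against len(without_dict) shows the effect on a single short record. The catch is that the dictionary becomes part of the storage format: data compressed with one dictionary cannot be read back with another.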

Compression could be implemented either at the database level (which
is probably cleanest) or at the application level (which would also
reduce the size of content objects in memory). This would bring clear
wins where I/O or memory bandwidth is the limiting factor, since CPUs
spend most of their time waiting for data to be copied into their
cache from memory.
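
To make the database-level variant concrete, here is a minimal sketch of a layer that compresses each pickled record before it is stored and transparently handles records that were left uncompressed. Nothing here is ZODB API: pack, unpack and the MARKER prefix are hypothetical names, and the prefix relies on the assumption that no valid pickle begins with a "." (STOP) opcode.

```python
import pickle
import zlib

MARKER = b".z"  # hypothetical prefix that flags a compressed record

def pack(obj, protocol=2):
    """Pickle obj and compress the result when that actually saves space."""
    data = pickle.dumps(obj, protocol=protocol)
    compressed = zlib.compress(data)
    if len(compressed) + len(MARKER) < len(data):
        return MARKER + compressed
    return data  # small records are stored as plain pickles

def unpack(record):
    """Reverse pack(), accepting both compressed and plain records."""
    if record.startswith(MARKER):
        record = zlib.decompress(record[len(MARKER):])
    return pickle.loads(record)
```

Because unpack() accepts both forms, a layer like this could sit in front of existing uncompressed data. The application-level variant would instead compress individual attribute values inside the objects themselves.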

Laurence

2010/4/28 Hanno Schlichting <hanno at hannosch.eu>:
> Hi.
>
> The ZODB currently hardcodes pickle protocol 1. There are also the more
> efficient protocol 2 and, in Python 3, protocol 3. Protocol 2 has seen
> various improvements in recent Python versions, triggered by its use
> in memcached.
>
> I'd be interested to work on changing the protocol. How should I approach this?
>
> I can see three general approaches:
>
> 1. Hardcode the version to 2 in all places, instead of one.
>
> Pros: Easy to do, backwards compatible with all supported Python versions
> Cons: Still inflexible
>
> 2. Make the protocol version configurable
>
> Pros: Give control to the user, one could change the protocol used for
> storages or persistent caches independently
> Cons: More overhead, different protocol versions could have different bugs
>
> 3. Make the format configurable
>
> Shane made a proposal in this direction at some point. This would
> abstract the persistent format and allow for different serialization
> formats. As part of this one could also have different Pickle/Protocol
> combinations.
>
> Pros: Lots of flexibility, it might be possible to access the data
> from different languages
> Cons: Even more overhead
>
>
> If I am to look into any of these options, which one should I look
> into? Option 1 is obviously the easiest and I made a branch for this
> at some point already. I'm not particularly interested in option 3
> myself, as I haven't had the use-case.
>
> Thanks for any advice,
> Hanno

