[Zope3-dev] mini-RFC: times and timezones

Tim Peters tim.peters at gmail.com
Thu Feb 24 12:32:51 EST 2005


[Tim Peters]
...
>> A consideration I didn't see mentioned is that lots of work went into
>> ensuring that datetime objects have efficient pickle representations
>> (small size, and very fast to pickle and unpickle).  Of course this
>> works best for naive datetimes -- pickling an aware datetime drags in
>> all the machinery for pickling its instance of a tzinfo subclass too.

[Gary Poster]
> I am under the impression that the datetimeutils tzinfo implementations
> are pickle-friendly as well.  That's what I was intending to use as the
> basis of zope.i18n.utc.

View me as a visiting Python developer here, not as a Zope coworker
<wink>.  That is, I wrote nearly all the C code in Python's
datetimemodule.c, but don't know squat about Zope's datetimeutils.

So let's try it:

>>> from zope.app import datetimeutils
>>> import cPickle as pickle
>>> from datetime import datetime
>>> n = datetime.now()
>>> p = pickle.dumps(n, 1)
>>> len(p)   # size of naive pickle
39
>>> tz = datetimeutils.tzinfo(-500)
>>> n2 = n.replace(tzinfo=tz)
>>> p2 = pickle.dumps(n2, 1)
>>> len(p2)   # size of aware pickle
82

So it costs a bit more than twice as many bytes to pickle an aware
datetime using datetimeutils.tzinfo than it costs to pickle a naive
datetime.
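For anyone wanting to reproduce the comparison without a Zope checkout, here is a minimal sketch for a current Python 3. It assumes the stdlib `datetime.timezone` as a stand-in for `datetimeutils.tzinfo` (the -5 hour offset is an arbitrary choice for illustration, not a claim about what `-500` means to datetimeutils), and the byte counts will not match the ones above, since Python 3 pickles the payload differently under protocol 1.

```python
import pickle
from datetime import datetime, timedelta, timezone

# Assumption: stdlib fixed-offset tzinfo standing in for datetimeutils.tzinfo.
tz = timezone(timedelta(hours=-5))

# The same instant that the 10-byte payload in this post encodes.
naive = datetime(2005, 2, 24, 11, 53, 14, 906000)
aware = naive.replace(tzinfo=tz)

naive_pickle = pickle.dumps(naive, 1)   # protocol 1, as in the post
aware_pickle = pickle.dumps(aware, 1)

# The aware pickle also has to carry the tzinfo instance.
print(len(naive_pickle), len(aware_pickle))
```

The absolute sizes depend on the Python version and the tzinfo class; only the direction of the difference should be expected to carry over.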

Looking at the pickles shows that a lot of this is due to long strings
just naming the globals involved:

>>> p
'cdatetime\ndatetime\nq\x01(U\n\x07\xd5\x02\x18\x0b5\x0e\r\xd3\x10tRq\x02.'
>>> p2
'cdatetime\ndatetime\nq\x01(U\n\x07\xd5\x02\x18\x0b5\x0e\r\xd3\x10czope.app.datetimeutils\ntzinfo\nq\x02(J\x0c\xfe\xff\xfftRq\x03tRq\x04.'

The _essential_ differences are clearer from a disassembly:

>>> from pickletools import dis
>>> dis(p)
    0: c    GLOBAL     'datetime datetime'
   19: q    BINPUT     1
   21: (    MARK
   22: U        SHORT_BINSTRING '\x07\xd5\x02\x18\x0b5\x0e\r\xd3\x10'
   34: t        TUPLE      (MARK at 21)
   35: R    REDUCE
   36: q    BINPUT     2
   38: .    STOP
highest protocol among opcodes = 1
>>> dis(p2)
    0: c    GLOBAL     'datetime datetime'
   19: q    BINPUT     1
   21: (    MARK
   22: U        SHORT_BINSTRING '\x07\xd5\x02\x18\x0b5\x0e\r\xd3\x10'
   34: c        GLOBAL     'zope.app.datetimeutils tzinfo'
   65: q        BINPUT     2
   67: (        MARK
   68: J            BININT     -500
   73: t            TUPLE      (MARK at 67)
   74: R        REDUCE
   75: q        BINPUT     3
   77: t        TUPLE      (MARK at 21)
   78: R    REDUCE
   79: q    BINPUT     4
   81: .    STOP
highest protocol among opcodes = 1
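The offsets in the disassembly make it possible to total up how many bytes go to GLOBAL opcodes, i.e. to naming things rather than carrying data. A rough sketch, written for Python 3, that does the sum with `pickletools.genops` on the aware pickle shown above (copied here as a bytes literal; `global_name_bytes` is a hypothetical helper name). `genops` only parses the opcodes, so zope does not need to be importable.

```python
import pickletools

# The aware pickle from this post, as a Python 3 bytes literal.
P2 = (b'cdatetime\ndatetime\nq\x01(U\n\x07\xd5\x02\x18\x0b5\x0e\r\xd3\x10'
      b'czope.app.datetimeutils\ntzinfo\nq\x02(J\x0c\xfe\xff\xfftRq\x03tRq\x04.')

def global_name_bytes(data):
    """Return the number of bytes occupied by GLOBAL opcodes in a pickle."""
    ops = list(pickletools.genops(data))
    # Each opcode ends where the next one starts; the last ends at len(data).
    ends = [pos for _, _, pos in ops[1:]] + [len(data)]
    total = 0
    for (op, _arg, pos), end in zip(ops, ends):
        if op.name == 'GLOBAL':
            total += end - pos
    return total

print(global_name_bytes(P2), 'of', len(P2))   # 50 of 82
```

That is 19 bytes for 'datetime datetime' plus 31 for 'zope.app.datetimeutils tzinfo', matching the offsets 0-19 and 34-65 in the disassembly.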

Having to stick the relatively giant 'zope.app.datetimeutils tzinfo'
string in each aware pickle sucks; so does having to stick the
relatively giant 'datetime datetime' string in each pickle whether
naive or aware.  I say "relatively giant" because the only
honest-to-gosh data in a naive datetime is compactly represented by
the 10-byte SHORT_BINSTRING (which is a direct memory dump of the 10
bytes of C data in a naive datetime -- internally, it's an array of 10
unsigned char).
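To make the "10 bytes of C data" concrete, here is a small sketch (Python 3) that decodes the SHORT_BINSTRING payload by hand. It assumes the layout CPython uses for this array: a 2-byte big-endian year, one byte each for month, day, hour, minute and second, then a 3-byte big-endian microsecond. `decode_payload` is a hypothetical helper, not part of the datetime module. The last line decodes the BININT argument from the aware pickle the same way.

```python
import struct
from datetime import datetime

# The 10-byte payload from the pickles in this post.
PAYLOAD = b'\x07\xd5\x02\x18\x0b5\x0e\r\xd3\x10'

def decode_payload(data):
    """Rebuild a naive datetime from its 10-byte packed form."""
    if len(data) != 10:
        raise ValueError('expected exactly 10 bytes')
    # Year is 2 bytes big-endian, followed by five single-byte fields.
    year, month, day, hour, minute, second = struct.unpack('>H5B', data[:7])
    # Microseconds occupy the last 3 bytes, big-endian.
    microsecond = int.from_bytes(data[7:], 'big')
    return datetime(year, month, day, hour, minute, second, microsecond)

print(decode_payload(PAYLOAD))   # 2005-02-24 11:53:14.906000

# BININT is a 4-byte little-endian signed int: the tzinfo argument.
print(struct.unpack('<i', b'\x0c\xfe\xff\xff')[0])   # -500
```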

Moving to pickle protocol 2 could slash the space wasted on repeating
these large strings, if we exploited protocol 2's new extension
registry (copy_reg.add_extension), which lets a registered global be
pickled as a small integer code instead of by name.  An aware datetime
would still need extra pickle opcodes to fetch the datetimeutils tzinfo
global object and set up its argument list.
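A hedged sketch of what exploiting the registry could look like, written for Python 3, where the module is spelled `copyreg` (`copy_reg` in Python 2). The code 240 is an assumption picked for illustration from the range PEP 307 sets aside for private use; a real deployment would need an agreed-upon code, and every process that reads the pickles must register the same mapping before unpickling.

```python
import copyreg
import pickle
import pickletools
from datetime import datetime

EXT_CODE = 240   # hypothetical: a private-use extension code

n = datetime(2005, 2, 24, 11, 53, 14, 906000)
before = pickle.dumps(n, 2)   # class written by name via GLOBAL

# Map the (module, name) pair to a small integer code.
copyreg.add_extension('datetime', 'datetime', EXT_CODE)
try:
    after = pickle.dumps(n, 2)      # class written as an EXT1 opcode
    restored = pickle.loads(after)  # loading needs the same registration
finally:
    # Undo the registration so this sketch leaves no global state behind.
    copyreg.remove_extension('datetime', 'datetime', EXT_CODE)

opcode_names = [op.name for op, _arg, _pos in pickletools.genops(after)]
print(len(before), len(after), 'EXT1' in opcode_names)
```

The same trick would apply to the tzinfo class, but as noted above the aware pickle still pays for the extra opcodes that build the tzinfo instance.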

BTW, I don't claim that pickle size should drive anything here -- I
just wanted people to be aware that it is one of the tradeoffs
involved.

