[Zope3-dev] Space usage of unicode strings in the ZODB

Andreas Jung <andreas@zope.com>
Thu, 14 Feb 2002 14:02:35 -0500


----- Original Message -----
From: "Tim Peters" <tim@zope.com>
To: "Andreas Jung" <andreas@zope.com>
Cc: <zope3-dev@zope.org>
Sent: Thursday, February 14, 2002 13:58
Subject: RE: [Zope3-dev] Space usage of unicode strings in the ZODB


>
> I don't understand what you're doing well enough to say for sure, but
> wouldn't any such test just be measuring how cPickle encodes strings?
> It's not entirely clear, but I assume you're measuring final database
> size, and not (e.g.) process memory size, and I see that binary pickles
> always convert Unicode strings to UTF-8.  So Python's internal
> representation should be irrelevant.  Unlike storing UTF-8 strings
> directly, though, BINUNICODE appears always to use 5 bytes to store the
> string's length, so is less disk-efficient for short strings than
> pickle's binary SHORT_BINSTRING encoding.

Yes, I measured the final size of the ZODB. The fact that binary pickles
always store Unicode strings as UTF-8 explains why the internal encoding
does not matter for the final size of the ZODB.
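
For anyone who wants to check the per-string overhead Tim describes, here
is a rough sketch. It assumes a current Python 3 and its standard pickle
and pickletools modules, not the cPickle used for the measurements above.
Python 3 never writes SHORT_BINSTRING, so that pickle is built by hand; the
helper names first_opcode and string_overhead are made up for illustration.

```python
import pickle
import pickletools


def first_opcode(pickled):
    """Return (opcode name, encoded size in bytes) of the first pickle opcode."""
    ops = pickletools.genops(pickled)
    op, _arg, pos = next(ops)
    # The position of the following opcode marks where the first one ends.
    _next_op, _next_arg, next_pos = next(ops)
    return op.name, next_pos - pos


def string_overhead(text):
    """Header bytes per string for BINUNICODE vs. SHORT_BINSTRING.

    Assumption: the UTF-8 form of `text` is shorter than 256 bytes, since
    SHORT_BINSTRING has only one length byte.
    """
    utf8 = text.encode("utf-8")
    # Protocol 1 is the classic binary pickle; it writes str as BINUNICODE
    # (opcode byte, 4-byte little-endian length, UTF-8 payload).
    uni_name, uni_size = first_opcode(pickle.dumps(text, protocol=1))
    # Hand-built pickle: opcode 'U', one length byte, raw bytes, STOP ('.').
    short = b"U" + bytes([len(utf8)]) + utf8 + b"."
    str_name, str_size = first_opcode(short)
    return {uni_name: uni_size - len(utf8), str_name: str_size - len(utf8)}


print(string_overhead("abc"))
# prints {'BINUNICODE': 5, 'SHORT_BINSTRING': 2}
```

The 5 bytes are one opcode byte plus a 4-byte length; the 2 bytes are one
opcode byte plus a 1-byte length. So the difference is 3 bytes per string,
which only matters when the database holds many short strings.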

Thanks,
Andreas