[Zope3-dev] Space usage of unicode strings in the ZODB

"Andreas Jung" <andreas@zope.com>
Thu, 14 Feb 2002 12:13:52 -0500


Following the discussion about whether Zope 3 should use Unicode
strings or UTF-x encoded strings, I ran some tests to get an idea
of how much space Unicode strings take up in the ZODB.

The test input was a Latin-1 document (2.6 MB, 374,000 words).
The list of words was stored in a standalone ZODB.

Results:

String encodings:
UTF-7              4.5 MB
UTF-8              4.4 MB
UTF-16             7.6 MB
UTF-32             unknown (Python does not seem to support this encoding)

Unicode strings:
internal UCS-2 encoding         5.4 MB
internal UCS-4 encoding         5.4 MB
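To see where the spread between the string encodings comes from, here is a
minimal sketch in modern Python 3 that sums the encoded size of each word
encoded separately. It measures only the raw payload: it does not touch the
ZODB, so pickle and storage overhead are not included, and the word list is a
hypothetical stand-in for the original corpus. For mostly-ASCII Latin-1 text,
UTF-8 needs 1 byte per ASCII character and 2 per accented one, while the
`utf-16` codec needs 2 bytes per character plus a 2-byte byte-order mark on
every separately encoded word. Current Python versions also ship a `utf-32`
codec (4 bytes per character plus a 4-byte BOM).

```python
# Hypothetical sample: the original 374,000-word Latin-1 corpus is not available.
words = ["Die", "Straße", "führt", "über", "die", "Brücke", "nach", "Köln"]


def encoded_sizes(words, encodings=("utf-7", "utf-8", "utf-16", "utf-32")):
    """Sum the byte length of each word encoded on its own, one entry per encoding.

    Each word is encoded separately (as it would be when stored as an
    individual string), so BOM-emitting codecs such as utf-16 and utf-32
    pay the BOM once per word.
    """
    return {enc: sum(len(w.encode(enc)) for w in words) for enc in encodings}


for enc, size in encoded_sizes(words).items():
    print(f"{enc:7} {size:5d} bytes")
```

The per-word BOM is a consequence of encoding each word independently; a
codec with an explicit byte order such as `utf-16-le` would avoid it.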


I am astonished that Unicode strings require the same space
regardless of their internal storage in Python (2 vs. 4 bytes
per character).
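One plausible explanation, offered as an assumption rather than a verified
finding: the ZODB stores objects as pickles, and pickle's binary protocols
write a unicode string as its UTF-8 encoding (the BINUNICODE opcode with a
4-byte length prefix), not as the interpreter's internal UCS-2 or UCS-4
buffer. If the ZODB used a binary protocol, the stored size would depend only
on the UTF-8 length, whatever the build. The roughly 1 MB gap to the UTF-8
string case would then also be consistent with a short byte string getting a
1-byte length prefix instead of a 4-byte one (about 3 extra bytes per word).
The sketch below uses today's Python 3 `pickle` module with protocol 2, so it
only illustrates the serialization format and does not reproduce the
original measurement.

```python
import pickle


def pickled_size(text: str, protocol: int = 2) -> int:
    """Bytes produced when a single str is pickled on its own."""
    return len(pickle.dumps(text, protocol=protocol))


# Hypothetical sample words; "ß" and "ü" are one character each but two bytes in UTF-8.
for word in ["abc", "Straße", "Strasse", "Brücke"]:
    data = pickle.dumps(word, protocol=2)
    utf8 = word.encode("utf-8")
    # Binary protocols embed the UTF-8 bytes verbatim in the pickle.
    assert utf8 in data
    # The rest is a fixed per-string overhead (header, opcode, length, memo, STOP).
    print(word, len(word), len(utf8), pickled_size(word), pickled_size(word) - len(utf8))
```

"Straße" (6 characters) and "Strasse" (7 characters) both have a 7-byte
UTF-8 form and therefore pickle to the same size.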

Andreas