[Zope3-dev] Space usage of unicode strings in the ZODB
Andreas Jung <andreas@zope.com>
Thu, 14 Feb 2002 12:13:52 -0500
Following the discussion about whether to use unicode strings
or UTF-X encoded strings in Zope 3, I ran some tests
to get an idea of the space usage of unicode
strings in the ZODB.
Test input was a Latin-1 document (2.6 MB, 374,000 words).
The list of words was stored in a standalone ZODB.
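As a rough illustration of the comparison, here is a minimal sketch that sums the raw encoded size of a word list under several encodings. It is not the script used for the numbers below: the function name `encoded_sizes` and the sample words are made up for illustration, it measures only the encoded bytes and not the ZODB file size, and the `utf-32` codec is included on the assumption of a newer Python than the one used for these tests.

```python
def encoded_sizes(words, encodings=("latin-1", "utf-7", "utf-8", "utf-16", "utf-32")):
    """Return {encoding: total number of bytes} for a list of words.

    Hypothetical helper: each word is encoded separately, so encodings
    that emit a byte order mark (utf-16, utf-32) pay for it once per word.
    """
    sizes = {}
    for enc in encodings:
        sizes[enc] = sum(len(word.encode(enc)) for word in words)
    return sizes


# Made-up sample input; the real test used about 374,000 words.
sample_words = ["Zope", "Größe"]
for enc, size in encoded_sizes(sample_words).items():
    print(enc, size)
```

Encoding word by word, as assumed here, makes the per-string overhead of UTF-16 visible, which may be part of why it comes out so much larger than UTF-8 for short Latin-1 words.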
Results:
String encodings:
  UTF-7    4.5 MB
  UTF-8    4.4 MB
  UTF-16   7.6 MB
  UTF-32   unknown (Python does not seem to support this encoding?)
Unicode strings:
  internal UCS-2 encoding   5.4 MB
  internal UCS-4 encoding   5.4 MB
I am astonished that unicode strings require the same space,
independent of their internal storage in Python (2 vs. 4 bytes per character).
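One likely explanation, stated here as an assumption rather than something tested above: the ZODB stores objects as pickles, and the binary pickle protocols write unicode strings as UTF-8 with a length prefix, whatever the interpreter's internal character width. The sketch below uses a current Python and pickle protocol 2 to show this; `pickled_size` is a hypothetical helper, and the protocol used by the ZODB in the test may differ.

```python
import pickle


def pickled_size(text, protocol=2):
    """Return the number of bytes pickle needs for one unicode string."""
    return len(pickle.dumps(text, protocol=protocol))


word = "Größe"
payload = pickle.dumps(word, protocol=2)

# The pickle carries the UTF-8 bytes of the string, not the in-memory form.
print(word.encode("utf-8") in payload)

# Growth is one byte per ASCII character and two bytes per 'ö',
# which matches UTF-8 rather than a fixed 2 or 4 bytes per character.
print(pickled_size("a" * 100) - pickled_size("a" * 50))
print(pickled_size("ö" * 100) - pickled_size("ö" * 50))
```

If this holds for the ZODB used here, the on-disk size depends only on the pickle format, so a UCS-2 and a UCS-4 build of Python would be expected to produce the same file size.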
Andreas