[ZODB-Dev] RFC: Python2 - Py3k database compatibility
Tres Seaver
tseaver at palladion.com
Tue Apr 16 20:38:06 UTC 2013
After getting a bit bogged down during the PyCon US 2013 sprints, I'd
like to restart the discussion by outlining the problem as I think I
understand it now.
Proposal for ZODB pickle compatibility
======================================
Issues
------
- There exists no forward-compatible way to pickle bytes on Python2
  (Py3k pickle module "guesses", decoding any Python2 ``str`` using
  ``latin1``).
- Some data pickled as ``str`` on Python2 truly is binary (e.g.,
  ``Pdata`` objects for Zope2's ``OFS.Image.File`` and
  ``OFS.Image.Image`` types; crypto hashes?).
- Some Python2 applications may have the same attribute for a given
  class stored both as ``str`` and as ``unicode`` (due, e.g., to bugs in
  the code, literal defaults, browser quirks, changes to code over
  time).
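The first issue can be reproduced with the Py3k stdlib alone. The following is a minimal sketch, assuming a hand-built protocol 1 pickle (the bytes Python2 would emit for a short ``str``, minus the memo opcode). The ``latin1`` decoding described above corresponds to passing ``encoding="latin1"``; the stdlib default is ``ASCII``, which fails on non-ASCII payloads instead of guessing.

```python
import pickle

# Hand-built protocol 1 pickle of the Python2 str '\xff\x00\x01':
# SHORT_BINSTRING opcode (b'U'), one length byte, the payload, then STOP (b'.').
PY2_STR_PICKLE = b"U\x03\xff\x00\x01."

# Decoding with latin1 never fails, but yields text rather than bytes.
as_text = pickle.loads(PY2_STR_PICKLE, encoding="latin1")

# The special "bytes" encoding keeps the payload as bytes.
as_bytes = pickle.loads(PY2_STR_PICKLE, encoding="bytes")

# The default (ASCII, strict) rejects non-ASCII payloads outright.
try:
    pickle.loads(PY2_STR_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True
```

Nothing in the pickle itself says whether the payload was meant as text or as binary data, so the choice of ``encoding`` applies to every Python2 ``str`` in the record.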
Scenarios
---------
.. _py2_forever:
Existing Python2-only Application
+++++++++++++++++++++++++++++++++
- Code for the app is never(ish) going to migrate to Py3k.
- Using an updated / supported ZODB package **must** be possible.
- Ideally, requires no changes to application code.
- Ideally, requires no database fixup / conversion.
- Best strategy is likely ignore_compat_.
.. _py3k_only:
New, Py3k-only Application
++++++++++++++++++++++++++
- Code for the app will run only on Py3k.
- Running with the latest-and-greatest ZODB **must** be possible.
- Ideally, the code for the app will make no concessions to
  backward-compatibility.
- Best strategy is likely ignore_compat_.
.. _migrate_w_convert:
Python2 Application Migrating to Py3k
+++++++++++++++++++++++++++++++++++++
- Application code "straddles" both Pythons using "compatible subset"
  dialect, but only during the migration period.
- During that period, code **must** be able to open the database from
  both Python2 and Py3k.
- Ideally, application code will need to make no concessions to
  backward-compatibility after migration.
- It is acceptable to run a conversion process which normalizes all
  active records in the database prior to testing.
- For databases which are already "binary clean" (binary data exists
  only in blobs; the application creates no new non-blob binary
  attributes), the best strategy is likely ignore_compat_.
- For databases which are not already "binary clean" (there may be
  non-blob binary attributes), the best strategy is likely to
  convert_storages_, followed by replace_py2_cpickle_ (if the Python2
  client might create new non-blob binary attributes).
- wrap_storages_ (on the Python2 side) might be simpler than
  replace_py2_cpickle_, if the sources of non-blob binary attributes are
  well understood.
.. _straddle_w_convert:
Python2 Application Straddling Python2 / Py3k (1)
+++++++++++++++++++++++++++++++++++++++++++++++++
- Application code "straddles" both Pythons using "compatible subset"
  dialect.
- Code **must** be able to open the database from both Python2 and Py3k.
- It is acceptable to run a conversion process which normalizes all
  active records in the database prior to testing.
- For databases which are already "binary clean" (binary data exists
  only in blobs; the application creates no new non-blob binary
  attributes), the best strategy is likely ignore_compat_.
- For databases which are not already "binary clean" (there may be
  non-blob binary attributes), the best strategy is likely to
  convert_storages_, followed by replace_py2_cpickle_ (if the Python2
  client might create new non-blob binary attributes).
- For cases where Python2 and Py3k clients may share the database for an
  extended period, and where disruption to the Python2 clients must be
  minimized, the replace_py3k_pickle_ strategy might be preferred, until
  convert_storages_ becomes feasible.
.. _straddle_no_convert:
Python2 Application Straddling Python2 / Py3k (2)
+++++++++++++++++++++++++++++++++++++++++++++++++
- Application code "straddles" both Pythons using "compatible subset"
  dialect.
- Code **must** be able to open the database from both Python2 and Py3k.
- It is **not** acceptable to run a conversion process which normalizes
  all active records in the database prior to testing (e.g., the
  database is too large to convert on existing hardware, or the downtime
  required for conversion is unacceptable).
- Because disruption to the Python2 clients must be minimized, the best
  strategy is likely replace_py3k_pickle_ until convert_storages_
  becomes feasible.
- Alternatively, wrap_storages_ might be the best strategy for the Py3k
  clients.
Strategies
----------
.. _ignore_compat:
Ignore compatibility
++++++++++++++++++++
Use the stdlib pickle support in its default mode.
- No changes to the ``ZODB`` packages on Python2 or Py3k.
- Pickles created under Python2 will be readable on Py3k; however,
  *all* bytes data will be coerced (via ``latin1``) to unicode.
- Pickles created under Py3k will likely not be readable on Python2
  (Python2 has no support for ``protocol 3``).
- Easiest usage for applications which are never going to straddle.
- Compatibility will only be achievable via one-time conversions (where
  the conversion script uses one of the other strategies or tools).
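The protocol gap behind this strategy can be inspected with ``pickletools`` on Py3k. This is a small sketch, not part of any proposed tool; the helper name ``opcode_names`` is made up for illustration. It shows that ``protocol 3`` writes bytes with a dedicated opcode, while ``protocol 2`` (the highest Python2 understands) has no such opcode and falls back to a ``_codecs.encode`` call.

```python
import pickle
import pickletools

payload = b"\x00\xffraw"


def opcode_names(data):
    """Return the names of the pickle opcodes used in ``data``."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


# Protocol 3 has a dedicated bytes opcode; Python2 rejects protocol 3 pickles.
proto3_ops = opcode_names(pickle.dumps(payload, protocol=3))

# Protocol 2 has no bytes opcode, so bytes are written as a reduce call.
proto2_data = pickle.dumps(payload, protocol=2)
proto2_ops = opcode_names(proto2_data)

# Both forms round-trip to bytes when read back on Py3k.
roundtrip3 = pickle.loads(pickle.dumps(payload, protocol=3))
roundtrip2 = pickle.loads(proto2_data)
```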
.. _replace_py3k_pickle:
Replace Py3k ``pickle``
+++++++++++++++++++++++
Keep pickling in the Python2 / protocol 1 way we have always done.
- No changes to the ``ZODB`` packages on Python2. Storages do not need
  to be configured with any custom pickle support.
- On Py3k, ``ZODB`` uses pickler / unpickler from the ``zodbpickle``
  module, such that Python2 ``str`` objects are unpickled as ``bytes``;
  ``bytes`` are pickled using the ``protocol 1`` opcodes (so that
  Python2 will unpickle them as ``str``).
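To make the two halves of this strategy concrete, here is a rough stdlib-only sketch. It is a stand-in for illustration and does not use the ``zodbpickle`` API: the hypothetical helper ``dump_bytes_as_py2_str`` hand-emits the ``protocol 1`` string opcodes for a single bare bytes value, and the reading side relies on the stdlib ``encoding="bytes"`` option.

```python
import pickle
import struct


def dump_bytes_as_py2_str(value):
    """Pickle ``value`` (bytes) using the protocol 1 string opcodes."""
    if len(value) < 256:
        # SHORT_BINSTRING: opcode, 1-byte length, payload.
        body = b"U" + struct.pack("<B", len(value)) + value
    else:
        # BINSTRING: opcode, 4-byte little-endian signed length, payload.
        body = b"T" + struct.pack("<i", len(value)) + value
    return body + b"."  # STOP


short = dump_bytes_as_py2_str(b"\xde\xad\xbe\xef")
long_ = dump_bytes_as_py2_str(b"\x00" * 1000)

# Reading side: keep Python2 str payloads as bytes instead of decoding them.
short_back = pickle.loads(short, encoding="bytes")
long_back = pickle.loads(long_, encoding="bytes")
```

A real pickler has to do this inside arbitrary object graphs, which is why the strategy calls for a replacement pickler rather than a helper like this one.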
.. _replace_py2_cPickle:
Replace Python2 ``cPickle``
+++++++++++++++++++++++++++
Move to pickling in the new protocol 3 way (native under Py3k).
- On Python2, applications which need to ensure that ``bytes`` objects
  unpickle correctly under Py3k must be changed to use a new type,
  ``zodbpickle.binary``. ``ZODB`` is configured with pickler / unpickler
  from ``zodbpickle``, such that objects of this type will be pickled
  using the ``protocol 3`` opcodes for bytes (so that Py3k will unpickle
  them as ``bytes``).
- Existing data for the affected classes will need to be fixed up using
  a variation of convert_storages_.
- No changes to the ``ZODB`` packages on Py3k. Storages do not need to
  be configured with any custom pickle support.
.. _convert_storages:
Convert Database Storages
+++++++++++++++++++++++++
- Need tool(s) to identify problematic data:
  - Classes which mix ``str`` and ``unicode`` values for the same
    attribute across records / instances.
- Utility which can apply per-class transforms to state pickles:
  - E.g., for instances of ``OFS.Image.Pdata``, convert the ``data``
    attribute (which should be a Python2 ``str``) to
    ``zodbpickle.binary``. (Of course, these would probably be better
    off written out as blobs).
  - Or, for some application which mixes ``str`` and ``unicode`` under
    Python2 (either across instances or across transactions): upconvert
    any value of type ``str`` for the given attribute(s) to ``unicode``,
    using a configured encoding strategy (e.g., try ``utf8`` first,
    falling back to ``latin1``).
- One-time converter utility would use ``copyTransactionsFrom``-style
  pattern, opening the existing database readonly, getting pickles for
  each transaction, invoking the converter utility for each instance to
  fix up the pickle, then writing the converted pickles into the new
  database.
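The per-class transform utility described above might look roughly like the following sketch. Everything here is hypothetical: the registry layout, the function names, and the ``myapp.models.Document`` class are invented for illustration, and the code works on already-unpickled state dicts as seen from Py3k (where a Python2 ``str`` kept as binary shows up as ``bytes``).

```python
def keep_binary(value):
    """Leave a value that is truly binary untouched."""
    return value


def upconvert(value, encodings=("utf-8", "latin-1")):
    """Decode bytes to text, trying each configured encoding in order."""
    if not isinstance(value, bytes):
        return value  # already text, or not a string at all
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("no configured encoding could decode %r" % (value,))


# Hypothetical registry: (module, class name) -> {attribute: transform}.
FIXUPS = {
    ("OFS.Image", "Pdata"): {"data": keep_binary},
    ("myapp.models", "Document"): {"title": upconvert},
}


def fix_state(class_key, state, fixups=FIXUPS):
    """Return a copy of ``state`` with the registered transforms applied."""
    fixed = dict(state)
    for name, transform in fixups.get(class_key, {}).items():
        if name in fixed:
            fixed[name] = transform(fixed[name])
    return fixed
```

Because ``latin-1`` can decode any byte sequence, putting it last means the fallback chain never raises; a stricter deployment might prefer to end the chain with an error instead.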
.. _wrap_storages:
Wrap Database Storages
++++++++++++++++++++++
- A wrapper storage uses the converter utility (identified above) during
  the ``load`` operation, fixing up the object state before it is handed
  to the instance's ``__setstate__``.
- During the ``save`` operation, the wrapper would fix up pickled
  instance state (after calling ``__getstate__``).
- Wrappers might be applied under Python2 (e.g., for apps where the
  database is already converted to ``protocol 3``) as an alternative to
  replace_py2_cpickle_.
- Wrappers might be applied under Py3k (e.g., for apps where the database
  is not already converted to ``protocol 3``) as an alternative to
  replace_py3k_pickle_.
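The wrapper idea can be sketched as a thin delegating class. This is an illustration under assumptions, not a proposed implementation: ``FixupStorage`` and ``DictStorage`` are invented names, the ``load`` / ``store`` signatures are only loosely modelled on the ZODB storage API (with ``store`` standing in for the "save" operation above), and the fixups are plain callables over record bytes.

```python
class FixupStorage(object):
    """Wrap a storage, transforming record data on load and on store."""

    def __init__(self, base, on_load, on_store):
        self._base = base
        self._on_load = on_load
        self._on_store = on_store

    def load(self, oid, version=""):
        data, serial = self._base.load(oid, version)
        return self._on_load(data), serial

    def store(self, oid, serial, data, version, transaction):
        fixed = self._on_store(data)
        return self._base.store(oid, serial, fixed, version, transaction)

    def __getattr__(self, name):
        # Delegate everything else to the wrapped storage.
        return getattr(self._base, name)


class DictStorage(object):
    """Toy stand-in for a real storage, for demonstration only."""

    def __init__(self):
        self.records = {}

    def load(self, oid, version=""):
        return self.records[oid]

    def store(self, oid, serial, data, version, transaction):
        self.records[oid] = (data, serial)


# Placeholder transforms; real ones would rewrite the state pickle.
base = DictStorage()
wrapped = FixupStorage(base, on_load=bytes.upper, on_store=bytes.lower)
wrapped.store(b"oid-1", b"serial-1", b"State", "", None)
stored_raw = base.records[b"oid-1"][0]
loaded, serial = wrapped.load(b"oid-1")
```

The cost of this approach is that the fixup runs on every load and store, whereas a one-time conversion pays it once.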
Concrete Proposal
-----------------
I believe we will need to update ``zodbpickle`` and ``ZODB`` to allow
for any of the strategies to be applied.
- ``zodbpickle`` should provide the script which analyzes pickles in
  a database for inconsistent ``str`` / ``unicode`` usage. See:
  https://github.com/jimfulton/dbstringanalysis
- ``zodbpickle`` should provide the utility for registering per-class
  fixups.
- ``zodbpickle`` should provide the script which uses that utility
  to do one-time conversion of a storage (supporting convert_storages_).
- ``zodbpickle`` should provide a new ``binary`` type which Python2
  applications can begin using to signal that attributes should be
  unpickled in Py3k as ``bytes``. See:
  https://github.com/zopefoundation/zodbpickle/tree/py2_explicit_bytes
- ``zodbpickle`` should provide a pickler/unpickler for use by
  Python2 clients who operate against converted storages
  (replace_py2_cpickle_). See:
  https://github.com/zopefoundation/zodbpickle/tree/py2_explicit_bytes
- ``zodbpickle`` should provide a pickler/unpickler for use by
  Py3k clients who operate against unconverted storages
  (replace_py3k_pickle_). See:
  https://github.com/zopefoundation/zodbpickle
- ``zodbpickle`` might need to provide a wrapper storage supporting
  straddle_no_convert_.
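As a rough idea of what the analysis script in the first item has to compute, here is a minimal sketch. It is not the ``dbstringanalysis`` tool: the function name and the record shape (pairs of class name and state dict) are assumptions for illustration. It also assumes the states were unpickled on Py3k in a way that keeps Python2 ``str`` as ``bytes``, so that ``str`` / ``unicode`` mixing shows up as ``bytes`` / ``str`` mixing.

```python
from collections import defaultdict


def find_mixed_attributes(records):
    """Report attributes stored as both bytes and text.

    ``records`` yields (class_name, state_dict) pairs.
    """
    seen = defaultdict(set)
    for class_name, state in records:
        for name, value in state.items():
            if isinstance(value, (bytes, str)):
                seen[(class_name, name)].add(type(value).__name__)
    return sorted(key for key, kinds in seen.items() if len(kinds) > 1)


sample = [
    ("myapp.models.Document", {"title": b"old record", "size": 3}),
    ("myapp.models.Document", {"title": "new record", "size": 4}),
    ("myapp.models.Folder", {"title": "always text"}),
]
mixed = find_mixed_attributes(sample)
```

Each reported pair is a candidate for a registered per-class fixup.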
Comments?
Tres.
--
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com