[ZODB-Dev] RFC: Python2 - Py3k database compatibility

Tres Seaver tseaver at palladion.com
Tue Apr 16 20:38:06 UTC 2013



After getting a bit bogged down during the PyCon US 2013 sprints, I'd
like to restart the discussion by outlining the problem as I think I
understand it now.

Proposal for ZODB pickle compatibility
======================================

Issues
------

- - There exists no forward-compatible way to pickle bytes on Python2
  (Py3k pickle module "guesses", decoding any Python2 ``str`` using
  ``latin1``).

- - Some data pickled as ``str`` on Python2 truly is binary (e.g.,
  ``Pdata`` objects for Zope2's ``OFS.Image.File`` and
  ``OFS.Image.Image`` types;  crypto hashes?).

- - Some Python2 applications may have the same attribute for a given
  class stored both as ``str`` and as ``unicode`` (due e.g., to bugs in
  the code, literal defaults, browser quirks, changes to code over
  time).
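
The first two issues can be seen from the Py3k side using only the stdlib. The sketch below hand-builds a protocol 1 pickle of a Python2 ``str`` holding non-text bytes, then loads it under Py3k. The ``encoding`` argument is passed explicitly here (the stdlib default is ``ASCII``, which would raise on this payload), and the byte string itself is a made-up example.

```python
import pickle

# Protocol 1 pickle of the Python2 str '\xff\x00a':
# 'U' = SHORT_BINSTRING, 0x03 = payload length, then the payload, '.' = STOP.
PY2_PICKLE = b"U\x03\xff\x00a."

# Decoding as latin1 never fails, but silently turns binary data into text.
as_text = pickle.loads(PY2_PICKLE, encoding="latin1")

# encoding="bytes" keeps the payload as bytes, but it applies to *every*
# Python2 str in the pickle, including the ones that really were text.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(type(as_text).__name__, repr(as_text))
print(type(as_bytes).__name__, repr(as_bytes))
```

Either setting is all-or-nothing for a given load, which is why a per-attribute decision needs more information than the pickle carries.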


Scenarios
---------


.. _py2_forever:

Existing Python2-only Application
+++++++++++++++++++++++++++++++++

- - Code for the app is never(ish) going to migrate to Py3k.

- - Using an updated / supported ZODB package **must** be possible.

- - Ideally, requires no changes to application code.

- - Ideally, requires no database fixup / conversion.

- - Best strategy is likely ignore_compat_.


.. _py3k_only:

New, Py3k-only Application
++++++++++++++++++++++++++

- - Code for the app will run only on Py3k.

- - Running with the latest-and-greatest ZODB **must** be possible.

- - Ideally, the code for the app will make no concessions to backward-
  compatibility.

- - Best strategy is likely ignore_compat_.


.. _migrate_w_convert:

Python2 Application Migrating to Py3k
+++++++++++++++++++++++++++++++++++++

- - Application code "straddles" both Pythons using "compatible subset"
  dialect, but only during the migration period.

- - During that period, code **must** be able to open the database from
  both Python2 and Py3k.

- - Ideally, application code will need to make no concessions to
  backward-compatibility after migration.

- - It is acceptable to run a conversion process which normalizes all
  active records in the database prior to testing.

- - For databases which are already "binary clean" (binary data exists
  only in blobs; the application creates no new non-blob binary
  attributes), the best strategy is likely ignore_compat_.

- - For databases which are not already "binary clean" (there may be
  non-blob binary attributes), the best strategy is likely to
  convert_storages_, followed by replace_py2_cpickle_ (if the Python2
  client might create new non-blob binary attributes).

- - wrap_storages_ (on the Python2 side) might be simpler than
  replace_py2_cpickle_, if the sources of non-blob binary attributes are
  well understood.


.. _straddle_w_convert:

Python2 Application Straddling Python2 / Py3k (1)
+++++++++++++++++++++++++++++++++++++++++++++++++

- - Application code "straddles" both Pythons using "compatible subset"
  dialect.

- - Code **must** be able to open the database from both Python2 and Py3k.

- - It is acceptable to run a conversion process which normalizes all
  active records in the database prior to testing.

- - For databases which are already "binary clean" (binary data exists
  only in blobs; the application creates no new non-blob binary
  attributes), the best strategy is likely ignore_compat_.

- - For databases which are not already "binary clean" (there may be
  non-blob binary attributes), the best strategy is likely to
  convert_storages_, followed by replace_py2_cpickle_ (if the Python2
  client might create new non-blob binary attributes).

- - For cases where Python2 and Py3k clients may share the database for an
  extended period, and where disruption to the Python2 clients must be
  minimized, the replace_py3k_pickle_ strategy might be preferred, until
  convert_storages_ becomes feasible.


.. _straddle_no_convert:

Python2 Application Straddling Python2 / Py3k (2)
+++++++++++++++++++++++++++++++++++++++++++++++++

- - Application code "straddles" both Pythons using "compatible subset"
  dialect.

- - Code **must** be able to open the database from both Python2 and Py3k.

- - It is **not** acceptable to run a conversion process which normalizes
  all active records in the database prior to testing (e.g., the
  database is too large to convert on existing hardware, or the downtime
  required for conversion is unacceptable).

- - Because disruption to the Python2 clients must be minimized, the best
  strategy is likely replace_py3k_pickle_ until convert_storages_
  becomes feasible.

- - Alternatively, wrap_storages_ might be the best strategy for the Py3k
  clients.


Strategies
----------


.. _ignore_compat:

Ignore compatibility
++++++++++++++++++++

Use the stdlib pickle support in its default mode.

- - No changes to the ``ZODB`` packages on Python2 or Py3k.

- - Pickles created under Python2 will be readable on Py3k;  however,
  *all* bytes data will be coerced (via ``latin1``) to unicode.

- - Pickles created under Py3k will likely not be readable on Python2
  (Python2 has no support for ``protocol 3``).

- - Easiest usage for applications which are never going to straddle.

- - Compatibility will only be achievable via one-time conversions (where
  the conversion script uses one of the other strategies or tools).
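
To illustrate why Py3k pickles are a problem for Python2 under this strategy, the sketch below inspects what the stdlib Py3k pickler writes for a ``bytes`` value at protocol 3. It only uses ``pickle`` and ``pickletools``; the sample value is arbitrary.

```python
import pickle
import pickletools

payload = pickle.dumps(b"abc", protocol=3)

# Collect (opcode name, argument) pairs for the pickle stream.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]

# The stream opens with a PROTO opcode announcing protocol 3 ...
proto_name, proto_arg = ops[0]

# ... and stores the value with a bytes-specific opcode.
uses_bytes_opcode = any(
    name in ("SHORT_BINBYTES", "BINBYTES") for name, _arg in ops
)

print(proto_name, proto_arg, uses_bytes_opcode)
```

Both the protocol number and the bytes opcodes postdate the Python2 unpickler, which only understands protocols 0 through 2.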


.. _replace_py3k_pickle:

Replace Py3k ``pickle``
+++++++++++++++++++++++

Keep pickling in the Python2 / protocol 1 way we have always done.

- - No changes to the ``ZODB`` packages on Python2.  Storages do not need
  to be configured with any custom pickle support.

- - On Py3k, ``ZODB`` uses pickler / unpickler from the ``zodbpickle``
  module, such that Python2 ``str`` objects are unpickled as ``bytes``;
  ``bytes`` are pickled using the ``protocol 1`` opcodes (so that
  Python2 will unpickle them as ``str``).
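
As background for this strategy, the sketch below shows what the *stdlib* Py3k pickler emits for ``bytes`` when asked for protocol 1. Rather than the native Python2 string opcodes, it writes a call to ``_codecs.encode``, which is different from the direct opcode-level handling this strategy describes for ``zodbpickle``. Only stdlib calls are used; the payload is a made-up example.

```python
import pickle
import pickletools

payload = pickle.dumps(b"\xff\x00a", protocol=1)

ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]
names = [name for name, _arg in ops]

# The stdlib falls back to "call _codecs.encode(text, 'latin1')" ...
calls_codecs_encode = ("GLOBAL", "_codecs encode") in ops

# ... instead of writing the value with the Python2 str opcodes.
uses_py2_str_opcode = any(
    name in ("SHORT_BINSTRING", "BINSTRING") for name in names
)

print(calls_codecs_encode, uses_py2_str_opcode)
```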


.. _replace_py2_cPickle:

Replace Python2 ``cPickle``
+++++++++++++++++++++++++++

Move to pickling in the new protocol 3 way (native under Py3k).

- - On Python2, applications which need to ensure that ``bytes`` objects
  unpickle correctly under Py3k must be changed to use a new type,
  ``zodbpickle.binary``.  ``ZODB`` is configured with pickler / unpickler
  from ``zodbpickle``, such that objects of this type will be pickled
  using the ``protocol 3`` opcodes for bytes (so that Py3k will unpickle
  them as ``bytes``).

- - Existing data for the affected classes will need to be fixed up using
  a variation of convert_storages_.

- - No changes to the ``ZODB`` packages on Py3k.  Storages do not need to
  be configured with any custom pickle support.


.. _convert_storages:

Convert Database Storages
+++++++++++++++++++++++++

- - Need tool(s) to identify problematic data:

  - Classes which mix ``str`` and ``unicode`` values for the same
    attribute across records / instances.

- - Utility which can apply per-class transforms to state pickles:

  - E.g., for instances of ``OFS.Image.Pdata``, convert the ``data``
    attribute (which should be a Python2 ``str``) to
    ``zodbpickle.binary``.  (Of course, these would probably be better
    off written out as blobs).

  - Or, for some application which mixes ``str`` and ``unicode`` under
    Python2 (either across instances or across transactions):  upconvert
    any value of type ``str`` for the given attribute(s) to ``unicode``,
    using a configured encoding strategy (e.g., try ``utf8`` first,
    falling back to ``latin1``).

- - One-time converter utility would use ``copyTransactionsFrom``-style
  pattern, opening the existing database readonly, getting pickles for
  each transaction, invoking the converter utility for each instance to
  fix up the pickle, then writing the converted pickles into the new
  database.
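
The per-class transform idea above can be sketched as a small registry. Everything here is hypothetical: the names ``FIXUPS``, ``register_fixup``, ``upconvert`` and ``convert_state``, and the class ``myapp.models.Document``, are invented for illustration and are not part of ``zodbpickle`` or ``ZODB``. The sketch works on already-unpickled state dicts and leaves out the storage iteration entirely.

```python
# Hypothetical registry: dotted class name -> function fixing a state dict.
FIXUPS = {}


def register_fixup(class_name):
    """Decorator registering a fixup function for one class."""
    def decorator(func):
        FIXUPS[class_name] = func
        return func
    return decorator


def upconvert(value, encodings=("utf8", "latin1")):
    """Decode a byte string with the first encoding that accepts it."""
    if not isinstance(value, bytes):
        return value  # already text, or not a string at all
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("no configured encoding could decode %r" % (value,))


@register_fixup("myapp.models.Document")  # hypothetical application class
def fix_document(state):
    fixed = dict(state)
    fixed["title"] = upconvert(state["title"])  # text attribute
    return fixed                                # other attributes untouched


def convert_state(class_name, state):
    """Apply the registered fixup, or return the state unchanged."""
    fixup = FIXUPS.get(class_name)
    return fixup(state) if fixup is not None else state


utf8_state = convert_state(
    "myapp.models.Document", {"title": b"caf\xc3\xa9", "raw": b"\x89PNG"})
latin1_state = convert_state(
    "myapp.models.Document", {"title": b"caf\xe9", "raw": b"\x89PNG"})
other_state = convert_state("myapp.models.Other", {"title": b"caf\xe9"})
```

Because ``latin1`` accepts any byte sequence, placing it last makes the fallback unconditional; a stricter configuration would omit it and let the error surface.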


.. _wrap_storages:

Wrap Database Storages
++++++++++++++++++++++

- - A wrapper storage uses the converter utility (identified above) during
  the ``load`` operation, fixing up the object state before it is handed
  to the instance's ``__setstate__``.

- - During the ``save`` operation, the wrapper would fix up pickled
  instance state (after calling ``__getstate__``).

- - Wrappers might be applied under Python2 (e.g., for apps where the
  database is already converted to ``protocol 3``) as an alternative to
  replace_py2_cpickle_.

- - Wrappers might be applied under Py3k (e.g., for apps where the database
  is not already converted to ``protocol 3``) as an alternative to
  replace_py3k_pickle_.
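
A minimal sketch of the wrapper idea follows, under heavy simplifying assumptions. ``FixupStorageWrapper`` and ``DictStorage`` are hypothetical names, the ``load`` / ``store`` signatures are reduced compared to real ZODB storages, and each record is treated as a single pickled state dict, which is not the actual record layout.

```python
import pickle


class DictStorage:
    """Toy stand-in for a real storage: oid -> (data, serial)."""

    def __init__(self):
        self._records = {}

    def load(self, oid):
        return self._records[oid]

    def store(self, oid, serial, data):
        self._records[oid] = (data, serial)


class FixupStorageWrapper:
    """Hypothetical wrapper fixing up record data on load and on save."""

    def __init__(self, storage, on_load, on_save):
        self._storage = storage
        self._on_load = on_load
        self._on_save = on_save

    def load(self, oid):
        data, serial = self._storage.load(oid)
        return self._on_load(data), serial

    def store(self, oid, serial, data):
        return self._storage.store(oid, serial, self._on_save(data))

    def __getattr__(self, name):
        # Anything else is delegated to the wrapped storage untouched.
        return getattr(self._storage, name)


def text_title_on_load(data):
    """Example fixup: present a bytes ``title`` as text to the application."""
    state = pickle.loads(data)
    if isinstance(state.get("title"), bytes):
        state["title"] = state["title"].decode("latin1")
    return pickle.dumps(state, protocol=3)


OID = b"\x00" * 8
raw = DictStorage()
raw.store(OID, b"\x00" * 8, pickle.dumps({"title": b"caf\xe9"}, protocol=3))

wrapped = FixupStorageWrapper(
    raw, on_load=text_title_on_load, on_save=lambda data: data)
loaded_state = pickle.loads(wrapped.load(OID)[0])
stored_state = pickle.loads(raw.load(OID)[0])
```

The stored record is left as it was; only the view handed to the application changes, which is the property that makes wrapping attractive when conversion is not feasible.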


Concrete Proposal
-----------------

I believe we will need to update ``zodbpickle`` and ``ZODB`` to allow
for any of the strategies to be applied.

- - ``zodbpickle`` should provide the script which analyzes pickles in
  a database for inconsistent ``str`` / ``unicode`` usage.  See:
  https://github.com/jimfulton/dbstringanalysis

- - ``zodbpickle`` should provide the utility for registering per-class
  fixups.

- - ``zodbpickle`` should provide the script which uses that utility
  to do one-time conversion of a storage (supporting convert_storages_).

- - ``zodbpickle`` should provide a new ``binary`` type which Python2
  applications can begin using to signal that attributes should be
  unpickled in Py3k as ``bytes``.  See:
  https://github.com/zopefoundation/zodbpickle/tree/py2_explicit_bytes

- - ``zodbpickle`` should provide a pickler/unpickler for use by
  Python2 clients who operate against converted storages
  (replace_py2_cpickle_). See:
  https://github.com/zopefoundation/zodbpickle/tree/py2_explicit_bytes

- - ``zodbpickle`` should provide a pickler/unpickler for use by
  Py3k clients who operate against unconverted storages
  (replace_py3k_pickle_). See:
  https://github.com/zopefoundation/zodbpickle

- - ``zodbpickle`` might need to provide a wrapper storage supporting
  straddle_no_convert_.


Comments?


Tres.
-- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com


