[Checkins] [zopefoundation/ZODB] c1e080: interface: Require invalidations to be called with...

Kirill Smelkov noreply at github.com
Mon Aug 31 12:42:04 CEST 2020


  Branch: refs/heads/master
  Home:   https://github.com/zopefoundation/ZODB
  Commit: c1e080520a466d88c2835b7489d01b2dc7e31c2e
      https://github.com/zopefoundation/ZODB/commit/c1e080520a466d88c2835b7489d01b2dc7e31c2e
  Author: Kirill Smelkov <kirr at nexedi.com>
  Date:   2020-08-31 (Mon, 31 Aug 2020)

  Changed paths:
    M src/ZODB/interfaces.py

  Log Message:
  -----------
  interface: Require invalidations to be called with full set of objects and not to skip transactions

Currently invalidate documentation is not clear whether it should be
called for every transaction and whether it should include full set of
objects created/modified by that transaction. Until now this was working
relatively well for the sole purpose of invalidating client ZEO cache,
because for that particular task it is relatively OK not to include just
created objects into invalidation messages, and even to completely skip
sending invalidation if transaction only create - not modify - objects.
Due to this fact the workings of the client cache was indifferent to the
ambiguity of the interface.

In 2016 skipping transactions with only created objects was reconsidered
as bug and fixed in ZEO5 because ZODB5 relies more heavily on MVCC
semantic and needs to be notified about every transaction committed to
storage to be able to properly update ZODB.Connection view:

https://github.com/zopefoundation/ZEO/commit/02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
https://github.com/zopefoundation/ZEO/commit/9613f09b

However just-created objects were not included into invalidation
messages until, hopefully, recently:

https://github.com/zopefoundation/ZEO/pull/160

As ZODB is started to be used more widely in areas where it was not
traditionally used before, the ambiguity in invalidate interface and the
lack of guarantees - for any storage - to be notified with full set of
information, creates at least the following problems:

- a ZODB client (not necessarily native ZODB/py client) can maintain
  raw cache for the storage. If such client tries to load an oid at
  database view when that object did not existed yet, gets "no object"
  reply and stores that information into raw cache, to properly invalidate
  the cache it needs an invalidation message from ZODB server that
  *includes* created object.

- tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
  if not all objects modified/created by a transaction are included into
  invalidation messages.

- similarly to `zodb watch`, a monitoring tool, that would want to be
  notified of all created/modified objects, won't see full
  database-change picture, and so won't work properly without knowing
  which objects were created.

- wendelin.core 2 - which builds data from ZODB BTrees and data objects
  into virtual filesystem - needs to get invalidation messages with both
  modified and created objects to properly implement its own lazy
  invalidation and isolation protocol for file blocks in OS cache: when
  a block of file is accessed, all clients, that have this block mmaped,
  need to be notified and asked to remmap that block into particular
  revision of the file depending on a client's view of the filesystem and
  database [4,5].

  To compute to where a client needs to remmap the block, WCFS server
  (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
  to see whether client's view of the filesystem is before object creation
  (and then ask that client to pin that block to hole), or after creation
  (and then ask the client to pin that block to corresponding revision).

  This computation needs ZODB server to send invalidation messages in
  full: with both modified and just created objects.

Also:

- the property that all objects - both modified and just created -
  are included into invalidation messages is required and can help to
  remove `next_serial` from `loadBefore` return in the future.
  This, in turn, can help to do 2x less SQL queries in loadBefore for
  NEO and RelStorage (and maybe other storages too):
  https://github.com/zopefoundation/ZODB/issues/318#issuecomment-657685745

Current state of storages with respect to new requirements:

- ZEO: does not skip transactions, but includes only modified - not
  created - objects. This is fixed by https://github.com/zopefoundation/ZEO/pull/160

- NEO: already implements the requirements in full

- RelStorage: already implements the requirements in full, if I
  understand correctly:

  https://github.com/zodb/relstorage/blob/3.1.2-1-gaf57d6c/src/relstorage/adapters/poller.py#L28-L145

While editing invalidate documentation, use the occasion to document
recently added property that invalidate(tid) is always called before
storage starts to report its lastTransaction() ≥ tid - see 4a6b0283
(mvccadapter: check if the last TID changed without invalidation).

/cc @jimfulton, @jamadden, @jmuchemb, @vpelletier, @arnaud-fontaine, @gidzit, @klawlf82, @jwolf083
/reviewed-on https://github.com/zopefoundation/ZODB/pull/319
/reviewed-by @dataflake
/reviewed-by @jmuchemb

[1] https://lab.nexedi.com/kirr/neo/blob/049cb9a0/go/zodb/zodbtools/watch.go
[2] https://lab.nexedi.com/kirr/neo/commit/e0d59f5d
[3] https://lab.nexedi.com/kirr/neo/commit/c41c2907

[4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
[5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71




More information about the checkins mailing list