[Zope-dev] Bulletproof ZCatalog proposal

Shane Hathaway shane@digicool.com
Thu, 7 Jun 2001 17:13:06 -0400 (EDT)


On Thu, 7 Jun 2001, Phillip J. Eby wrote:

> >I was thinking that certain types of objects would be committed by the
> >transaction manager before all others.  In this case, the catalog (or a
> >special object in the catalog) would be committed first.  It would
> >resolve all conflicts in the contained indices before they occur by
> >replaying the changes in the persisted queues from the transaction
> >history, then setting the _p_serial attributes to convince the storage
> >that the conflicts have already been resolved.
>
> Hm.  Sounds to me like what you actually want is for the transaction
> manager to do this *after* everything else, rather than before.  Thus, you
> would catch any changes which occur *during* transaction commit - such as
> commit-phase cataloging (as some folks do with ZPatterns currently).

Maybe I didn't explain this clearly enough.  Let me write some quick
pseudocode:


class Catalog (Persistent):
  finished_changes = None   # Mapping: {path -> (object, adding)}
  unfinished_changes = None # Same as above
  tid = None                # Transaction ID

  def catalogObject(self, ob, path):
    unf = self.unfinished_changes
    if unf is None: self.unfinished_changes = unf = {}
    unf[path] = (ob, 1)

  def uncatalogObject(self, path):
    unf = self.unfinished_changes
    if unf is None: self.unfinished_changes = unf = {}
    unf[path] = (None, 0)  # no object is needed to remove a path

  def searchResults(self, ...):
    self.finishChanges()
    # ... Perform search ...
    return results

  def finishChanges(self):
    unf = self.unfinished_changes
    if unf is not None:
      fin = self.finished_changes
      if fin is None or self._p_serial != self.tid:
        # Create finished_changes if not yet created
        # and clear it if we're in a different transaction
        # from the last time finished_changes was changed.
        self.finished_changes = fin = {}
        self.tid = self._p_serial
      for path, (ob, adding) in unf.items():
        if adding: self.addToIndexes(ob, path)
        else: self.removeFromIndexes(path)
        fin[path] = (ob, adding)
      self.unfinished_changes = None

  def __getstate__(self):
    # Called during transaction commit.
    self.finishChanges()
    return Persistent.__getstate__(self)

  def _p_priority(self):
    # Causes this object to be added to the *top*
    # of the list of objects to commit rather than the
    # bottom.  (Just an idea.)
    return 1

  def _p_resolveConflict(self, old, committed, newstate):
    '''
    Apply the changes in self.finished_changes to
    committed and return the result.
    '''


This does mean that _p_resolveConflict() might be called frequently, but
(potentially) it would never fail because of conflicts: it only replays
finished_changes on top of whatever state was committed.
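
A rough sketch of what that replay could look like, under simplifying
assumptions rather than as real ZODB code: each state is a plain dict, the
index lives under a hypothetical "index" key as {path -> object}, and
newstate carries the "finished_changes" mapping from the pseudocode above.
The function name and the state layout are made up for illustration.

```python
import copy


def resolve_conflict(old, committed, newstate):
    """Replay newstate's finished_changes on top of the committed state.

    `old` is accepted only to mirror the _p_resolveConflict signature;
    this sketch does not need it.
    """
    # Work on a copy so the committed state passed in is left untouched.
    resolved = copy.deepcopy(committed)
    index = resolved.setdefault("index", {})  # hypothetical state layout

    changes = newstate.get("finished_changes") or {}
    for path, (ob, adding) in changes.items():
        if adding:
            index[path] = ob        # add or re-index the object
        else:
            index.pop(path, None)   # removing a missing path is harmless
    return resolved


old = {"index": {"/a": "A"}}
committed = {"index": {"/a": "A", "/b": "B"}}
newstate = {
    "index": {"/c": "C"},
    "finished_changes": {"/c": ("C", 1), "/a": (None, 0)},
}
result = resolve_conflict(old, committed, newstate)
```

Because each queued change is just "set this path" or "drop this path",
replaying it never needs to compare against what the other transaction
did, which is why the replay itself has nothing to fail on.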

Now, this doesn't provide any automatic cataloging, which is what I think
you're suggesting.  Automatic reindexing is a good idea, but I think it is
mostly independent of bulletproofing and lazifying the catalog.

To achieve automatic indexing, I was thinking that a special attribute
would be added to cataloged objects.  It would contain the OIDs of the
catalogs interested in the object.  Transaction.register() would look for
this attribute and invoke catalogObject().  Of course, that wouldn't quite
work because the object might change again within the transaction and the
transaction manager wouldn't be told about the second and further changes.
But I'm sure there's a good way to compensate for this, such as making the
catalog scan for later changes before calling searchResults().
(cPersistence might need to assist.)
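
A minimal sketch of the registration idea, with everything named here
being an assumption: `_catalog_oids` is a hypothetical attribute name,
`catalogs_by_oid` is a hypothetical lookup table standing in for loading
a catalog by OID, and `register()` is a stand-in function, not the real
Transaction.register().

```python
class StubCatalog:
    """Stand-in for the Catalog above; it only queues changes."""

    def __init__(self):
        self.unfinished_changes = {}

    def catalogObject(self, ob, path):
        self.unfinished_changes[path] = (ob, 1)


class Document:
    """A cataloged object carrying the OIDs of interested catalogs."""

    def __init__(self, path, catalog_oids=()):
        self.path = path
        self._catalog_oids = tuple(catalog_oids)  # hypothetical attribute


def register(ob, catalogs_by_oid):
    """Queue `ob` in every catalog listed on it; return how many were told."""
    notified = 0
    for oid in getattr(ob, "_catalog_oids", ()):
        catalog = catalogs_by_oid.get(oid)
        if catalog is not None:
            catalog.catalogObject(ob, ob.path)
            notified += 1
    return notified


catalog = StubCatalog()
doc = Document("/docs/a", ["catalog-1"])
count = register(doc, {"catalog-1": catalog})
```

This only covers the first change to an object in a transaction; it does
nothing about the later changes mentioned above.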

Shane