[Zope-CMF] reindexing optimizations

Julien Anguenot ja at nuxeo.com
Sat Nov 19 11:36:53 EST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

We tackled this problem within CPS a couple of months ago by taking
advantage of the ZODB before-commit hooks. The idea is to define an
Indexation Manager, registered as a before-commit hook, that filters
and stores all the indexation calls on CMF objects and then waits for the
end of the transaction (actually just *before* the end of the
transaction) to do the actual indexation. That way, we get a single,
atomic indexation per object, whatever happens during the transaction.
The reindexObject() and reindexObjectSecurity() calls are redirected to
the indexation manager, which queues each call with its parameters.
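
To make the idea concrete, here is a minimal sketch of it, not the actual
CPS code. StubTransaction stands in for a ZODB transaction so the example
runs on its own, and the names reindex(), reindex_security(), flush(),
_reindexObject() and _reindexObjectSecurity() are assumptions made for
illustration only:

```python
class StubTransaction:
    """Stand-in for a ZODB transaction (assumption, for illustration)."""

    def __init__(self):
        self._hooks = []

    def addBeforeCommitHook(self, hook, args=(), kws=None):
        self._hooks.append((hook, tuple(args), kws or {}))

    def commit(self):
        # Run every registered hook just before the (simulated) commit.
        for hook, args, kws in self._hooks:
            hook(*args, **kws)
        self._hooks = []


class IndexationManager:
    """Queue reindex requests and replay them once, before commit."""

    def __init__(self, txn):
        self._queue = {}  # id(ob) -> merged request for that object
        txn.addBeforeCommitHook(self.flush)

    def _entry(self, ob):
        return self._queue.setdefault(
            id(ob),
            {"ob": ob, "full": False, "idxs": set(), "security": False},
        )

    def reindex(self, ob, idxs=None):
        entry = self._entry(ob)
        if idxs:
            entry["idxs"].update(idxs)  # merge partial requests
        else:
            entry["full"] = True        # no idxs means a full reindex

    def reindex_security(self, ob):
        self._entry(ob)["security"] = True

    def flush(self):
        queue, self._queue = self._queue, {}
        for entry in queue.values():
            ob = entry["ob"]
            if entry["full"]:
                ob._reindexObject()
            elif entry["idxs"]:
                ob._reindexObject(sorted(entry["idxs"]))
            if entry["security"]:
                ob._reindexObjectSecurity()


class Document:
    """Fake content object that records the real indexing calls."""

    def __init__(self):
        self.calls = []

    def _reindexObject(self, idxs=None):
        self.calls.append(("reindex", idxs))

    def _reindexObjectSecurity(self):
        self.calls.append(("security",))


txn = StubTransaction()
manager = IndexationManager(txn)
doc = Document()

# Three requests during the transaction...
manager.reindex(doc, ["review_state"])
manager.reindex(doc, ["portal_type", "Type"])
manager.reindex_security(doc)
calls_before_commit = list(doc.calls)

# ...collapse into one reindex and one security reindex at commit time.
txn.commit()
```

Nothing touches the catalog until commit(), and the two partial requests
are merged into a single call covering all three indexes.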

You can check the code here:
http://svn.nuxeo.org/trac/pub/file/CPSCore/trunk/IndexationManager.py

We needed to extend the ZODB API to deal with subscriber ordering. Note
that an endless discussion occurred on the ZODB list about this... Anyway,
you'll find it here:
https://svn.nuxeo.org/trac/pub/file/CPSCore/trunk/TransactionManager.py
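
As a rough illustration of what ordering buys you, here is a small sketch.
It is not the TransactionManager code: the OrderedHooks class, its add()
and run() methods, the order parameter and the order values used below are
all hypothetical. Hooks run by ascending order, and registration order
breaks ties:

```python
class OrderedHooks:
    """Hypothetical registry that runs hooks sorted by an 'order' value."""

    def __init__(self):
        self._hooks = []
        self._counter = 0  # keeps registration order for equal 'order'

    def add(self, hook, order=0):
        self._hooks.append((order, self._counter, hook))
        self._counter += 1

    def run(self):
        # Sort on (order, registration index) only, never on the callables.
        hooks = sorted(self._hooks, key=lambda item: item[:2])
        self._hooks = []
        for _order, _seq, hook in hooks:
            hook()


log = []
hooks = OrderedHooks()
hooks.add(lambda: log.append("events"), order=10)
hooks.add(lambda: log.append("indexation"), order=-10)
hooks.add(lambda: log.append("tree cache"), order=0)
hooks.run()
```

Here the indexation subscriber runs first even though it was registered
second, which is the kind of guarantee plain registration order cannot give.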

We are using the same idea for the tree cache updates (because we don't
store navigation trees within the catalog <wink>):
http://svn.nuxeo.org/trac/pub/file/CPSCore/trunk/TreeCacheManager.py

The same goes for event notifications:
http://svn.nuxeo.org/trac/pub/file/CPSSubscriptions/trunk/EventManager.py

Feel free to ask questions on the cps-devel list if you have any.

Enjoy !

	J.

Alec Mitchell wrote:
> So, Sidnei has been plugging away at the "AT reindexes things an obscene 
> number of times" issue today, and appears to have fixed many of the AT 
> triggered indexing redundancies.  There are however still a few places in 
> CMF where some cataloging redundancy might be avoided.  One obvious place is 
> during object creation, where the following happens:
> 
> *) TypesTool.constructInstance() is triggered
>     **) A _setObject call results in CMFCatalogAware.manage_afterAdd() which 
> triggers a full indexObject().
>     *) This is shortly followed by TypesTool._finishConstruction()
>         *) Which calls CMFCatalogAware.notifyWorkflowCreated()
>             *) Which in turn calls WorkflowTool._reindexWorkflowVariables()
>                 **) Which does a CMFCatalogAware.reindexObject([idxs]) on 
> workflow specific variables (with a full metadata update)
>                 *) And calls CMFCatalogAware.reindexObjectSecurity() which 
> reindexes the object only on the security index, and doesn't touch metadata.
>         **) TypesTool._finishConstruction() then does another 
> CMFCatalogAware.reindexObject().
> 
> So we have two full reindexes, and three metadata updates.  The last reindex 
> appears to be there only to catch the change to 'portal_type' in 
> _finishConstruction.  So this final reindexObject might safely be changed 
> to reindexObject(['portal_type', 'Type']), though the possibility exists 
> that other indexed attributes added by 3rd parties may depend on the value 
> of portal_type (say, I use an autogenerated Title which includes the Type).  
> Additionally, almost immediately before this last reindexObject call, 
> another reindexObject call has happened in notifyWorkflowCreated, which 
> included a full catalog metadata update.  As a result, updating the catalog 
> metadata here is certainly redundant.  Unfortunately, the 
> CMFCatalogAware.reindexObject method provides no means of avoiding the 
> duplicate metadata update, though it would be trivial to add and to use 
> here.
> 
> Another option, suggested by Sidnei on IRC, which would avoid the potential 
> issues with limiting the variables indexed in the final reindex, would be 
> to let CMFCatalogAware.manage_afterAdd know (presumably via some state 
> variable) that it is being invoked through constructInstance/invokeFactory, 
> in which case it could safely skip the initial indexing and allow 
> _finishConstruction to take care of indexing the object fully on its own at 
> the end.  In the long term we will probably be better served by delaying all 
> indexing to transaction boundaries, though it will be a fair bit harder to 
> implement, and may irk some developers who depend on immediate changes to 
> the catalog on reindex.
> 
> Alec
> _______________________________________________
> Zope-CMF maillist  -  Zope-CMF at lists.zope.org
> http://mail.zope.org/mailman/listinfo/zope-cmf
> 
> See http://collector.zope.org/CMF for bug reports and feature requests


- --
Julien Anguenot | Nuxeo R&D (Paris, France)
CPS Platform : http://www.cps-project.org
Zope3 / ECM   : http://www.z3lab.org
mail: anguenot at nuxeo.com; tel: +33 (0) 6 72 57 57 66
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFDf1SkGhoG8MxZ/pIRAnK6AJ43MLANyCkhWRG4NmfJT3M7KhSzbQCdFiCP
QjNQFa4+XuhHPc1DND0OBWs=
=AXk4
-----END PGP SIGNATURE-----

