[Checkins] SVN: zc.relationship/trunk/src/zc/relationship/ Optimization examples, discussion, and code comments.

Wed May 17 11:19:12 EDT 2006

Log message for revision 68170:
  Optimization examples, discussion, and code comments.
  

Changed:
  U   zc.relationship/trunk/src/zc/relationship/README.txt
  U   zc.relationship/trunk/src/zc/relationship/index.py

-=-
Modified: zc.relationship/trunk/src/zc/relationship/README.txt
===================================================================

--- zc.relationship/trunk/src/zc/relationship/README.txt	2006-05-17 14:33:35 UTC (rev 68169)
+++ zc.relationship/trunk/src/zc/relationship/README.txt	2006-05-17 15:19:12 UTC (rev 68170)
@@ -1258,3 +1258,79 @@
     ...       'objects': roles['Project Manager'],
     ...       'context': projects['zope.org redesign']}))]
     []
+
+Optimizing relationship index use
+=================================
+
+There are three optimization opportunities built into the index.
+
+- use the cache to load and dump tokens;
+
+- don't load or dump tokens (the values themselves may be used as tokens); and
+
+- have the returned value be of the same btree family as the result family.
+
+For some operations, particularly with hundreds or thousands of members in a
+single relationship value, some of these optimzations can speed up some
+common-case reindexing work by around 100 times.
+
+The easiest (and perhaps least useful) optimization is that all dump
+calls and all load calls generated by a single operation share a cache
+dictionary per call type (dump/load), per indexed relationship value. 
+Therefore, for instance, we could stash an intids utility, so that we
+only had to do a utility lookup once, and thereafter it was only a
+single dictionary lookup. This is what the default `generateToken` and
+`resolveToken` functions in index.py do: look at them for an example.
+
+A further optimization is to not load or dump tokens at all, but use values
+that may be tokens.  This will be particularly useful if the tokens have
+__cmp__ (or equivalent) in C, such as built-in types like ints.  To specify
+this behavior, you create an index with the 'load' and 'dump' values for the
+indexed attribute descriptions explicitly set to None.
+
+    >>> ix = index.Index(
+    ...     ({'element': IRelationship['subjects'], 'multiple': True,
+    ...       'dump': None, 'load': None},
+    ...      {'element': IRelationship['relationshiptype'], 
+    ...       'dump': relTypeDump, 'load': relTypeLoad, 'btree': OIBTree,
+    ...       'name': 'reltype'},
+    ...      {'element': IRelationship['objects'], 'multiple': True,
+    ...       'dump': None, 'load': None},
+    ...      {'element': IContextAwareRelationship['getContext'], 
+    ...       'name': 'context'}),
+    ...     index.TransposingTransitiveQueriesFactory('subjects', 'objects'))
+    ...
+    >>> sm['rel_index_2'] = ix
+    >>> app['ex_rel_1'] = rel = Relationship((1,), 'has the role of', (2,))
+    >>> ix.index(rel)
+    >>> list(ix.findValueTokens('objects', {'subjects': 1}))
+    [2]
+
+Finally, if you have single relationships that relate hundreds or thousands
+of objects, it can be a huge win if the value is a 'multiple' of the same type
+as the stored BTree for the given attribute.  The default BTree family for
+attributes is IFBTree; IOBTree is also a good choice, and may be preferrable
+for some applications.
+
+    >>> ix = index.Index(
+    ...     ({'element': IRelationship['subjects'], 'multiple': True,
+    ...       'dump': None, 'load': None},
+    ...      {'element': IRelationship['relationshiptype'], 
+    ...       'dump': relTypeDump, 'load': relTypeLoad, 'btree': OIBTree,
+    ...       'name': 'reltype'},
+    ...      {'element': IRelationship['objects'], 'multiple': True,
+    ...       'dump': None, 'load': None},
+    ...      {'element': IContextAwareRelationship['getContext'], 
+    ...       'name': 'context'}),
+    ...     index.TransposingTransitiveQueriesFactory('subjects', 'objects'))
+    ...
+    >>> sm['rel_index_3'] = ix
+    >>> from BTrees import IFBTree
+    >>> app['ex_rel_2'] = rel = Relationship(
+    ...     IFBTree.IFTreeSet((1,)), 'has the role of', IFBTree.IFTreeSet((2,)))
+    >>> ix.index(rel)
+    >>> list(ix.findValueTokens('objects', {'subjects': 1}))
+    [2]
+
+Remember that BTrees (not just BTreeSets) can be used for these values: the keys
+are used as the values in that case.
\ No newline at end of file

Modified: zc.relationship/trunk/src/zc/relationship/index.py
===================================================================
--- zc.relationship/trunk/src/zc/relationship/index.py	2006-05-17 14:33:35 UTC (rev 68169)
+++ zc.relationship/trunk/src/zc/relationship/index.py	2006-05-17 15:19:12 UTC (rev 68170)
@@ -134,6 +134,10 @@
             if (res['dump'] is None) ^ (res['load'] is None):
                 raise ValueError(
                     "either both of 'dump' and 'load' must be None, or neither")
+                # when both load and dump are None, this is a small
+                # optimization that can be a large optimization if the returned
+                # value is one of the main four options of the selected btree
+                # family (BTree, TreeSet, Set, Bucket).
             res['interface'] = val.interface
             res['multiple'] = data.get('multiple', False)
             res['call'] = zope.interface.interfaces.IMethod.providedBy(val)
@@ -162,6 +166,7 @@
             return values, values, False
         elif data['dump'] is None and isinstance(values, (
             data['TreeSet'], data['BTree'], data['Bucket'], data['Set'])):
+            # this is the optimization story (see _add)
             return values, values, True
         else:
             cache = {}
@@ -232,7 +237,9 @@
                             # hand, if there are a lot of removals, it's
                             # probably quicker just to make a new one."  We
                             # pick >25 removals, mostly arbitrarily, as our 
-                            # "cut bait" cue.
+                            # "cut bait" cue.  We don't just do a len of
+                            # removed because lens of btrees are potentially
+                            # expensive.
                             for ix, t in enumerate(removed):
                                 if ix >= 25: # arbitrary cut-off
                                     newTokens = data['TreeSet'](newTokens)