[Checkins] SVN: zc.relationship/trunk/src/zc/relationship/
Optimization examples, discussion, and code comments.
Gary Poster
gary at zope.com
Wed May 17 11:19:12 EDT 2006
Log message for revision 68170:
Optimization examples, discussion, and code comments.
Changed:
U zc.relationship/trunk/src/zc/relationship/README.txt
U zc.relationship/trunk/src/zc/relationship/index.py
-=-
Modified: zc.relationship/trunk/src/zc/relationship/README.txt
===================================================================
--- zc.relationship/trunk/src/zc/relationship/README.txt 2006-05-17 14:33:35 UTC (rev 68169)
+++ zc.relationship/trunk/src/zc/relationship/README.txt 2006-05-17 15:19:12 UTC (rev 68170)
@@ -1258,3 +1258,79 @@
... 'objects': roles['Project Manager'],
... 'context': projects['zope.org redesign']}))]
[]
+
+Optimizing relationship index use
+=================================
+
+There are three optimization opportunities built into the index.
+
+- use the cache to load and dump tokens;
+
+- don't load or dump tokens (the values themselves may be used as tokens); and
+
+- have the returned value be of the same btree family as the result family.
+
+For some operations, particularly with hundreds or thousands of members in a
+single relationship value, some of these optimzations can speed up some
+common-case reindexing work by around 100 times.
+
+The easiest (and perhaps least useful) optimization is that all dump
+calls and all load calls generated by a single operation share a cache
+dictionary per call type (dump/load), per indexed relationship value.
+Therefore, for instance, we could stash an intids utility, so that we
+only had to do a utility lookup once, and thereafter it was only a
+single dictionary lookup. This is what the default `generateToken` and
+`resolveToken` functions in index.py do: look at them for an example.
+
+A further optimization is to not load or dump tokens at all, but use values
+that may be tokens. This will be particularly useful if the tokens have
+__cmp__ (or equivalent) in C, such as built-in types like ints. To specify
+this behavior, you create an index with the 'load' and 'dump' values for the
+indexed attribute descriptions explicitly set to None.
+
+ >>> ix = index.Index(
+ ... ({'element': IRelationship['subjects'], 'multiple': True,
+ ... 'dump': None, 'load': None},
+ ... {'element': IRelationship['relationshiptype'],
+ ... 'dump': relTypeDump, 'load': relTypeLoad, 'btree': OIBTree,
+ ... 'name': 'reltype'},
+ ... {'element': IRelationship['objects'], 'multiple': True,
+ ... 'dump': None, 'load': None},
+ ... {'element': IContextAwareRelationship['getContext'],
+ ... 'name': 'context'}),
+ ... index.TransposingTransitiveQueriesFactory('subjects', 'objects'))
+ ...
+ >>> sm['rel_index_2'] = ix
+ >>> app['ex_rel_1'] = rel = Relationship((1,), 'has the role of', (2,))
+ >>> ix.index(rel)
+ >>> list(ix.findValueTokens('objects', {'subjects': 1}))
+ [2]
+
+Finally, if you have single relationships that relate hundreds or thousands
+of objects, it can be a huge win if the value is a 'multiple' of the same type
+as the stored BTree for the given attribute. The default BTree family for
+attributes is IFBTree; IOBTree is also a good choice, and may be preferrable
+for some applications.
+
+ >>> ix = index.Index(
+ ... ({'element': IRelationship['subjects'], 'multiple': True,
+ ... 'dump': None, 'load': None},
+ ... {'element': IRelationship['relationshiptype'],
+ ... 'dump': relTypeDump, 'load': relTypeLoad, 'btree': OIBTree,
+ ... 'name': 'reltype'},
+ ... {'element': IRelationship['objects'], 'multiple': True,
+ ... 'dump': None, 'load': None},
+ ... {'element': IContextAwareRelationship['getContext'],
+ ... 'name': 'context'}),
+ ... index.TransposingTransitiveQueriesFactory('subjects', 'objects'))
+ ...
+ >>> sm['rel_index_3'] = ix
+ >>> from BTrees import IFBTree
+ >>> app['ex_rel_2'] = rel = Relationship(
+ ... IFBTree.IFTreeSet((1,)), 'has the role of', IFBTree.IFTreeSet((2,)))
+ >>> ix.index(rel)
+ >>> list(ix.findValueTokens('objects', {'subjects': 1}))
+ [2]
+
+Remember that BTrees (not just BTreeSets) can be used for these values: the keys
+are used as the values in that case.
\ No newline at end of file
Modified: zc.relationship/trunk/src/zc/relationship/index.py
===================================================================
--- zc.relationship/trunk/src/zc/relationship/index.py 2006-05-17 14:33:35 UTC (rev 68169)
+++ zc.relationship/trunk/src/zc/relationship/index.py 2006-05-17 15:19:12 UTC (rev 68170)
@@ -134,6 +134,10 @@
if (res['dump'] is None) ^ (res['load'] is None):
raise ValueError(
"either both of 'dump' and 'load' must be None, or neither")
+ # when both load and dump are None, this is a small
+ # optimization that can be a large optimization if the returned
+ # value is one of the main four options of the selected btree
+ # family (BTree, TreeSet, Set, Bucket).
res['interface'] = val.interface
res['multiple'] = data.get('multiple', False)
res['call'] = zope.interface.interfaces.IMethod.providedBy(val)
@@ -162,6 +166,7 @@
return values, values, False
elif data['dump'] is None and isinstance(values, (
data['TreeSet'], data['BTree'], data['Bucket'], data['Set'])):
+ # this is the optimization story (see _add)
return values, values, True
else:
cache = {}
@@ -232,7 +237,9 @@
# hand, if there are a lot of removals, it's
# probably quicker just to make a new one." We
# pick >25 removals, mostly arbitrarily, as our
- # "cut bait" cue.
+ # "cut bait" cue. We don't just do a len of
+ # removed because lens of btrees are potentially
+ # expensive.
for ix, t in enumerate(removed):
if ix >= 25: # arbitrary cut-off
newTokens = data['TreeSet'](newTokens)
More information about the Checkins
mailing list