[Zope-dev] catalog performance: query plan

Tres Seaver tseaver at palladion.com
Mon Nov 10 12:39:58 EST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hedley Roos wrote:
>>> object_implements    |KeywordIndex     |0.2172234|         4.6
>> This is clearly not the same issue as the other KeywordIndexes:  in
>> fact, I am astonished that anybody would be using a KeywordIndex for
>> this at all.  I would suspect that the real problem here is in the
>> appliation, rather than the index itself.
> 
> The app is Plone :)

I don't know how Plone computes the values here (are they actual
interface objects, or their monikers), nor how it is queried.  I am
suspicious that there is some query-time bogosity, however.

>>> portal_type          |FieldIndex       |0.0025984|      384.84
>> This one is surprising:  its performance should be pretty similar to the
>>  other FieldIndexes (e.g., 'review_state') which map a controlled
>> vocabulary onto the entire corpus.  Was the query different than
>> 'review_state' (e.g., multi-valued vs. single-valued)?
> 
> portal_type queries are usually multivalued in Plone.

What are the use cases for that?

>>> sourceUID            |FieldIndex       |0.0004886|     2046.31
>> Probably bogus, but I don't know how it is used.
> 
> 
> Plone's reference_catalog

I'm betting that using a catalog at all is a flawed choice.

>>> UID                  |FieldIndex       |0.0003070|      3257.1
>> Note that this is the worst-case scenario for a FieldIndex:  there is
>> exactly one value for every key.  This shouldn't be "indexed" at all, in
>> fact, beyond a simple BTree (UID -> rid).
> 
> I've never even thought of that. Perhaps the catalog is used to
> present a familiar API.
> 
>>> targetUID            |FieldIndex       |0.0002287|     4372.12
>> I don't know what this one is used for, but it should probably be
>> scrapped as well.
> 
> More reference_catalog.
> 
>>> exact_getUserId      |FieldIndex       |0.0001931|     5177.79
>>> exact_getUserName    |FieldIndex       |0.0001816|     5504.39
>> I don't know how the application uses either of those indexes, but they
>> are almost certainly bogus in any normal catalog.
> 
> Membrane and remember. They're currently tied to Plone but efforts are
> being made to make them work with CMF.
> 
>>> relationship         |FieldIndex       |0.0000822|     12153.1
>>> id                   |FieldIndex       |0.0000822|    12161.81
>>> end                  |DateIndex        |0.0000623|    16027.48
>>> getGroups            |FieldIndex       |0.0000278|    35973.45
>> This is almost certainly bogus:  FieldIndex is not supposed to be used
>> with multi-valued terms.
> 
> Plone stuff, but I am intrigued by your statement. Why can FieldIndex
> not be used with multi-valued terms?

Because FieldIndex is designed for either exact-match queries or range
queries:  range queries are obviously impossible for a 'getGroups'
method, and exact-match is nearly as dubious.  I would have expected
this to be a KeywordIndex, if it needed indexing at all.

>>> Subject              |KeywordIndex     |0.0000253|    39413.57
>> This is the use-case for which KeywordIndex is designed.  Was the query
>> just a single term, by chance?
> 
> The simplest term is a list with only a single term (not counting the
> trivial case). It should be worse with more terms right?

I am reasonably confident that multi-valued queries against either
FieldIndexes or KeywordIndexes will perform worse than single-valued
queries against the same index.

>>> Title                |ZCTextIndex      |0.0000128|    77809.46
>> This should be removed:  there is no valid use case for doing a
>> full-text search restricted only to the title.
> 
> Plone specific.

No, still bogus.  This is an egregiously stupid choice, with *large*
indexing-time downsides.

>>> Description          |ZCTextIndex      |0.0000116|    86241.39
>> Again, should be removed.
> 
> Again, Plone specific :)

Again, egregiously bogus.

>>> getEmail             |ZCTextIndex      |0.0000113|    87849.05
>> Should *definitely* be removed:  how can you do full-text search on an
>> e-mail address?
> 
> I think membrane is responsible for this, but you're right.
> 
> 
>>> SearchableText       |TextIndex        |0.0000113|    88466.69
>> Where did this one come from?  The 'SearchableText' above is a ZCTextIndex.
> 
> Membrane!
> 
> Kinda pointless for me to continue since this is turning into a
> Plone-specific discussion on zope-dev. But at least the whole exercise
> has forced us to look in detail into how all these indexes affect
> performance with a zodb with many many objects.
> 
> Roche investigated Tesdal's queryplan today end it seems to solve
> nearly all our performance problems. He'll have to elaborate.

Ripping out stupid indexes is one of the first things I do to optimize a
client's badly-performing Plone site.


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJGHHu+gerLs4ltQ4RAoDgAJ9sjGoOG2KftnqFIbVxFy0sNk8EDwCfaSbH
ouY7+FnSbSJvYSBtI//31ZY=
=3SeB
-----END PGP SIGNATURE-----



More information about the Zope-Dev mailing list