[ZCM] [ZC] 814/ 3 Accept "Quadratic ZODB bloat caused by "PathIndex""

Collector: Zope Bugs, Features, and Patches ... zope-coders-admin@zope.org
Thu, 20 Feb 2003 14:47:51 -0500


Issue #814 Update (Accept) "Quadratic ZODB bloat caused by "PathIndex""
 Status Accepted, Zope/bug medium
To follow up, visit:
  http://collector.zope.org/Zope/814

==============================================================
= Accept - Entry #3 by jeremy on Feb 20, 2003 2:47 pm

 Status: Pending => Accepted

 Supporters added: jeremy

We should definitely use the IITreeSet.  I can't comment on the other issue.

________________________________________
= Comment - Entry #2 by ajung on Feb 20, 2003 4:36 am


________________________________________
= Request - Entry #1 by d.maurer on Feb 20, 2003 2:04 am

A "PathIndex" maps (pathsegment,level) onto the "IISet" of document ids
with "pathsegment" at "level" in their path.

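An "IISet" is a single persistent object, written as a whole to
the ZODB. Its size is proportional to the number of entries.
Each added entry therefore writes a complete new copy of the set, and
a storage with undo support keeps the old copies until the next pack.
As a result, such a ZODB storage grows quadratically with respect to
the number of entries (between packs).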
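
The arithmetic behind the quadratic growth can be sketched with a small
model. This is only an illustration, not ZODB code: it assumes each
document is indexed in its own transaction and counts set entries
written rather than bytes. The function name is hypothetical.

```python
def entries_rewritten_whole_set(n_docs):
    """Total entries written when a set stored as ONE record is
    rewritten in full after each of n_docs insertions.

    Assumption: one insertion per transaction, no pack in between.
    """
    total = 0
    size = 0
    for _ in range(n_docs):
        size += 1        # one new document id joins the set
        total += size    # the whole set is written out again
    return total


# Doubling the number of documents roughly quadruples what is written.
for n in (1_000, 2_000, 4_000):
    print(n, entries_rewritten_whole_set(n))
# 1000 500500
# 2000 2001000
# 4000 8002000
```

The total is n(n+1)/2 entries for n insertions, which is the quadratic
growth described above.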

The standard "path" index indexes based on the physical path.
Therefore, the size of the index entry of (at least) one
of the top level pathsegments is on the order of the number of
all indexed objects.

Once you have lots of indexed objects, you will observe
significant ZODB growth between packs.


The fix would be easy: "PathIndex" should use "IITreeSet" rather
than "IISet" to store the document id lists (as do other indexes).
(There are more bugs in "PathIndex": e.g. it does not remove
old index information when a new "index_object" brings in new data.
A code review would be appropriate.)
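
Why a tree-structured set helps can be sketched with another small
model. It is an illustration under stated assumptions, not the BTrees
implementation: the set is split into sorted leaf buckets, only the
bucket touched by an insertion is rewritten, interior tree nodes and
per-record overhead are ignored, and "bucket_size=60" is a made-up
value rather than the real "IITreeSet" constant. The function name is
hypothetical.

```python
import bisect


def entries_rewritten_bucketed(doc_ids, bucket_size=60):
    """Total entries written when only the modified leaf bucket is
    rewritten per insertion (simplified model of a tree set).

    Assumptions: one insertion per transaction; interior nodes are
    not counted; bucket_size is a hypothetical value.
    """
    buckets = [[]]    # sorted leaf buckets, each a sorted list of ids
    separators = []   # separators[i] is the smallest id in buckets[i + 1]
    total = 0
    for doc_id in doc_ids:
        i = bisect.bisect_right(separators, doc_id)
        bucket = buckets[i]
        pos = bisect.bisect_left(bucket, doc_id)
        if pos < len(bucket) and bucket[pos] == doc_id:
            continue  # already present, nothing is written
        bucket.insert(pos, doc_id)
        if len(bucket) > bucket_size:
            # Split the overfull bucket; both halves are written.
            half = len(bucket) // 2
            right = bucket[half:]
            del bucket[half:]
            buckets.insert(i + 1, right)
            separators.insert(i, right[0])
            total += len(bucket) + len(right)
        else:
            total += len(bucket)
    return total


n = 10_000
whole_set = n * (n + 1) // 2                    # single-record set
bucketed = entries_rewritten_bucketed(range(n))  # bucketed set
print(whole_set, bucketed)
```

In this model each insertion writes at most bucket_size + 1 entries, so
the total grows linearly with the number of indexed objects instead of
quadratically.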


A quick workaround: delete the "path" index unless you really need it.

==============================================================