[ZODB-Dev] Re: [Bug] Another FileStorage packing bug
Jeremy Hylton
jeremy at zope.com
Wed Jan 14 17:49:38 EST 2004
On Tue, 2004-01-13 at 05:51, Dieter Maurer wrote:
> Instead is was extremely bad runtime behaviour in connection
> with "_resolve_backpointer" and "_data_find".
> With the ZODB 3.1 implementation (at least) it is easy
> to get quadratic runtime behaviour with respect to the number
> of objects contained in a transaction.
If I follow correctly, the problem is that for each backpointer in a
transaction, you try to find the transaction for that backpointer. You
always do a linear search through the file, starting at the end, to find
the transaction for that backpointer. Even if there are N objects all
in the same transaction, you do the search N times.
That's not exactly quadratic, because it is O(NM) where N is the number
of backpointers and M is the number of transactions in the database. If
either is large, it would be very slow.
> I attach a patch (with respect to ZODB code as delivered with
> Zope 2.6.2) that uses a transaction cache to avoid this
> catastrophic behaviour:
>
> In our case, packing took about 10 min with the patch.
> Without the patch, we had to abort packing after 2 weeks (!).
> To complete the packing process, another 2 weeks would have
> been necessary.
I'm not sure I understand what your patch is doing. Could you explain
the strategy? I would think a chief problem is _txn_find(), but it
looks like you're not changing that.
Or is the problem that the transactions themselves contain many objects
and the search in _data_find() for the record matching a particular oid
is expensive.
Either way a cache seems like a decent solution. I wonder, though, if
we should rethink the algorithms for these cases. For pack and undo,
we're pretty much doing a sequential operation on all the objects. So
it might be clearer to rework the APIs in some way to exploit the
regular access patterns. On the other hand, a cache works without
having to rethink the APIs.
Jeremy
More information about the ZODB-Dev
mailing list