[ZODB-Dev] How expensive are savepoints?

Sat Jul 9 11:04:27 EDT 2005

[Christian Heimes]
> How expensive and costly are savepoints?

6, maybe 6.2, depending on the units you're using <wink>.  Seriously, how
can such a question be answered?  How expensive is math.log()?

> I wasn't able to find information about it in the Zope docs.

Savepoints are very new, and AFAIK nobody has done timing experiments on
them.

> Are they as expensive as subtransactions or are they just using some CPU
> cycles?

Savepoints are a generalization of subtransactions (and subtransactions are
now implemented "on top of" savepoints), so if you think the cost of a
subtransaction was 100, the cost of a savepoint will be somewhere around 100
too.  Modified state has to be written to temp file(s) in either case, and
in such a way that it can be forgotten later if desired.

> I'm thinking about using savepoints in my migration code. The code is
> migrating a possibly large number of objects (hundreds up to tens of
> thousands). I don't want the code to fail because the last object has a
> unicode decode issue.

This sounds like a good use for savepoints.

> Code example:
>
> for ob in objs:
>      savepoint = transaction.savepoint()
>      try:
>          migrate(ob)
>      except ConflictError:
>          raise
>      except:
>          log()
>          savepoint.rollback()
>
> If savepoints are costly I would create a new savepoint every 10 or 50
> objects.

If I were you, I'd just _try_ it, and fiddle as necessary until I was happy
with the tradeoffs I saw on my real data.  It's not possible to guess the
outcome; e.g., if "a typical call" to migrate() takes 10 seconds for your
objects, the time to make a savepoint will probably be relatively
insignificant; if migrate() takes a nanosecond, the time to make a savepoint
will be relatively huge.
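
If per-object savepoints do turn out to be too slow, the "every 10 or 50 objects" idea could look roughly like the sketch below.  It's only a sketch: `migrate_in_batches`, `make_savepoint`, `reraise` and `batch_size` are names made up for illustration, and the only thing assumed about a savepoint is that it has a rollback() method.  Note the catch with batching: rolling back throws away the whole batch's work, so the sketch retries a failed batch one object at a time.

```python
def migrate_in_batches(objs, migrate, make_savepoint,
                       batch_size=50, reraise=(), log=print):
    """Migrate objs, taking one savepoint per batch of batch_size objects.

    make_savepoint is any zero-argument callable returning an object with
    a rollback() method (hypothetical parameter; with ZODB it would
    presumably be transaction.savepoint).  Exceptions listed in reraise
    always propagate.  Returns the list of objects that failed.
    """
    failed = []
    for start in range(0, len(objs), batch_size):
        batch = objs[start:start + batch_size]
        batch_sp = make_savepoint()
        try:
            for ob in batch:
                migrate(ob)
            continue  # whole batch succeeded with a single savepoint
        except reraise:
            raise
        except Exception:
            # Undo the partial batch, then fall back to one savepoint
            # per object so only the bad objects are skipped.
            batch_sp.rollback()
        for ob in batch:
            sp = make_savepoint()
            try:
                migrate(ob)
            except reraise:
                raise
            except Exception as exc:
                log("skipping %r: %s" % (ob, exc))
                sp.rollback()
                failed.append(ob)
    return failed
```

With ZODB I'd expect the call to look something like `migrate_in_batches(objs, migrate, transaction.savepoint, reraise=(ConflictError,))`, but whether batching wins anything still depends on how often migrate() fails on your real data.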

I tried this code:

"""
import transaction
from time import time as now  # assumption: any wall-clock timer will do

# ...
# tedious setup code to open a database and hang `tree` off the
# root object
# ...

start = now()
for i in range(N):
    tree[i] = 2*i
    sv = transaction.savepoint() # "the savepoint line"
transaction.commit()
finish = now()
print(finish - start)
"""

with and without the savepoint line, where `tree` was an OOBTree and `N` was
1000, and it took 10x longer with the savepoint line.  This is probably
close to a worst case, because `tree[i] = 2*i` most often modifies the same
bucket it modified on the previous iteration, and taking a savepoint on each
iteration therefore requires writing out the full state for each bucket many
times (about 15 times each, in fact).  Without the savepoint line, each
bucket state is materialized to disk only once.

If I change it to an IIBTree, the discrepancy is even larger (about a factor
of 15), because IIBTrees tend to put many more (key, value) pairs in their
buckets than OOBTrees do, so each bucket state gets written out many more
times with the savepoint line (about 60 times each) than without.
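
The "about 15" and "about 60" figures can be roughly reproduced with a toy model that needs no database at all.  It assumes a bucket splits in half when it grows past a maximum size, that sequential keys always land in the last bucket, and that a savepoint writes every bucket modified since the previous savepoint.  The bucket sizes 30 and 120 are what I believe the OOBTree and IIBTree defaults to be; treat them as assumptions.  The model ignores interior BTree nodes entirely, and `count_bucket_writes` is a made-up name.

```python
def count_bucket_writes(n_keys, max_bucket_size, savepoint_every=1):
    """Toy model: return (bucket_state_writes, n_buckets).

    Keys arrive in increasing order, so every insert hits the last
    bucket.  savepoint_every=None means no savepoints, only the final
    commit.
    """
    bucket_sizes = [0]   # only the sizes matter in this model
    dirty = set()        # indices of buckets modified since last write
    writes = 0
    for i in range(n_keys):
        last = len(bucket_sizes) - 1
        bucket_sizes[last] += 1
        dirty.add(last)
        if bucket_sizes[last] > max_bucket_size:
            # Split: move half of the items into a new last bucket.
            half = bucket_sizes[last] // 2
            bucket_sizes[last] -= half
            bucket_sizes.append(half)
            dirty.add(last + 1)
        if savepoint_every is not None and (i + 1) % savepoint_every == 0:
            writes += len(dirty)   # a savepoint writes all dirty buckets
            dirty.clear()
    writes += len(dirty)           # the commit writes whatever is left
    return writes, len(bucket_sizes)


for name, size in [("OOBTree-like", 30), ("IIBTree-like", 120)]:
    with_sp, buckets = count_bucket_writes(1000, size, savepoint_every=1)
    without_sp, _ = count_bucket_writes(1000, size, savepoint_every=None)
    print(name, "writes per bucket with savepoints:",
          round(with_sp / buckets, 1),
          "without:", round(without_sp / buckets, 1))
```

The model only counts bucket writes, so it says nothing about elapsed time; the measured 10x and 15x slowdowns include all the other work a commit does.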

OTOH, if your idea of migrate() doesn't make changes to the same containers
(or other persistent objects) across iterations, the discrepancy should get
smaller, approaching a factor of 1.0 in the limit (if no two iterations
modify the same persistent object).  It's not possible to quantify that in
advance without knowing everything about your objects, your containers, and
all the details involved in what your migrate() does.
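
To make "approaching a factor of 1.0" a bit more concrete, here is a crude way to estimate the write amplification if you can list which persistent objects each iteration modifies.  The function name and the input format are invented for illustration, and it counts object-state writes only, not seconds.

```python
def savepoint_write_ratio(touched_per_iteration):
    """Ratio of state writes with one savepoint per iteration to state
    writes with a single commit at the end.

    touched_per_iteration is an iterable of collections of hashable
    object ids, one collection per iteration (hypothetical format).
    """
    touched = [set(objs) for objs in touched_per_iteration]
    # With a savepoint per iteration, each iteration writes what it touched.
    with_savepoints = sum(len(objs) for objs in touched)
    # With a single commit, each distinct object is written once.
    without_savepoints = len(set().union(*touched))
    if without_savepoints == 0:
        raise ValueError("no objects were modified")
    return with_savepoints / without_savepoints


# No two iterations touch the same object: the ratio is exactly 1.0.
print(savepoint_write_ratio([{i} for i in range(100)]))
# Every iteration also touches one shared container: close to 2.
print(savepoint_write_ratio([{"container", i} for i in range(100)]))
# Every iteration touches the same single object: the ratio is 100.
print(savepoint_write_ratio([{"bucket"}] * 100))
```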

Of course if this is a one-time migration, I wouldn't worry about expense at
all -- for all I know, it took me longer to write this reply than it will
take you to run the migration script <0.6 wink>.