[ZODB-Dev] Re: How expensive are savepoints?

Christian Heimes christian at cheimes.de
Sat Jul 9 19:30:53 EDT 2005


Tim Peters wrote:
> [Christian Heimes]
> 
>>How expensive and costly are savepoints?
> 
> 
> 6, maybe 6.2, depending on the units you're using <wink>.  Seriously, how
> can such a question be answered?  How expensive is math.log()?

My professor of numerical mathematics would say it is very expensive 
because log() takes much more than 10 CPU cycles. *g*

> Savepoints are a generalization of subtransactions (and subtransactions are
> now implemented "on top of" savepoints), so if you think the cost of a
> subtransaction was 100, the cost of a savepoint will be somewhere around 100
> too.  Modified state has to be written to temp file(s) in either case, and
> in such a way that it can be forgotten later if desired.

I was able to track a savepoint() call down to TmpStorage, which seems to 
store parts of the current subtransaction in a file, if I'm right. I wasn't 
sure whether savepoint() just marks a point in the middle of a 
transaction or stores the transaction somewhere. From my point of view 
it is costly compared to __add__(). *g*

> If I were you, I'd just _try_ it, and fiddle as necessary until I was happy
> with the tradeoffs I saw on my real data.  It's not possible to guess the
> outcome; e.g., if "a typical call" to migrate() takes 10 seconds for your
> objects, the time to make a savepoint will probably be relatively
> insignificant; if migrate() takes a nanosecond, the time to make a savepoint
> will be relatively huge.

I'm migrating CMF objects to Archetypes objects, including metadata, 
security and so on. The migration of a typical object takes about 0.2 
to 1 sec including catalog updates. A folderish object with hundreds to 
thousands of children requires much more time, but that's the fault of 
the catalog. Every object is uncataloged and cataloged again ... ugly, time 
consuming but required in Zope2. I wish we had events ...

> with and without the savepoint line, where `tree` was an OOBTree and `N` was
> 1000, and it took 10x longer with the savepoint line.  This is probably
> close to a worst case, because `tree[i] = 2*i` most often modifies the same
> bucket it modified on the previous iteration, and taking a savepoint on each
> iteration therefore requires writing out the full state for each bucket many
> times (about 15 times each, in fact).  Without the savepoint line, each
> bucket state is materialized to disk only once.
> 
> If I change it to an IIBTree, the discrepancy is even larger (about a factor
> of 15), because IIBTrees tend to put many more (key, value) pairs in their
> buckets than OOBTrees do, so each bucket state gets written out many more
> times with the savepoint line (about 60 times each) than without.

Nice :)
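
A rough way to see where the "about 15 times each" and "about 60 times 
each" figures come from is to count bucket writes in a toy model. The 
sketch below is not ZODB code. It assumes keys arrive in ascending order, 
that a bucket splits in half once it exceeds a maximum size (30 for an 
OOBTree-like tree and 120 for an IIBTree-like tree are assumptions here), 
and that a savepoint writes out every bucket modified since the previous 
one. Interior BTree nodes are ignored, and the function name is made up 
for illustration.

```python
def simulate_bucket_writes(n_items, max_bucket_size, savepoint_every=1):
    """Count bucket-state writes for n_items ascending inserts.

    savepoint_every=0 means no savepoints, only the final commit.
    Returns (total_writes, number_of_buckets).
    """
    buckets = [0]   # item count per bucket; ascending keys land in the last one
    dirty = set()   # indices of buckets modified since the last write
    writes = 0
    for i in range(n_items):
        last = len(buckets) - 1
        buckets[last] += 1
        dirty.add(last)
        if buckets[last] > max_bucket_size:
            # Assumed split rule: move half of the items to a new bucket.
            half = buckets[last] // 2
            buckets[last] -= half
            buckets.append(half)
            dirty.add(last + 1)
        if savepoint_every and (i + 1) % savepoint_every == 0:
            writes += len(dirty)   # a savepoint writes every dirty bucket
            dirty.clear()
    writes += len(dirty)           # the final commit writes what is still dirty
    return writes, len(buckets)


if __name__ == "__main__":
    for label, size in (("OOBTree-like", 30), ("IIBTree-like", 120)):
        with_sp, n_buckets = simulate_bucket_writes(1000, size, 1)
        without_sp, _ = simulate_bucket_writes(1000, size, 0)
        print(label, "buckets:", n_buckets,
              "writes with savepoints:", with_sp,
              "writes without:", without_sp)
```

Under these assumptions each bucket is written roughly 17 times in the 
OOBTree-like case and roughly 63 times in the IIBTree-like case when a 
savepoint is taken on every iteration, and exactly once without 
savepoints. That is in the same ballpark as the numbers quoted above, but 
it counts writes only and says nothing about wall-clock time.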

> 
> OTOH, if your idea of migrate() doesn't make changes to the same containers
> (or other persistent objects) across iterations, the discrepancy should get
> smaller, approaching a factor of 1.0 in the limit (if no two iterations
> modify the same persistent object).  It's not possible to quantify that in
> advance without knowing everything about your objects, your containers, and
> all the details involved in what your migrate() does.

I could write some code that migrates all objects in a folder before 
calling savepoint(), but it's not worth the extra code and complexity.
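
For the record, the batching idea would look roughly like the sketch 
below. It is only an illustration: migrate_in_batches, migrate and 
batch_size are hypothetical names, and the savepoint callable is passed 
in so that the loop does not depend on any particular transaction API.

```python
def migrate_in_batches(objects, migrate, savepoint, batch_size=100):
    """Call migrate(obj) for every object, taking a savepoint per batch.

    `migrate` and `savepoint` are caller-supplied callables (hypothetical
    here). Returns the number of savepoints taken. The caller is expected
    to commit the transaction afterwards, so no trailing savepoint is
    taken for a partial last batch.
    """
    pending = 0
    taken = 0
    for obj in objects:
        migrate(obj)
        pending += 1
        if pending >= batch_size:
            savepoint()    # one savepoint per full batch, not per object
            taken += 1
            pending = 0
    return taken


if __name__ == "__main__":
    calls = []
    count = migrate_in_batches(range(250), lambda obj: None,
                               lambda: calls.append(1), batch_size=100)
    print(count, len(calls))
```

In a real script the savepoint argument would presumably be 
transaction.savepoint, and a batch could be the children of one folder 
rather than a fixed count.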

> Of course if this is a one-time migration, I wouldn't worry about expense at
> all -- for all I know, it took me longer to write this reply than it will
> take you to run the migration script <0.6 wink>.

I wouldn't be so sure if I were you. I'm migrating all data of nearly all 
objects from one set of content types (CMF) to another set 
(ATContentTypes). For a very large site like plone.org the migration 
ran for about 1 to 2 hours.

Christian


