[ZODB-Dev] Re: ZODB Benchmarks

Thu Dec 6 15:32:35 EST 2007

On Thu, 2007-12-06 at 15:05 -0500, Jim Fulton wrote:
> On Dec 6, 2007, at 2:40 PM, Godefroid Chapelle wrote:
> 
> > Jim Fulton wrote:
> >> On Nov 6, 2007, at 2:40 PM, Sidnei da Silva wrote:
> >>>> Despite this change there are still a huge amount
> >>>> of unexplained calls to the 'persistent_id' method of the  
> >>>> ObjectWriter
> >>>> in serialize.py.
> >>>
> >>> Why 'unexplained'? 'persistent_id' is called from the Pickler  
> >>> instance
> >>> being used in ObjectWriter._dump(). It is called for each and every
> >>> single object reachable from the main object, due to the way Pickler
> >>> works (I believe). Maybe persistent_id can be analysed and optimized
> >>> for the most common cases?
> >> Yup.
> >> Note that there is an undocumented feature in cPickle that I added  
> >> years ago to deal with this issue but never got around to  
> >> pursuing.  Maybe someone else would be able to spend the time to  
> >> try it out and report back.
> >> If you set inst_persistent_id, rather than persistent_id, on a  
> >> pickler, then the hook will only be called for instances.  This  
> >> should eliminate the vast majority of the calls.
> >> Note that this feature was added back when testing was minimal or  
> >> non-existent, so it is untested, however, the implementation is  
> >> simple enough.  :)
> >
> > Do you mean that the ZODB has enough tests now that making the  
> > change and running the tests might already be a good proof ?
> 
> No, I mean that pickle and cPickle lack tests for this feature.
> 
> > Or should we be more prudent ?
> 
> It would be nice to try this out with ZODB to see if it makes much  
> difference.  If it does, then that would provide extra motivation for  
> me to add the missing test.
> 
> Roché Compaan said he would try it out, but I just realized that he  
> might have been waiting for me.

Sorry for not responding earlier. I actually tried this out immediately
after you suggested it and was very impressed with the improvement in
performance. I have been meaning to write back to give a thorough report
but a project with insane deadlines caught up with me. This project ends
this week so I plan to continue with my benchmark test next week and
give feedback thereafter.

The number of calls to persistent_id dropped dramatically, and in a test
of 10 million inserts the insert rate almost doubled, from 1000 to 2000
inserts per second, for at least the first million inserts. The insert
rate decreases rapidly thereafter until it drops to the insert rate
recorded before the persistent_id change. I guess at this point the
overhead of bucket splits is too high.
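
For anyone who wants to see why the hook fires so often, here is a
minimal sketch using only the documented persistent_id hook of the
standard pickle module (Python 3). It is not ZODB's ObjectWriter and it
does not use inst_persistent_id; Record, OidPickler and count_hook_calls
are hypothetical names made up for the illustration. It simply counts
how many hook calls are for "persistent" instances and how many are for
plain values such as strings, ints and containers.

```python
import io
import pickle


class Record:
    """Hypothetical stand-in for a persistent object that has an oid."""

    def __init__(self, oid):
        self.oid = oid


class OidPickler(pickle.Pickler):
    """Pickler that replaces Record instances with a persistent id.

    It also counts how often the hook is called for Record instances
    versus everything else reachable from the pickled object.
    """

    def __init__(self, file, protocol=2):
        super().__init__(file, protocol)
        self.instance_calls = 0
        self.other_calls = 0

    def persistent_id(self, obj):
        if isinstance(obj, Record):
            self.instance_calls += 1
            # A non-None return value is stored instead of the object.
            return "oid-%d" % obj.oid
        # Plain values still pass through the hook; None means
        # "pickle this object as usual".
        self.other_calls += 1
        return None


def count_hook_calls(state):
    """Pickle `state` and return (instance_calls, other_calls)."""
    pickler = OidPickler(io.BytesIO())
    pickler.dump(state)
    return pickler.instance_calls, pickler.other_calls


state = {
    "title": "doc",
    "tags": ["a", "b", "c"],
    "sizes": (1, 2, 3),
    "left": Record(1),
    "right": Record(2),
}
instance_calls, other_calls = count_hook_calls(state)
print(instance_calls, other_calls)
```

Only the two Record references need the hook here; all the other calls
are for plain values, which, going by Jim's description, are the calls
an instances-only hook would skip.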

-- 
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za