[ZODB-Dev] Zope, ZEO, BTrees, and conflicts

Thu, 13 Dec 2001 10:44:26 -0800

I'm no data-type guru, so forgive the ignorance of this question: does
anyone know if there's an efficient algorithm to merge 2 independent BTrees
together (primarily the case of merging a small tree in to a larger one)?
My thoughts are migrating towards offline distributed tree creation with
scheduled (hopefully quick?) offset merges across clients.  Of course, this
is purely in theory, and I don't need to do this now, it just might be nice
for some of my future use cases.

Sean

-----Original Message-----
From: jeremy@zope.com [mailto:jeremy@zope.com]
Sent: Thursday, December 13, 2001 4:08 AM
To: sean.upton@uniontrib.com; Chris Withers
Cc: zodb-dev@zope.org
Subject: Re: [ZODB-Dev] Zope, ZEO, BTrees, and conflicts

>>>>> "CW" == Chris Withers <chrisw@nipltd.com> writes:

  CW> sean.upton@uniontrib.com wrote:
  >>
  >> mainly that is because I'm not running it on really fast
  >> hardware).  My head is toying with the idea of distributing this
  >> across several ZEO clients, but I'm betting there would be a lot
  >> of issues associated with doing that.

  CW> Yeah, I'm pretty sure ZEO would handle lots of writes like this
  CW> badly, I'm sure Jim or Jeremy will jump up and down on me if
  CW> that's not true ;-)

I guess I better answer one way or another, eh?

If I recall the context, Sean wants to make lots of updates to a
single BTree.  I don't think it would too helpful to do that using
multiple ZEO clients -- or multiple threads in a single process.  Each
ZEO client will have a cached copy of the objects its using.  Every
time another client modifies one of those objects, every other client
needs to reload its copy of the object.  I expect that for the BTrees
case this would involve a lot of cache consistency overhead.

One possibility, which sounds good in theory but may not play out well
in practice, would be to perform operations on independent data
structures on each client and then merge the data structures are the
end.  For indexing-style operations, there's a high overhead for
updating the index that's relatively independent of the number of
objects being indexed.  If you can batch the updates, it's generally
beneficial.

Jeremy