[ZODB-Dev] ZODB idioms

Casey Duncan casey@zope.com
24 Jun 2002 10:14:23 -0400


The problem won't be so much the quantity of objects as the frequency
of concurrent writes in the application, and therefore the number of
conflict errors.

Storing this many objects in a BTree should be fine; they rebalance
themselves AFAIK. Inserting more random ids might tend to be faster as
the BTree grows, though, since rebalancing will happen much less often.
How big an effect that would have I am not sure. You might want to do
some timing experiments with large BTrees.

What often works well, if the ids must sort in sequence, is to use
timestamps. If a conflict error occurs, you can simply retry after a
set period of time (updating the timestamp to a later value). Random ids
might be even better, but for hundreds of thousands of objects you
should reseed the random number generator periodically, or it may repeat
itself (assuming your app is that stable ;^).
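
A minimal sketch of the timestamp idea, assuming ids are integer
microseconds since the epoch. ConflictError here is a local stand-in for
ZODB's exception, and timestamp_id, insert_with_timestamp, the dict-like
store and the commit callable are all hypothetical names used only for
illustration; in a real application commit would be the transaction
commit.

```python
import time


class ConflictError(Exception):
    """Stand-in for ZODB's ConflictError so the sketch runs on its own."""


def timestamp_id(now=None):
    """Return an integer id (microseconds since the epoch) that sorts by time."""
    if now is None:
        now = time.time()
    return int(now * 1000000)


def insert_with_timestamp(store, value, commit, retries=5, delay=0.001):
    """Store value under a fresh timestamp id, retrying if commit conflicts."""
    for _ in range(retries):
        key = timestamp_id()
        store[key] = value
        try:
            commit()
            return key
        except ConflictError:
            del store[key]        # roughly what aborting the transaction undoes
            time.sleep(delay)     # wait, then retry with a later timestamp
    raise ConflictError("gave up after %d attempts" % retries)
```

Note that microsecond timestamps do not fit the 32-bit integer keys of an
IOBTree, so a real tree would need a key type that can hold them (an
OOBTree, for instance) or a coarser timestamp.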

If you must use sequential integers, then I would make a persistent
class that handles both the counter and the btree storage (using a
single append method). The simplest conflict handler would simply throw
away a conflicting write and propagate the error. This will be simple
but not very efficient. Another approach would be to write a more
sophisticated handler which can serialize the conflicting records.
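
Roughly, such a class could look like the sketch below. The class and
method names (SequentialStore, append, get) are hypothetical, and the
imports assume the current persistent and BTrees packages; the fallback
only exists so the sketch runs where ZODB is not installed. No conflict
handler is defined, so this is the simple variant: two concurrent
appends both modify the counter on the same object, one of them fails
with a ConflictError, and the caller has to retry.

```python
try:
    from persistent import Persistent
    from BTrees.IOBTree import IOBTree
except ImportError:
    # Fallbacks so the sketch runs without ZODB installed (no persistence).
    Persistent = object
    IOBTree = dict


class SequentialStore(Persistent):
    """Keeps the id counter and the tree together behind one append()."""

    def __init__(self):
        self._next_id = 1
        self._items = IOBTree()

    def append(self, obj):
        """Store obj under the next sequential id and return that id."""
        new_id = self._next_id
        self._items[new_id] = obj
        self._next_id = new_id + 1   # assigning to self marks it as changed
        return new_id

    def get(self, obj_id):
        """Return the object stored under obj_id."""
        return self._items[obj_id]

    def __len__(self):
        # Ids are handed out without gaps, so the counter gives the size.
        return self._next_id - 1
```

Since the counter is only ever advanced inside append, callers never
see or guess at the current maximum id.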

I would go with the former until you determine it gives you inadequate
performance. That's pretty much what you have done with your code. I
would encapsulate the database machinery a bit more though so you don't
need to make changes everywhere when your conflict resolution strategy
changes.
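
One way to keep the retry policy in a single place is a small helper
like the sketch below. Everything here is an assumption for
illustration: ConflictError is a local stand-in for ZODB's exception,
and work, commit and abort are callables supplied by the application
(commit and abort would wrap whatever transaction calls your ZODB
version provides).

```python
class ConflictError(Exception):
    """Stand-in for ZODB's ConflictError so the sketch runs on its own."""


def run_in_transaction(work, commit, abort, attempts=3):
    """Call work() and then commit(); on a conflict, abort and try again."""
    for attempt in range(attempts):
        try:
            result = work()
            commit()
            return result
        except ConflictError:
            abort()                      # discard the failed attempt's changes
            if attempt == attempts - 1:
                raise                    # out of retries: propagate the error
```

Application code then calls run_in_transaction with a function that does
the actual append, and a change of conflict strategy only touches this
helper.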

hth,

-Casey

On Sat, 2002-06-22 at 12:47, Ury Marshak wrote:
> (Is this list ok with postings related not to development _of_ ZODB,
> but to development _with_ ZODB? ;)
> 
> 
> Now to the question at hand - the 'sequential numbers' idiom.
> Being new to ZODB (and object Dbs in general) I'm sure there
> are other options that I missed, but these are the two approaches
> I've come up with - please comment on them, because for a newbie
> their [dis]advantages are not immediately obvious, especially
> what would happen with a large number of objects...
> 
> Let's assume we are writing a bug-tracking application. The app
> will have to be multiuser, so it seems that ZEO server will be
> serving several stations. Let's say we want to keep 'BugReport'
> objects somehow together, it seems that for a large number of
> objects a list would be very inefficient, so we would probably
> use some sort of a BTree.
> 
>     zodb_root = ...
>     if not zodb_root.has_key('AllBugReports'):
>         zodb_root['AllBugReports'] = IOBTree()
>         get_transaction().commit()
>     bugreports_tree = zodb_root['AllBugReports']
> 
> Now the BugReport will need a unique number associated with it.
> One option is to use it as a key in a tree:
> 
>     brep = BugReport('Your software is broken!', tester115, '15 Jan')
>     br_id = ... # our guess at the current maximum id
>     while not bugreports_tree.insert(br_id, brep):
>         br_id += 1
>     #   btw, do we need a commit here, or is 'insert' autocommiting for us?
> 
> It seems that this isn't a good idea, since the tree would become
> extremely unbalanced (I didn't read the BTrees C source - could it
> be rebalancing trees behind the scenes? Or is there a method to do
> it?)
> 
> The other option would seem to generate new IDs randomly and keep a
> separate 'NumberKeeper' object to keep track of the highest number.
> 
> initializing the database:
> 
>     class NumberKeeper:
>         pass
>     nk = NumberKeeper()
>     nk.num = 1
>     zodb_root['BugReportsNumberKeeper'] = nk
> 
> creating new object:
> 
>     while 1:
>         try:
>             #     get next number
>             br_id = nk.num
>             nk.num += 1
> 
>             #     create object
>             brep = BugReport('Your software has a virus!', tester5, '22 Jan')
>             brep.rep_num = br_id
> 
>             #     try to commit, should raise a ConflictError if somebody
>             #     had already modified nk.num
>             rand_id = randrange(0, ......     # maxint? also have to handle conflicts ...
>             bugreports_tree[rand_id] = brep
>             get_transaction().commit()
> 
>         except ConflictError:
>             #   do we need this, or is it synced automatically on
>             #   ConflictError?
>             get_connection().sync()
>         else:
>             break
> 
> 
> Are there other possibilities? (considering that skipping some numbers
> or having duplicates is not an option). Which is the best approach?
> How well is it going to scale for hundreds of thousands of objects?
> Which is going to be easier to use?
> 
> Thanks for bearing with me,
> Ury