[ZODB-Dev] Unique Object ID

Thu Jun 5 00:32:37 EDT 2003

On Wednesday 04 June 2003 04:49 pm, Johan Dahlin wrote:
> We've been running into an issue here while developing IndexedCatalog.
> IndexedCatalog stores information about its objects (which always
> inherit from a special class) in an OOBTree. 
> They are keyed by the id of the object. In the current version of
> IndexedCatalog we have been using id() of the object (i.e. the memory
> address) as the id of the object. 
> 
> Recently we have found out that this is not very reliable: it can create
> conflicts, because the same memory address may be handed out again to a
> new object once an old one has been freed. So we need a reliable way of
> creating an object id. 
> There are basically two options that I can think of:
> 
> 1) use _p_oid

Bad idea: these can change if you move objects between databases, so there 
is a pretty strong possibility of collisions.

> We must first store the object in a temporary location somewhere under
> the root, commit(1), then pull it back and get the oid. I think this is
> really bad for performance. Is it possible to get an _p_oid in another
> way? Or store the object in a connection and not in the root?
> 
> 2) Using a counter, increase it for each object
> 
> I believe this can create problems with multiple connections and
> conflict errors. Create an object in connection A, create another one in
> connection B, commit A, commit B. *boom*

This should work if you use a BTrees.Length.Length object, which has automatic 
conflict resolution. It's a pretty clever and elegant piece of code too; give 
it a looksee. There are also tests in the BTrees code that you can look at to 
see how to test your code for conflicts.
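The merge rule behind Length can be modelled without ZODB. The sketch below is a plain-Python stand-in, not the real class: `LengthModel` and `resolve_conflict` are hypothetical names. As far as I know, the real Length does the equivalent in its `_p_resolveConflict(old, s1, s2)` hook by adding both deltas to the common starting value. Note what the model also shows: after resolution the *count* is right, but two concurrent transactions can still have read the same number, which is why the double-check mentioned below matters when the counter is used to hand out ids.

```python
class LengthModel:
    """Toy stand-in for BTrees.Length.Length (NOT the ZODB class):
    a counter whose concurrent increments are merged instead of
    raising a ConflictError."""

    def __init__(self, value=0):
        self.value = value

    def change(self, delta):
        self.value += delta

    def __call__(self):
        return self.value

    @staticmethod
    def resolve_conflict(old, committed, ours):
        # Both transactions started from `old`; keep both deltas.
        return committed + ours - old


# Connections A and B load the same committed counter state.
old = 5
a = LengthModel(old)
b = LengthModel(old)

# Each connection "allocates an id" by bumping its own copy.
a.change(1)
id_a = a()
b.change(1)
id_b = b()

# A commits first; B conflicts and the states are merged.
merged = LengthModel.resolve_conflict(old, a(), b())

print(merged)        # 7: both increments survive
print(id_a == id_b)  # True: both transactions saw id 6
```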

> Am I missing something, or is there another way of doing this?
> Comments, suggestions highly appreciated.

BTW: Zope Catalog uses (in 2.7 head) random.randint(-2000000000, 2000000000) 
to generate record ids. It then double-checks that the id hasn't already been 
taken. This should probably also be done if you use a counter, just to be 
safe. Performance here isn't really an issue, since generating a random 
number is cheap compared to writing to the database.
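A minimal sketch of "random id plus double-check", assuming a plain dict stands in for the rid-keyed BTree. `new_rid` is a hypothetical helper for illustration, not the Catalog's actual code.

```python
import random

# Range the Zope Catalog draws random record ids from.
RID_MIN, RID_MAX = -2000000000, 2000000000


def new_rid(taken, rng=random):
    """Return a random record id that is not already a key of `taken`.

    `taken` is any mapping keyed by rid (a dict here, an IOBTree in ZODB).
    """
    while True:
        rid = rng.randint(RID_MIN, RID_MAX)
        if rid not in taken:  # double-check before handing it out
            return rid


records = {}
for name in ("a", "b", "c"):
    records[new_rid(records)] = name

print(len(records))  # 3
```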

Actually Catalog uses a combination of random and sequential ids. That way, if 
many objects are added at once, they tend to cluster in the BTree data 
structure, minimizing the number of nodes and buckets that need to be touched. 
Have a look at the catalogObject method of Catalog.py in the Zope head.
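A rough sketch of the mixed random/sequential idea, assuming a dict stands in for the BTree. `RidAllocator` is hypothetical and only illustrates the pattern: pick a random starting point, then hand out consecutive ids until one collides, at which point jump somewhere random again. It is not a copy of catalogObject.

```python
import random

RID_MIN, RID_MAX = -2000000000, 2000000000


class RidAllocator:
    """Hypothetical allocator: consecutive ids after a random start, so a
    batch of new records lands in the same few BTree buckets."""

    def __init__(self, taken, rng=None):
        self.taken = taken                # mapping keyed by rid
        self.rng = rng or random.Random()
        self.next_rid = None              # no run in progress yet

    def allocate(self):
        rid = self.next_rid
        # Jump to a random point if there is no current run, the run has
        # left the allowed range, or the next id is already in use.
        while rid is None or rid > RID_MAX or rid in self.taken:
            rid = self.rng.randint(RID_MIN, RID_MAX)
        self.next_rid = rid + 1
        return rid


records = {}
alloc = RidAllocator(records, random.Random(42))
rids = []
for name in ("a", "b", "c"):
    rid = alloc.allocate()
    records[rid] = name
    rids.append(rid)

print(rids)  # three consecutive integers
```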

Also, if you use an integer rid, then you can use IOBTrees too, which are 
optimized for integer keys.

ZCatalog uses paths to uniquely identify objects in Zope. There are BTrees of 
path->rid and rid->path in there so that the indexes can just use the integer 
rids. Perhaps you could also generate something like a path. Each object in 
the database must have a unique path to access it. Perhaps that can be used 
as a key in your catalog.
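A minimal sketch of the path-to-rid scheme, assuming two dicts stand in for the BTrees (an OIBTree-style path->rid map and an IOBTree-style rid->path map). `PathIndex` and the example path are hypothetical; the point is that the indexes only ever store small integer rids while the path remains the real unique key.

```python
import random

RID_MIN, RID_MAX = -2000000000, 2000000000


class PathIndex:
    """Hypothetical two-way mapping between object paths and integer rids."""

    def __init__(self, rng=None):
        self.path_to_rid = {}   # stands in for an OIBTree
        self.rid_to_path = {}   # stands in for an IOBTree
        self.rng = rng or random.Random()

    def rid_for(self, path):
        """Return the rid already assigned to `path`, or assign a new one."""
        rid = self.path_to_rid.get(path)
        if rid is None:
            rid = self.rng.randint(RID_MIN, RID_MAX)
            while rid in self.rid_to_path:  # double-check for collisions
                rid = self.rng.randint(RID_MIN, RID_MAX)
            self.path_to_rid[path] = rid
            self.rid_to_path[rid] = path
        return rid

    def path_for(self, rid):
        return self.rid_to_path[rid]

    def forget(self, path):
        rid = self.path_to_rid.pop(path)
        del self.rid_to_path[rid]


idx = PathIndex()
rid = idx.rid_for("/app/folder/doc1")
same = idx.rid_for("/app/folder/doc1")  # same path -> same rid
print(idx.path_for(rid))                # /app/folder/doc1
```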

-Casey