[ZODB-Dev] nontransactional, oid, and thread safety. oh my!

Wed Sep 29 23:56:30 EDT 2004

[Randy]
> I am investigating using ZODB 3.2 for a web based application.  I plan to
> use ZODB with a FileStorage.

That's a popular combo indeed.

> I have a first pass implementation up and running; however, I have some
> questions, that maybe the gurus could help me with.
>
> 1) Is there a way to disable transactions?

No.

> My app is write intensive and the writes are causing the database
> to get VERY large, in a short period of time.

"Writes" don't change the database.  The only thing that changes the
database is committing a transaction.  If you don't commit(), you can run
for months and the database won't change at all.

> I could implement a daily pack() strategy to minimize the size,

Many do.

> but that's not the solution to fit my problem.
>
> Transactions are not a requirement for my application; is there a way to
> disable transactions in FileStorage?

No.

> If not, are there any good Storage implementations that don't do
> transactions?

No, and not even bad ones <wink>, because it wouldn't make sense.  The only
way to get anything *into* a storage is to commit a transaction.  So a
storage implementation without transactions couldn't store any data -- such
a storage would be useless.

> Or should I roll my own?

I'm lost as to what your requirements are, but if you've got no use for
transactions then you've also got no use for database technology.  Maybe
save your objects to a file using the pickle module from time to time?
Maybe don't use files at all?
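
Here's a minimal sketch of the pickle route, assuming all your state hangs
off one picklable object.  The names save_snapshot and load_snapshot are made
up for illustration, and writing to a temp file before renaming is my own
assumption about what you'd want so that a crash mid-write can't clobber the
previous snapshot:

```python
import os
import pickle
import tempfile


def save_snapshot(obj, path):
    """Pickle obj to path, replacing any older snapshot in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp, path)
    except BaseException:
        # Reached only if the dump or the rename failed, so tmp still exists.
        os.unlink(tmp)
        raise


def load_snapshot(path, default=None):
    """Return the last saved object, or default if nothing was saved yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default
```

Each save overwrites the last one, so the file tracks the size of the current
data rather than its history.  You also get none of what a database gives
you:  no concurrency control, and anything since the last save is lost in a
crash.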

> 2)  Entries written into the database have a unique id.  If you glance at
> the transaction data, you see "oid" references that uniquely identify
> these database entities.

There is a unique oid per object, and the oid persists across object
revisions.

> Are those oid's publicly available?

It's hard to say what the public API is <wink/sigh>.  A persistent object
obj's oid is inspectable via obj._p_oid, and lots of people access it.
Whether they should is open to debate.

> I'm using an incrementing counter stored in the database, for my id's.
> However, I'd much rather use the ones in the database.

Search the mailing-list archives for more on that.

> 3).  Thread safety.  What are some best practices I can follow to make
> database writes and reads thread safe?

Safest is not to use threads at all.  That isn't entirely flippant:  if it's
possible for your app, the best approach on many counts is to share a
database across distinct single-thread processes, each process making its
own connection via ZEO.  Then you avoid all thread traps by virtue of not
having multiple threads.  Maximally simple, maximally safe.

If you think you have to use multiple threads, then each thread should open
its own connection, and an object loaded in one thread should never be used
or modified in a different thread.  That doesn't mean threads T1 and T2
can't both do, e.g.,

    obj = my_thread_connect.root()['some_object']

That's fine.  They each get a *distinct* copy of obj in that case.  It means
that T1 can't do

    obj = conn.root()['some_object']

and then "pass" (make available, in any way) the Python object bound to
'obj' to T2.  ZODB's object model isolates threads from each other at the
database level, but can't stop clients from screwing it up at the Python
level.
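
One way to keep to the one-connection-per-thread rule is to stash each
thread's connection in thread-local storage.  This is only a sketch:
PerThread is a hypothetical helper, not part of ZODB, and the demo uses plain
object() as a stand-in factory so it runs without a database.  With ZODB
you'd pass db.open as the factory.

```python
import threading


class PerThread:
    """Hand each thread its own object, created on first use by factory."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        try:
            return self._local.value
        except AttributeError:
            # First call in this thread: make this thread's private object.
            self._local.value = self._factory()
            return self._local.value


if __name__ == "__main__":
    connections = PerThread(object)  # stand-in; with ZODB: PerThread(db.open)
    seen = {}

    def work(name):
        seen[name] = connections.get()

    threads = [threading.Thread(target=work, args=(n,)) for n in ("T1", "T2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(seen["T1"] is seen["T2"])
```

This only hands out connections.  It still can't stop you from passing an
object loaded through T1's connection over to T2.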

> I've read about some of the strategies that ZODB uses to handle
> conflicts, including "try it 3 times"

No, ZODB doesn't do that.  A ZODB client can, if it wants to, but it has to
arrange to do that itself then.  Zope is a ZODB client that does that.
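
A client-side retry loop might look roughly like the sketch below.  The
function run_with_retries and its parameters are hypothetical, and the
ConflictError class here is a stand-in so the code runs without ZODB; in a
real client you'd pass ZODB.POSException.ConflictError instead.  I'm assuming
attempt() does the work and commits, and abort() throws away the failed
transaction (in ZODB 3.2 I'd expect those to be built on
get_transaction().commit() and get_transaction().abort()).

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError."""


def run_with_retries(attempt, abort, retries=3, conflict=ConflictError):
    """Call attempt() up to retries times, aborting after each conflict.

    Returns whatever attempt() returns.  If every try conflicts, the last
    conflict exception is re-raised after its abort.
    """
    for remaining in range(retries - 1, -1, -1):
        try:
            return attempt()
        except conflict:
            abort()  # discard the conflicted transaction before retrying
            if remaining == 0:
                raise
```

The important part is redoing the whole unit of work on each try, not just
the commit, since the retry has to see the state that caused the conflict.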

> and providing "resolution hooks" in case it fails.

It's rare for client objects to program resolution hooks, but some do.
ZODB's BTrees implement elaborate conflict resolution, and most clients rely
on BTrees for scalable data structuring.  ZODB was not intended for high
write-conflict scenarios, though.

> I've also read about using a connection object for each thread.

You'll do more than just read about it if you don't <wink>.

> Any recommendations from the trenches?  I could always roll my own
> connection pool, and hand those out to threads wanting to read/write from
> the database.

Why would you want to roll your own?  ZODB supplies a connection pool.  When
you do something like:

    import ZODB
    from ZODB import FileStorage
    st = FileStorage.FileStorage("temp.fs")
    db = ZODB.DB(st)
    cn = db.open()

the DB object ('db') has little purpose in life except to manage a
connection pool for the storage passed to the DB constructor.  Doing

    cn.close()

later returns the Connection object to db, and the Connection's cache
remains intact; a subsequent db.open() will reuse that Connection object.
Connection.close() and DB.open() are really more like a pause and resume
pair.