[ZODB-Dev] How to update an object in a multithreading application?

Vincent Pelletier plr.vincent at gmail.com
Thu Mar 22 21:00:34 UTC 2012


On Tuesday 20 March 2012 13:06:16, Sebastian Wain wrote:
> It is a persistent queue with acknowledgment. The issue I see is the sync
> of the BTree between threads. Consumers/Producers get/put elements, one at
> a time, so before that operation can take place it will need to sync to
> have the most updated version of the BTree in all threads.

Indeed, you need to get a fresh snapshot when looking for an item to process.
You also want that transaction to be as short as possible, to reduce the time
spent resolving conflicts (i.e., rolling back transactions and starting over
when an item has already been reserved by a competing transaction). It should
therefore probably be kept separate from the code actually processing items.

This might be possible by providing ZODB.Connection:Connection.open() with a 
transaction manager different from the threaded one, to use for those sub-
transactions. But I've never used this feature (I'm only using ZODB through 
Zope).

The way I see it, it would go like:
- threaded TM begin
- pop from queue:
  - other TM begin on a ZODB connection reserved to queue access
  - pop item
  - commit (or abort if something went wrong, and re-raise)
- process
- acknowledge:
  - other TM begin on a ZODB connection reserved to queue access (can reuse
    above one)
  - ack
  - commit (etc, as above)
- threaded TM commit (again, or abort on exception)

If processing itself involves other persistent objects (e.g. the queue item
describes an action to take on another persistent object), two connections
would have to be opened on the same database, which can lead to errors when
moving objects around (if you are not careful, an object fetched in one
transaction will be reused in another, which will raise an exception).

Also, using a popped item outside its connection's transaction will cost some
hair, and it will probably need to be converted to some non-persistent form
before leaving that transaction (otherwise, any alteration of it will raise).

TL;DR: I don't know how to implement it correctly without actually doing it.

> Is this in the context of Python 3.y? because it is a multiprocess Queue or
> on Python 2.x?

Tested on 2.7. I don't know about 3.x. Verified *not* to occur on PyPy 1.7
(although I don't know how it can really fix the issue).
The use case is a single-process, multi-threaded app reading a file with many
nested structures, each level sending chunks to the higher level. I wanted to
use ply.yacc to do the parsing so I could easily alter the grammar, but it
cannot (with its default API) accept partial inputs, and I need that. So I
used queues & threads. I then used a simple queue implementation (no count, no
ack), and finally modified ply to work on partial inputs - with a great speed
improvement, even over the "simpler queue" version.

More there:
https://github.com/vpelletier/ITI1480A-linux/blob/master/iti1480a/parser.py
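
For readers who do not want to dig through that file, the "queues & threads"
arrangement can be pictured with a toy example like the one below. It is not
taken from parser.py: it is written for Python 3 with only the standard
library, and splitter, collector, run and the _END sentinel are hypothetical
names. One thread reassembles complete lines from arbitrary chunks, the other
consumes them as they become available.

```python
import queue
import threading

_END = object()  # sentinel marking the end of input


def splitter(chunks, out_queue):
    """Lower level: rebuild complete lines from partial chunks, pass them up."""
    pending = ''
    for chunk in chunks:
        pending += chunk
        *lines, pending = pending.split('\n')
        for line in lines:
            out_queue.put(line)
    if pending:
        out_queue.put(pending)  # trailing data without a final newline
    out_queue.put(_END)


def collector(in_queue, results):
    """Higher level: consume complete lines as soon as they arrive."""
    while True:
        line = in_queue.get()
        if line is _END:
            break
        results.append(line.upper())


def run(chunks):
    """Wire both levels together with a queue and run them in two threads."""
    link = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=splitter, args=(chunks, link)),
        threading.Thread(target=collector, args=(link, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```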

> There is a way to have a "history free" storage? obviously in the context
> of ZODB.

AFAIK, RelStorage supports this, not sure about others.

Also, the mandatory loss of conflict resolution on history-free storages might
cause a performance regression. I believe periodically packing away
reasonably old transactions (compared to processing duration) might turn out
to be better, and it is readily available with any ZODB back-end. If run on a
big-enough (compared to available RAM and disk speed) ZODB, it will start
causing problems, though.

Regards,
-- 
Vincent Pelletier

