[Checkins] SVN: Sandbox/J1m/zodb-doc/intro.txt checkpoint.

Jim Fulton jim at zope.com
Mon May 17 07:51:37 EDT 2010


Log message for revision 112411:
  checkpoint.
  

Changed:
  U   Sandbox/J1m/zodb-doc/intro.txt

-=-
Modified: Sandbox/J1m/zodb-doc/intro.txt
===================================================================
--- Sandbox/J1m/zodb-doc/intro.txt	2010-05-17 11:12:13 UTC (rev 112410)
+++ Sandbox/J1m/zodb-doc/intro.txt	2010-05-17 11:51:37 UTC (rev 112411)
@@ -375,7 +375,7 @@
 Configuration strings
 ---------------------
 
-ZODB supports the use of textual configuration files to define
+ZODB supports the use of textual configuration strings to define
 databases, storages, and ZEO servers.  Production applications
 typically create database objects by loading configuration strings
 from application configuration files [#zconfig]_.
@@ -395,7 +395,7 @@
 
 The configuration syntax was inspired by the Apache configuration
 syntax. Configuration sections are bracketed by opening and closing
-types tags and can be nested. Options are given as names and values
+typed tags and can be nested. Options are given as names and values
 separated by spaces.
 
 In the example above, a ``zodb`` tag defines a database object
@@ -409,13 +409,16 @@
 Concurrency
 ===========
 
-ZODB supports accessing databases from multiple threads.  Each thread
-operates as if it has it's own copy of the database.  Threads are
-synchronized through transaction commit.
+ZODB supports accessing databases from multiple threads, including
+threads in separate processes.  This is accomplished using per-thread
+database connections and transaction managers.  Each thread operates
+as if it has its own copy of the database.  Threads are synchronized
+through transactions.
 
-Each thread opens a separate connection to a database.  Each
-connection has it's own object cache. If multiple connections access
-the same object, they each get their own copy. Let's look at an example:
+In a typical application, each thread opens a separate connection to a
+database.  Each connection has its own object cache. If multiple
+connections access the same object, they each get their own
+copy. Let's look at an example:
 
    >>> conn1 = db.open()
    >>> author1 = conn1.root.authors['tolkien']
@@ -455,46 +458,185 @@
 Transaction managers
 --------------------
 
+When we use ``transaction.commit()`` to finish a transaction, we're
+interacting with a transaction manager. There is a transaction manager
+per thread, and by default ZODB uses the current thread's transaction
+manager.
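A minimal sketch of the "one transaction manager per thread" idea, built on the standard library's `threading.local`. `ToyTransactionManager`, `get_manager`, and `commit` are hypothetical names invented for this illustration; the real `transaction` package may organize its thread handling differently.

```python
import threading


class ToyTransactionManager:
    """Hypothetical stand-in for a transaction manager; it only counts commits."""

    def __init__(self):
        self.commits = 0

    def commit(self):
        self.commits += 1


_local = threading.local()


def get_manager():
    # Each thread lazily gets its own manager, stored in thread-local storage.
    manager = getattr(_local, "manager", None)
    if manager is None:
        manager = _local.manager = ToyTransactionManager()
    return manager


def commit():
    # A module-level commit that delegates to the calling thread's manager,
    # analogous in spirit to calling transaction.commit().
    get_manager().commit()


results = {}


def worker(name):
    commit()
    commit()
    results[name] = get_manager()


threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

commit()  # the main thread uses yet another manager
```

Each worker thread ends up with its own manager that recorded only its own commits, and the main thread's manager is separate from both.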
 
+The example in the last section was a little odd, because the two
+connections shown were in the same thread and therefore used the same
+transaction manager.  That's why the transaction commit affected both
+connections.
 
+Applications can create and manage their own transaction managers.
+This isn't commonly done, except in tests, but we'll do it below to
+illustrate more carefully the normal isolation of application threads.
 
+We close the second connection and reopen it with a separate
+transaction manager:
 
+    >>> conn2.close()
+    >>> transaction_manager2 = transaction.TransactionManager()
+    >>> conn2 = db.open(transaction_manager2)
 
+Now, we'll change the author name back to what we had before:
 
+    >>> author1.name = 'J.R.R. Tolkien'
+    >>> transaction.commit()
 
+    >>> author2.name
+    'John Ronald Reuel Tolkien'
 
+Even though we committed the transaction, we didn't see the update
+reflected in the second connection.  This is because the second
+connection is now using a different transaction manager, as would
+be the case if it were running in a separate thread.
 
+To see the change in the second connection, we need to end its
+current transaction.  Because we didn't make any changes in the second
+connection, we can use either abort or commit to end the transaction.
 
+    >>> transaction_manager2.commit()
+    >>> author2.name
+    'J.R.R. Tolkien'
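The isolation shown above can be modeled with a toy in which each connection works on a private snapshot and only picks up other connections' commits when its own transaction ends. `ToyDatabase` and `ToyConnection` are hypothetical; for simplicity the toy copies all data at the start of each transaction, whereas ZODB loads object state on demand.

```python
class ToyDatabase:
    """Hypothetical in-memory stand-in for a database: key -> committed value."""

    def __init__(self, **committed):
        self.committed = dict(committed)


class ToyConnection:
    """Hypothetical connection that works on a private snapshot of the data."""

    def __init__(self, db):
        self.db = db
        self._begin()

    def _begin(self):
        # A new transaction starts from a fresh copy of the committed state.
        self.data = dict(self.db.committed)
        self.changes = {}

    def set(self, key, value):
        self.data[key] = value
        self.changes[key] = value

    def commit(self):
        # Publish only what this transaction changed, then start a new one.
        self.db.committed.update(self.changes)
        self._begin()

    def abort(self):
        # Throw away local changes and start a new transaction.
        self._begin()


db = ToyDatabase(tolkien="John Ronald Reuel Tolkien")
conn1 = ToyConnection(db)
conn2 = ToyConnection(db)

conn1.set("tolkien", "J.R.R. Tolkien")
conn1.commit()

before = conn2.data["tolkien"]  # conn2 still sees its old snapshot
conn2.commit()                  # nothing changed, so nothing is written
after = conn2.data["tolkien"]   # the new transaction sees the update
```

Because the toy `commit` writes only changed keys, ending a read-only transaction with either `commit` or `abort` has the same effect, mirroring the doctest above.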
 
+Conflicting changes
+-------------------
 
+When multiple threads modify the same object, this is considered a
+conflict. Generally, when there's a conflict, the first thread to
+commit its changes wins, and the other threads' transactions have to
+be redone.  For example, let's look at what happens when we change the
+author name in both connections:
 
+    >>> author1.name = 'John Ronald Reuel Tolkien'
+    >>> author2.name = 'John Tolkien'
+    >>> transaction.commit()
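+    >>> transaction_manager2.commit() # doctest: +IGNORE_EXCEPTION_DETAIL
+    Traceback (most recent call last):
+    ...
+    ConflictError: database conflict error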
 
+Here, we get a conflict error when we commit the changes made in the
+second connection, because they conflict with changes made in the
+first.  If we want to make the second change, we need to abort the
+transaction and redo it:
 
+    >>> transaction_manager2.abort()
+    >>> author2.name = 'John Tolkien'
+    >>> transaction_manager2.commit()
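"First committer wins" can be sketched with per-key serial numbers: a transaction remembers the serials it started with, and its commit is refused if another transaction has since committed a change to a key it also changed. Everything below (`ConflictError`, `ToyDatabase`, `ToyConnection`) is a hypothetical model; ZODB tracks conflicts per persistent object, not per dictionary key.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for ZODB's conflict error."""


class ToyDatabase:
    def __init__(self):
        self.values = {}
        self.serials = {}  # key -> number of committed writes to that key


class ToyConnection:
    def __init__(self, db):
        self.db = db
        self.abort()

    def abort(self):
        # Begin a new transaction: snapshot values and the serials we saw.
        self.data = dict(self.db.values)
        self.read_serials = dict(self.db.serials)
        self.changes = {}

    def set(self, key, value):
        self.data[key] = value
        self.changes[key] = value

    def commit(self):
        # Refuse to commit if someone else committed a change to a key we
        # also changed after our transaction began.
        for key in self.changes:
            if self.db.serials.get(key, 0) != self.read_serials.get(key, 0):
                raise ConflictError(key)
        for key, value in self.changes.items():
            self.db.values[key] = value
            self.db.serials[key] = self.db.serials.get(key, 0) + 1
        self.abort()  # start a fresh transaction with the current state


db = ToyDatabase()
conn1, conn2 = ToyConnection(db), ToyConnection(db)
conn1.set("tolkien", "John Ronald Reuel Tolkien")
conn2.set("tolkien", "John Tolkien")
conn1.commit()  # the first committer wins

try:
    conn2.commit()
except ConflictError:
    conflicted = True
else:
    conflicted = False

conn2.abort()                         # discard the failed transaction
conn2.set("tolkien", "John Tolkien")  # redo the change
conn2.commit()                        # now succeeds
```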
 
+Conflicts are troublesome for 2 reasons:
 
+- Application code has to anticipate conflicts and be prepared to
+  retry transactions.
 
+- Retrying transactions is usually expensive, hurting application
+  responsiveness and throughput.
 
+There are a number of ways to avoid conflicts:
 
+- Organize application threads and data structures so that objects are
+  unlikely to be modified by multiple threads at the same time.
 
+- Use data structures that support conflict resolution.
 
+Conflict resolution
+-------------------
 
+An object can provide logic to sort out conflicting changes, allowing
+conflicting changes to be committed. This is one of the advantages of
+BTrees.  Usually, when different BTree keys are modified in different
+threads, the conflicting changes can be resolved.  This allows
+books to be added to authors at the same time:
 
+    >>> book1 = author1.new_book('The Silmarillion')
+    >>> conn1.root.books['The Silmarillion'] = book1

+    >>> book2 = author2.new_book('The Children of Hurin')
+    >>> conn2.root.books['The Children of Hurin'] = book2
 
+    >>> transaction.commit()
+    >>> transaction_manager2.commit()
 
+    >>> import pprint
+    >>> pprint.pprint(list(author2.books))
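The idea behind this kind of resolution is a three-way merge: given the state both transactions started from, the state already committed, and the state being committed, changes to different keys can be combined. ZODB lets a class supply such logic through a `_p_resolveConflict(oldState, savedState, newState)` method; the `resolve` function below is a hypothetical sketch on plain dicts, not the algorithm BTrees actually use.

```python
class ConflictError(Exception):
    """Hypothetical stand-in raised when changes can't be merged."""


_MISSING = object()  # marks "key not present"


def resolve(old, committed, new):
    """Merge `new` into `committed`, both derived from `old`."""
    merged = dict(committed)
    for key in set(old) | set(new):
        base = old.get(key, _MISSING)
        ours = new.get(key, _MISSING)
        if ours == base:
            continue  # this transaction didn't touch the key
        if committed.get(key, _MISSING) != base:
            raise ConflictError(key)  # both transactions changed this key
        if ours is _MISSING:
            del merged[key]  # this transaction deleted the key
        else:
            merged[key] = ours  # this transaction added or changed the key
    return merged


old = {"existing": "book-0"}
committed = dict(old, **{"The Silmarillion": "book-1"})
new = dict(old, **{"The Children of Hurin": "book-2"})
merged = resolve(old, committed, new)

try:
    resolve({}, {"x": 1}, {"x": 2})  # both sides added the same key
except ConflictError:
    clashed = True
else:
    clashed = False
```

Adding different books in two transactions merges cleanly, while changing the same key on both sides still produces a conflict.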
 
+Handling conflict errors and retrying transactions
+--------------------------------------------------
 
+When a conflict error arises, the current transaction must be
+aborted.  Generally, then, the transaction will need to be
+retried. The transaction package provides some facilities that help
+automate this process.
 
+Transaction managers and the transaction package itself can be used
+with the Python with statement, as in::
 
+    with transaction:
+        author2.name = 'John Tolkien'
 
+The suite inside the with statement is executed. If there is no error,
+the transaction is committed. If there is an exception, the
+transaction is automatically aborted.
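The commit-or-abort behavior described above is the usual context-manager pattern. Here is a minimal sketch with a hypothetical `ToyTransactionManager` that only logs what happens; it is not the `transaction` package's implementation.

```python
class ToyTransactionManager:
    """Hypothetical manager that logs what happens to each transaction."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def abort(self):
        self.log.append("abort")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Commit if the suite finished normally, abort if it raised.
        if exc_type is None:
            self.commit()
        else:
            self.abort()
        return False  # never swallow the exception


tm = ToyTransactionManager()

with tm:
    pass  # no error, so the transaction is committed

try:
    with tm:
        raise ValueError("boom")  # error, so the transaction is aborted
except ValueError:
    pass
```

Returning `False` from `__exit__` lets the original exception propagate after the abort.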
 
+Transaction managers and the transaction package also provide an
+``attempts`` method.  The ``attempts`` method helps with handling
+transient errors, like conflict errors.  It returns an iterator that
+provides up to a given number of attempts to perform a transaction::
 
+    for attempt in transaction.attempts(5):
+        with attempt:
+            author2.name = 'John Tolkien'
 
+The example above tries up to 5 times to set the author name.  If
+there are non-transient errors, the loop exits with the error
+raised.  If the number of attempts is exhausted, the loop exits with
+the transient error raised.
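One way such an iterator could work is sketched below: a generator yields one context manager per attempt and stops after a successful commit, each attempt swallows transient errors while attempts remain, and the last attempt lets them propagate. `TransientError`, `ToyTransactionManager`, and `_Attempt` are hypothetical names for this sketch; the real package's implementation may differ.

```python
class TransientError(Exception):
    """Hypothetical base class for errors worth retrying, such as conflicts."""


class ToyTransactionManager:
    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def abort(self):
        self.log.append("abort")

    def attempts(self, number=3):
        # Yield up to `number` attempts, stopping after the first success.
        for remaining in range(number - 1, -1, -1):
            attempt = _Attempt(self, retry=remaining > 0)
            yield attempt
            if attempt.succeeded:
                return


class _Attempt:
    def __init__(self, manager, retry):
        self.manager = manager
        self.retry = retry  # may a transient error be swallowed?
        self.succeeded = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        if exc_type is None:
            try:
                self.manager.commit()
            except TransientError:
                self.manager.abort()
                if not self.retry:
                    raise
                return False  # commit failed transiently; try again
            self.succeeded = True
            return False
        self.manager.abort()
        # Swallow transient errors while attempts remain; others propagate.
        return self.retry and issubclass(exc_type, TransientError)


tm = ToyTransactionManager()
tries = []
for attempt in tm.attempts(5):
    with attempt:
        tries.append(len(tries) + 1)
        if len(tries) < 3:
            raise TransientError("simulated conflict")

tm2 = ToyTransactionManager()
try:
    for attempt in tm2.attempts(2):
        with attempt:
            raise TransientError("always conflicts")
except TransientError:
    exhausted = True
else:
    exhausted = False
```

The first loop succeeds on its third try; the second runs out of attempts and re-raises the transient error, as the text describes.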
 
+Memory management
+=================
 
+One of the advantages of using a database is that you can deal with
+more data than will fit in memory.  ZODB takes care of loading objects
+into memory when needed and removing them from memory when no longer
+needed.  More precisely, ZODB moves persistent objects in and out of
+memory.  The persistent object is the unit of storage.
 
+In the book example, each book and author is in its own database
+record.  Because an author's books are managed in BTrees, which are
+persistent objects, the book collections aren't contained in the
+author database records. Rather, the author records contain persistent
+references to the BTree records.
 
+As they grow, BTrees spread their data over multiple persistent
+subobjects.  This makes BTrees highly scalable.  You can have BTrees
+that store many millions of items and load only a small fraction of
+the BTree into memory to look up an item.
 
+Each ZODB connection has an in-memory cache. You can set the size of
+this cache as a number of objects, a number of bytes, or both.  These
+sizes aren't limits.  .... explain
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 .. [#c] Implementing objects in C requires a lot more care. It's
    really hard. :) See "Implementing persistent objects in C" for more
    details.


