[Checkins] SVN: zope2docs/trunk/ZODB2.rst added

Andreas Jung andreas at andreas-jung.com
Sat Feb 21 03:16:49 EST 2009


Log message for revision 96863:
  added
  

Changed:
  A   zope2docs/trunk/ZODB2.rst

-=-
Added: zope2docs/trunk/ZODB2.rst
===================================================================
--- zope2docs/trunk/ZODB2.rst	                        (rev 0)
+++ zope2docs/trunk/ZODB2.rst	2009-02-21 08:16:49 UTC (rev 96863)
@@ -0,0 +1,558 @@
+Advanced ZODB for Python Programmers
+####################################
+
+In the first article in this series, `ZODB for Python
+Programmers <ZODB1>`_, I covered some of the simpler aspects of Python
+object persistence.  In this article, I'll go over some of the more
+advanced features of ZODB.
+
+
+In addition to simple persistence, ZODB offers some very useful
+extras for the advanced Python application.  Specifically, we'll cover
+the following advanced features in this article:
+
+Persistent-Aware Types
+  ZODB comes with some special, "persistent-aware" data types for
+  storing data in a ZODB.  The most useful of these is the "BTree",
+  which is a fast, efficient storage object for lots of data.
+
+Volatile Data
+  Not all your data is meant to be stored in the database.  ZODB lets
+  you have volatile data on your objects that does not get saved.
+
+Pluggable Storages
+  ZODB offers you the ability to use many different storage back-ends
+  to store your object data, including files, relational databases
+  and a special client-server storage that stores objects on a remote
+  server.
+
+Conflict Resolution
+  When many threads try to write to the same object at the same time,
+  you can get conflicts.  ZODB offers a conflict resolution protocol
+  that allows you to mitigate most conflicting writes to your data.
+
+Transactions
+  When you want your changes to be "all or nothing", transactions
+  come to the rescue.
+
+
+
+
+Persistent-Aware Types
+======================
+
+You can also get around the mutable attribute problem discussed in
+the first article by using special types that are "persistent-aware".
+ZODB comes with the following persistent-aware mutable object types:
+
+PersistentList
+  This type works just like a list, except that changing it does not
+  require setting ``_p_changed`` or explicitly re-assigning the
+  attribute.
+
+PersistentMapping
+  A persistent-aware dictionary, much like PersistentList.
+
+BTree
+  A dictionary-like object that can hold large collections of objects
+  in an ordered, fast, efficient way.
+
+
+
+
+BTrees offer a very powerful facility to the
+Python programmer:
+
+- BTrees can hold a large collection of information in an
+  efficient way; more objects than your computer has
+  memory to hold at one time.
+- BTrees are integrated into the persistence machinery to work
+  effectively with ZODB's object cache.  Recently or heavily
+  used objects are kept in a memory cache for speed.
+- BTrees can be searched very quickly, because they are stored
+  in a fast, balanced tree data structure.
+
+
+
+BTrees come in four flavors: OOBTrees, IOBTrees, OIBTrees, and
+IIBTrees.  The last three are optimized for integer keys, values,
+and key-value pairs, respectively.  This means that, for example,
+an IOBTree is meant to map an integer to an object, and is
+optimized for having integer keys.
+
+
+Using BTrees
+============
+
+Suppose you track the movement of all your employees with
+heat-seeking cameras hidden in the ceiling tiles.  Since your
+employees tend to frequently congregate against you, all of the
+tracking information could end up being a lot of data, possibly
+thousands of coordinates per day per employee.  Further, you want
+to key each coordinate on the time that it was taken, so that you
+can look only at where your employees were during certain times::
+
+    from time import time
+
+    from BTrees.IOBTree import IOBTree
+    from persistent import Persistent
+
+    class Employee(Persistent):
+
+        def __init__(self):
+            # Integer timestamps map to coordinate objects.
+            self.movements = IOBTree()
+
+        def fix(self, coords):
+            "get a fix on the employee"
+            self.movements[int(time())] = coords
+
+        def trackToday(self):
+            """return all the movements of the
+            employee in the last 24 hours"""
+            current_time = int(time())
+            return self.movements.items(current_time - 86400,
+                                        current_time)
+
+
+
+In this example, the ``fix`` method is called every time one of your
+cameras sees that employee.  This information is then stored in a
+BTree, with the current ``time()`` as the key and the ``coordinates``
+as the value.
+
+
+Because BTrees store their information in an ordered structure,
+they can be quickly searched for a range of key values.  The
+``trackToday`` method uses this feature to return a sequence of
+(time, coordinates) pairs from 24 hours ago to the present.
+
+
+This example shows how BTrees can be quickly searched for a range
+of values from a minimum to a maximum, and how you can use this
+technique to oppress your workforce.  BTrees have a very rich API,
+including doing unions and intersections of result sets.
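+
+The range search itself can be imitated in plain Python.  The
+following is a minimal sketch that uses the standard ``bisect`` module
+over a sorted list of timestamps as a stand-in for a BTree; the
+``RangeIndex`` class and its method names are hypothetical and are not
+part of ZODB.
+

```python
from bisect import bisect_left, bisect_right, insort


class RangeIndex:
    """Hypothetical stand-in for a BTree: sorted integer keys with values."""

    def __init__(self):
        self._keys = []      # always kept in sorted order
        self._values = {}    # key -> value

    def __setitem__(self, key, value):
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def items(self, minimum, maximum):
        """Return (key, value) pairs with minimum <= key <= maximum."""
        lo = bisect_left(self._keys, minimum)
        hi = bisect_right(self._keys, maximum)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


movements = RangeIndex()
for timestamp, coords in [(100, (1, 1)), (250, (2, 5)), (400, (7, 3))]:
    movements[timestamp] = coords

# Only the fixes taken between time 200 and time 400 are returned.
recent = movements.items(200, 400)
print(recent)
```

+
+Because the keys are kept in order, finding both ends of the range
+takes a binary search rather than a scan of every entry.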
+
+
+Not All Objects are Persistent
+==============================
+
+You don't have to make all of your objects persistent.
+Non-persistent objects are often useful to represent either
+"canned" behavior (classes that define methods but no state), or
+objects that are useful only as a "cache" that can be thrown away
+when your persistent object is deactivated (removed from memory
+when not used).
+
+
+ZODB provides you with the ability to have *volatile* attributes.
+Volatile attributes are attributes of persistent objects that are
+never saved in the database, even if they are capable of being
+persistent.  Volatile attributes begin with ``_v_`` and are good for
+keeping cached information around for optimization.  ZODB also
+provides you with access to special pickling hooks that allow you
+to set volatile information when an object is activated.
+
+
+Imagine you had a class that stored a complex image that you
+needed to calculate.  This calculation is expensive.  Instead of
+calculating the image every time you called a method, it would be
+better to calculate it *once* and then cache the result in a
+volatile attribute::
+
+    def image(self):
+        "a large and complex image of the terrain"
+        if hasattr(self, '_v_image'):
+            return self._v_image
+        image = expensive_calculation()
+        self._v_image = image
+        return image
+
+
+
+Here, calling ``image`` the first time the object is activated will
+cause the method to do the expensive calculation.  After the first
+call, the image will be cached in a volatile attribute.  If the
+object is removed from memory, the ``_v_image`` attribute is not
+saved, so the cached image is thrown away, only to be recalculated
+the next time you call ``image``.
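+
+The effect can be imitated without a database.  This is only a sketch
+of the idea, not ZODB's implementation: the ``Terrain`` class, its
+``__getstate__`` method, and the ``expensive_calculation`` function
+are hypothetical stand-ins that drop every ``_v_`` attribute when the
+object's state is saved.
+

```python
calls = []


def expensive_calculation():
    """Hypothetical stand-in for the costly image computation."""
    calls.append(1)
    return "terrain-image"


class Terrain:
    """Plain class that imitates the handling of _v_ attributes."""

    def __init__(self, name):
        self.name = name

    def __getstate__(self):
        # Leave out volatile attributes when the object is saved.
        return {key: value for key, value in self.__dict__.items()
                if not key.startswith('_v_')}

    def image(self):
        if hasattr(self, '_v_image'):
            return self._v_image
        self._v_image = expensive_calculation()
        return self._v_image


terrain = Terrain("north ridge")
terrain.image()
terrain.image()                      # second call is served from the cache

# Imitate saving the object and loading it again later.
state = terrain.__getstate__()
restored = Terrain.__new__(Terrain)
restored.__dict__.update(state)
```

+
+The restored object keeps its ``name`` but has no ``_v_image``, so its
+next call to ``image`` runs the calculation again.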
+
+
+ZODB and Concurrency
+====================
+
+Different threads, processes, and computers on a network can open
+connections to a single ZODB object database.  Each of these
+different processes keeps its own copy of the objects that it uses
+in memory.
+
+
+The problem with allowing concurrent access is that conflicts can
+occur.  If different threads try to commit changes to the same
+objects at the same time, one of the threads will raise a
+ConflictError.  If you want, you can write your application to
+either resolve or retry conflicts a reasonable number of times.
+
+
+Zope will retry a conflicting ZODB operation three times.  This is
+usually pretty reasonable behavior.  Because conflicts only happen
+when two threads write to the same object, retrying a conflict
+means that one thread will win the conflict and commit its changes,
+and the other thread will retry its operation shortly afterwards.
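+
+A retry policy of this kind can be sketched in a few lines.  The
+``ConflictError`` class below is a local stand-in for ZODB's
+exception, and ``run_with_retries`` and ``flaky_commit`` are
+hypothetical names used only for illustration; this is not the code
+Zope itself uses.
+

```python
class ConflictError(Exception):
    """Local stand-in for the conflict exception raised on commit."""


def run_with_retries(operation, attempts=3):
    """Call operation(); on a conflict, try again up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConflictError:
            if attempt == attempts:
                raise  # give up and let the caller see the conflict


outcomes = iter([ConflictError(), ConflictError(), "committed"])


def flaky_commit():
    """Hypothetical operation that loses the first two conflicts."""
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = run_with_retries(flaky_commit)
```
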
+
+
+Pluggable Storages
+==================
+
+Different processes and computers can connect to the same
+database using a special kind of storage called a ``ClientStorage``.
+A ``ClientStorage`` connects to a ``StorageServer`` over a network.
+
+
+In the very beginning, you created a connection to the database by
+first creating a storage.  This was of the type ``FileStorage``.
+Zope comes with several different back end storage objects, but
+one of the most interesting is the ``ClientStorage`` from the Zope
+Enterprise Objects product (ZEO).
+
+
+The ``ClientStorage`` storage makes a TCP/IP connection to a
+``StorageServer`` (also provided with ZEO).  This allows many
+different processes on one or more machines to work with the same
+object database and, hence, the same objects.  Each process gets a
+cached "copy" of a particular object for speed.  All of the
+``ClientStorage`` instances connected to a ``StorageServer`` speak a
+special object transport and cache invalidation protocol to keep all
+of your computers synchronized.
+
+
+Opening a ``ClientStorage`` connection is simple.  The following
+code creates a database connection and gets the root object for a
+``StorageServer`` listening on "localhost:12345"::
+
+    from ZODB import DB
+    from ZEO import ClientStorage
+
+    # The server address is passed as a single (host, port) tuple.
+    storage = ClientStorage.ClientStorage(('localhost', 12345))
+    db = DB(storage)
+    connection = db.open()
+    root = connection.root()
+
+
+
+In the rare event that two processes (or threads) modify the same
+object at the same time, ZODB provides you with the ability to
+retry or resolve these conflicts yourself. 
+
+
+Resolving Conflicts
+===================
+
+If a conflict happens, you have two choices. The first choice is
+that you live with the error and you try again.  Statistically,
+conflicts are going to happen, but only in situations where objects
+are "hot-spots".  Most problems like this can be "designed away";
+if you can redesign your application so that the changes get
+spread around to many different objects then you can usually get
+rid of the hot spot.
+
+
+Your second choice is to try and *resolve* the conflict. In many
+situations, this can be done. For example, consider the following
+persistent object::
+
+    class Counter(Persistent):
+
+        def __init__(self):
+            self.count = 0
+
+        def hit(self):
+            self.count = self.count + 1
+
+
+
+This is a simple counter.  If you hit this counter with a lot of
+requests though, it will cause conflict errors as different threads
+try to change the count attribute simultaneously.
+
+
+But resolving the conflict between conflicting threads in this
+case is easy.  Both threads want to increment the self.count
+attribute by a value, so the resolution is to increment the
+attribute by the sum of the two values and make both commits
+happy.
+
+
+To resolve a conflict, a class should define a
+``_p_resolveConflict`` method. This method takes three arguments:
+
+``oldState``
+  The state of the object that the changes made by the current
+  transaction were based on. The method is permitted to modify this
+  value.
+
+``savedState``
+  The state of the object that is currently stored in the database.
+  This state was written after ``oldState`` and reflects changes made
+  by a transaction that committed before the current transaction.
+  The method is permitted to modify this value.
+
+``newState``
+  The state after changes made by the current transaction.  The
+  method is *not* permitted to modify this value.
+
+This method should compute a new state by merging changes reflected
+in ``savedState`` and ``newState``, relative to ``oldState``.
+
+
+
+
+The method should return the state of the object after resolving
+the differences.  
+
+
+Here is an example of a ``_p_resolveConflict`` method in the
+``Counter`` class::
+
+    class Counter(Persistent):
+
+        def __init__(self):
+            self.count = 0
+
+        def hit(self):
+            self.count = self.count + 1
+
+        def _p_resolveConflict(self, oldState, savedState, newState):
+
+            # Figure out how each state is different:
+            savedDiff = savedState['count'] - oldState['count']
+            newDiff = newState['count'] - oldState['count']
+
+            # Apply both sets of changes to old state and return the
+            # merged state, not just the number:
+            oldState['count'] = oldState['count'] + savedDiff + newDiff
+            return oldState
+
+
+
+In the above example, ``_p_resolveConflict`` resolves the difference
+between the two conflicting transactions.
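+
+To see the arithmetic, the same kind of merge can be run on plain
+dictionaries outside the database.  In this sketch the ``resolve``
+function and the sample numbers are made up for illustration; the
+state dictionaries stand in for what would be passed to
+``_p_resolveConflict``.
+

```python
def resolve(old_state, saved_state, new_state):
    """Merge two competing counter updates that started from old_state."""
    saved_diff = saved_state['count'] - old_state['count']
    new_diff = new_state['count'] - old_state['count']
    merged = dict(old_state)
    merged['count'] = old_state['count'] + saved_diff + new_diff
    return merged


old = {'count': 10}      # state both transactions started from
saved = {'count': 12}    # the other transaction added two hits
new = {'count': 11}      # this transaction added one hit
merged = resolve(old, saved, new)
print(merged)
```

+
+Both increments survive: 10 + 2 + 1 gives a count of 13.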
+
+
+Transactions and Subtransactions
+================================
+
+Transactions are a very powerful concept in databases.
+Transactions let you make many changes to your information as if
+they were all one big change.  Imagine software that did online
+banking and allowed you to transfer money from one account to
+another.  You would do this by deducting the amount of the
+transfer from one account, and adding  that amount onto the
+other.  
+
+
+If an error happened while you were adding the money to the
+receiving account (say, the bank's computers were unavailable),
+then you would want to abort the transaction so that the state of
+the accounts went back to the way they were before you changed
+anything.
+
+
+To abort a transaction, you need to call the ``abort`` method of the
+transaction object::
+
+    get_transaction().abort()
+
+
+
+This will throw away all the currently changed objects and start a
+new, empty transaction.
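+
+The all-or-nothing behavior can be illustrated without a database.
+The sketch below uses a plain dictionary and a snapshot to imitate an
+abort; the ``transfer`` function, the account names, and the
+``InsufficientFunds`` exception are hypothetical, and with ZODB the
+transaction machinery does this bookkeeping for you.
+

```python
import copy


class InsufficientFunds(Exception):
    """Hypothetical error raised when an account would go negative."""


def transfer(accounts, source, target, amount):
    """Move amount between accounts, or leave accounts untouched."""
    snapshot = copy.deepcopy(accounts)   # state to return to on abort
    try:
        accounts[source] -= amount
        if accounts[source] < 0:
            raise InsufficientFunds(source)
        accounts[target] += amount
    except Exception:
        # Abort: throw away every change made so far.
        accounts.clear()
        accounts.update(snapshot)
        raise


accounts = {'checking': 100, 'savings': 50}
transfer(accounts, 'checking', 'savings', 30)

try:
    transfer(accounts, 'checking', 'savings', 500)
except InsufficientFunds:
    pass  # the failed transfer left both balances unchanged
```
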
+
+
+Subtransactions, sometimes called "inner transactions", are
+transactions that happen inside another transaction.
+Subtransactions can be committed and aborted like regular "outer"
+transactions, and they mostly provide you with an optimization
+technique.
+
+
+Committing or aborting a subtransaction does not commit or abort its
+outer transaction, just the subtransaction.  This lets you use many
+fine-grained transactions within one big transaction.
+
+
+Why is this important?  Well, in order for a transaction to be
+"rolled back" the changes in the transaction must be stored in
+memory until commit time.  By committing a subtransaction, you are
+telling Zope that "I'm pretty sure what I've done so far is
+permanent, so you can store this subtransaction somewhere other than
+in memory".  For very, very large transactions, this can be a big
+memory win for you.
+
+
+If you abort an outer transaction, then all of its inner
+subtransactions will also be aborted and not saved.  If you abort
+an inner subtransaction, then only the changes made during that
+subtransaction are aborted; the outer transaction is *not*
+aborted, and more changes can be made and committed, including more
+subtransactions.
+
+
+You can commit or abort a subtransaction by calling either
+``commit()`` or ``abort()`` with an argument of 1::
+
+    get_transaction().commit(1) # or
+    get_transaction().abort(1)
+
+
+
+Subtransactions offer you a nice way to "batch" all of your "all
+or none" actions into smaller "all or none" actions while still
+keeping the outer level "all or none" transaction intact.  As a
+bonus, they also let a large transaction use much less memory.
+
+
+Conclusion
+==========
+
+ZODB offers many advanced features to help you develop simple but
+powerful Python programs.  In this article, you used some of the
+more advanced features of ZODB to handle different application
+needs, like storing information in large sets, using the database
+concurrently, and maintaining transactional integrity.  For more
+information on ZODB, join the discussion list at zodb-dev at zope.org
+where you can find out more about this powerful component of Zope.
+
+