[Checkins] SVN: Sandbox/J1m/zodb-doc/intro.txt *** empty log message ***
Jim Fulton
jim at zope.com
Tue Jun 1 06:14:37 EDT 2010
Log message for revision 112884:
*** empty log message ***
Changed:
U Sandbox/J1m/zodb-doc/intro.txt
-=-
Modified: Sandbox/J1m/zodb-doc/intro.txt
===================================================================
--- Sandbox/J1m/zodb-doc/intro.txt 2010-06-01 09:53:30 UTC (rev 112883)
+++ Sandbox/J1m/zodb-doc/intro.txt 2010-06-01 10:14:37 UTC (rev 112884)
@@ -2,10 +2,15 @@
Introducing ZODB
================
-The ZODB provides an object-persistence facility for Python. It
-provides many features, which we'll get into later, but first, let's
-take a very quick look at the basics.
+ZODB provides an object-persistence facility for Python. It provides
+many features, which we'll get into later, but first, let's take a
+very quick look at the basics.
+.. contents::
+
+Getting started
+===============
+
We start by creating a file-based database:
>>> import ZODB
@@ -15,7 +20,7 @@
Because the database didn't already exist, it's created automatically.
Initially, the database contains a single object, the root object.
-Connections have a root method that retrieves the root object:
+Connections have a root attribute that retrieves the root object [#root]_:
>>> conn.root
<root >
@@ -133,9 +138,9 @@
simply set an object attribute as we normally would without a
database. In fact, almost all of the operations we've performed were
accomplished through normal object operations. The only exception has
-been the use of transaction comit and abort calls to indicate that changes
-made should be saved or discarded. We didn't had to keep track of
-the changes made. The ZODB did that for us.
+been the use of transaction ``commit`` and ``abort`` calls to indicate
+that changes made should be saved or discarded. We didn't have to keep
+track of the changes made. The ZODB did that for us.
Persistence
===========
@@ -184,12 +189,12 @@
- Use persistent sub-objects.
- This is the approach taken by the ``Author`` class shown earlier.
- When we use a persistent subobject, the containing object isn't
- responsible for managing the persistence of subobject changes; the
- subobject is responsible. In the (non-broken) ``Author`` class,
- adding a book doesn't change the author, it changes the author's
- books.
+ This is the recommended approach, and the approach taken by the
+ ``Author`` class shown earlier. When we use a persistent subobject,
+ the containing object isn't responsible for managing the persistence
+ of subobject changes; the subobject is responsible. In the
+ (non-broken) ``Author`` class, adding a book doesn't change the
+ author, it changes the author's books.
- Tell ZODB about changes explicitly.
@@ -206,16 +211,21 @@
    def new_book(self, title):
        book = Book(title)
        book.author = self
+        self._p_changed = True
        self.books[title] = book
-        self._p_changed = True
        return book
Here we assigned the ``_p_changed`` attribute to signal that the
author object has changed.
+ Note that we assigned the ``_p_changed`` attribute *before* we changed
+ the books dictionary [#whychangedbefore]_. This is subtle and one
+ of the reasons we don't recommend using non-persistent mutable
+ subobjects.
+
Finally, ZODB needs to keep track of certain meta data for persistent
objects. The ``Persistent`` base class takes care of this too. The
-standard meta data includes::
+standard meta data includes:
``_p_oid``
Every persistent object that has been stored in a database as a
@@ -290,7 +300,8 @@
a few standard storages.
If we want to open multiple connections, we can use a slightly
-lower-level API, by passing a database file name to ``ZODB.DB``:
+lower-level API to create a database object by passing a database file
+name to ``ZODB.DB``:
>>> db = ZODB.DB('data.fs')
@@ -317,14 +328,14 @@
your storage server. The storage server uses some underlying
storage, such as a file storage to store data.
- To use a ZEO client storage with the high-level APIs, just pass
- the ZEO server address as a host and port tuple or as an integer
- port on the local host::
+ The high-level API doesn't support ZEO yet, but ZEO provides its
+ own high-level API. To use a ZEO client storage with the ZEO
+ high-level APIs, just pass the ZEO server address as a host and
+ port tuple or as an integer port on the local host::
- >>> db = ZODB.DB(('storage.example.com', 8100))
- >>> connection = ZODB.connection(8100) # localhost
+ >>> db = ZEO.DB(('storage.example.com', 8100))
+ >>> connection = ZEO.connection(8100) # localhost
-
MappingStorage
Mapping storages store database records in memory. Mapping
storages are typically used for testing or experimenting with
@@ -342,8 +353,8 @@
Demo storages allow you to take an unchanging base storage and
store changes in a separate changes storage. They were originally
implemented to allow demonstrations of applications in which a
- populated sample database was provided in CD and users could make
- changes that were stored in memory.
+ populated sample database was provided on Compact Disk and users
+ could make changes that were stored in memory.
Demo storages don't actually store anything themselves. They
delegate to 2 other storages, an unchanging base storage and a
@@ -368,6 +379,10 @@
it at a point in time, allowing it to be used as a base for a demo
storage.
+zc.zlibstorage
+ zc.zlibstorage provides a wrapper storage to compress database
+ records.
+
zc.zrs and zeoraid
zc.zrs and zeoraid provide database replication. zc.zrs is a
commercial storage implementation, while zeoraid is open source.
@@ -413,7 +428,7 @@
threads in separate processes. This is accomplished using per-thread
database connections and transaction managers. Each thread operates
as if it has its own copy of the database. Threads are synchronized
-through transactions
+through transaction commits.
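The per-thread connection pattern can be sketched with the standard
library's ``threading.local``; the ``open_connection`` argument below is
a stand-in assumption for whatever opens a real database connection
(such as a ZODB ``db.open()`` call), not ZODB API:

```python
import threading

_local = threading.local()

def get_connection(open_connection):
    # Each thread lazily opens and caches its own connection;
    # open_connection stands in for whatever opens a real one.
    if not hasattr(_local, 'conn'):
        _local.conn = open_connection()
    return _local.conn
```

Within one thread, repeated calls return the same connection; a
different thread gets a different one, so threads never share a
connection or its object cache.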
In a typical application, each thread opens a separate connection to a
database. Each connection has its own object cache. If multiple
@@ -523,7 +538,7 @@
>>> author2.name = 'John Tolkien'
>>> transaction_manager2.commit()
-Conflicts are troublesome for 2 reasons:
+Conflicts are annoying for 2 reasons:
- Application code has to anticipate conflicts and be prepared to
retry transactions.
@@ -577,7 +592,7 @@
the transaction is committed. If there is an exception, the
transaction is automatically aborted.
-Transaction manager and the transaction package also provide an
+Transaction managers and the transaction package also provide an
``attempts`` method. The attempts method helps with handling
transient errors, like conflict errors. It returns an iterator that
provides up to a given number of attempts to perform a transaction::
@@ -586,11 +601,76 @@
with attempt:
author2.name = 'John Tolkien'
-The example above trieds up to 5 times to set the author name. If
-there are non-transient errors, the the loops exits with the error
-raised. If the number of attempts is exhaused, the loops exits with
-the transient error raised.
+The example above tries up to 5 times to set the author name. If an
+attempt succeeds, the loop exits. If a non-transient error is raised,
+the loop exits by raising the error. If transient errors are raised,
+the attempts continue until an attempt succeeds, or until the number
+of attempts is exhausted, at which point the loop exits by raising the
+transient error.
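The retry behavior described above can be sketched as a plain loop;
``TransientError`` and ``run_with_attempts`` here are illustrative
stand-ins, not the transaction package's API:

```python
class TransientError(Exception):
    """Stand-in for a retryable error, such as a conflict error."""

def run_with_attempts(work, attempts=5):
    # Retry work() on transient errors, up to `attempts` times.
    # Non-transient errors propagate immediately; exhausting the
    # attempts re-raises the last transient error.
    for attempt in range(attempts):
        try:
            return work()
        except TransientError:
            if attempt == attempts - 1:
                raise
```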
+Object life cycle
+=================
+
+In-memory Persistent objects transition through a number of states, as
+shown in figure 1.
+
+.. image:: object-life-cycle.png
+
+Objects are created in the new state. At this point, they behave like
+normal Python objects.
+
+To add an object to a database, you add a reference to it from an
+object that's already in the database and commit the change. At that
+point the object is in the saved state.
+
+If you modify the object, it transitions to the changed state until
+the transaction ends.
+
+The ghost state is a state in which an object exists in memory but
+lacks any data. When an object hasn't been used in a while and ZODB
+needs to make room for new objects, it is deactivated and its data
+are released. The object itself remains in memory as a ghost as long
+as it is referred to by other objects.
+
+When an object is no longer referenced, it is removed from memory by
+the Python garbage collector. Because object data may include
+references to other persistent objects, releasing an object's data may
+cause referenced objects to be collected by the Python garbage
+collector and removed from memory.
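The collection of unreferenced objects described above is ordinary
Python behavior; a small stdlib sketch, using a plain class rather than
a persistent one:

```python
import gc
import weakref

class Node(object):
    pass

n = Node()
r = weakref.ref(n)   # observe the object without keeping it alive
del n                # drop the last strong reference
gc.collect()         # make collection deterministic for the demo
# r() now returns None: the object has been removed from memory
```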
+
+When an object's data are loaded, references to other objects in the
+object's data cause those objects to be created in memory. A newly
+created object is in the ghost state until it is accessed, at which
+point its data are loaded and it transitions to the saved state.
+
+If an object is in the saved state, it can be deactivated at any
+time. This is why care must be taken when tracking changes
+yourself. Consider another broken author implementation::
+
+    class SubtlyBrokenAuthor(persistent.Persistent):
+
+        def __init__(self, id, name):
+            self.id = id
+            self.name = name
+            self.books = {}
+
+        def new_book(self, title):
+            book = Book(title)
+            book.author = self
+            self.books[title] = book
+            self._p_changed = True
+            return book
+
+In this version of the author class, ``_p_changed`` is set after the
+books dictionary is modified. In theory [#practicemorecomplicated]_,
+the author object could be deactivated after the ``books`` dictionary
+is modified and before ``_p_changed`` is set, in which case the
+change to ``books`` would be lost.
+
+Remember that the object life cycle described here refers to a single
+instance of a database object. There can be multiple instances of a
+single database object in memory, in different states, at once.
+
Memory management
=================
@@ -607,40 +687,182 @@
references to the BTree records.
As they grow, BTrees spread their data over multiple persistent
-subobjects. This makes BTrees highly scalable. You can have BTrees
-that store many millions of items and load onkly a small fraction of
+sub-objects. This makes BTrees highly scalable. You can have BTrees
+that store many millions of items and load only a small fraction of
the BTree into memory to look an item up.
+It's important to choose data structures carefully to prevent
+individual persistent objects from becoming too large. A naive choice
+of object class can lead to a database in which all of the data
+resides in a single database object, in which case the entire
+database must remain in memory.
+
Each ZODB connection has an in-memory cache. You can set the size of
-this cache as a number of objects, a number of bytes, or both. These
-sizes aren't limits. .... explain
+this cache as a number of objects, a number of bytes, or both. While
+these sizes are limits, they are only checked at certain times and, as
+a result, can be exceeded.
+Blobs: managing files in ZODB
+=============================
+ZODB was originally developed to support the Zope web application
+framework. Web applications often need to manage files of static
+data, such as images, movies, and other resource files. Especially
+for large media files, loading data into memory just to hand it off to
+a web client is counterproductive. To address cases like these, ZODB
+provides blobs. "BLOB" is a term borrowed from other databases and
+is an acronym for "binary large object". A better term might have
+been something like "persistent file".
+ZODB blobs are persistent objects that can be opened to obtain Python file
+objects to access their data. Like other persistent objects,
+modifications to blobs are managed transactionally.
+Blobs are created using the ``ZODB.blob.Blob`` class
+[#nosubclassingBlob]_. Let's update our book class to hold electronic
+versions of the book::
+    import persistent, BTrees.OOBTree, ZODB.blob
+
+    class Book(persistent.Persistent):
+
+        def __init__(self, title):
+            self.title = title
+            self.electronic = BTrees.OOBTree.OOBTree()
+
+        def add_electronic(self, format, data):
+            version = self.electronic.get(format)
+            if version is None:
+                version = self.electronic[format] = ZODB.blob.Blob()
+            f = version.open('w')
+            f.write(data)
+            f.close()
+
+        def get_electronic(self, format):
+            return self.electronic[format].open()
+Now books manage electronic binary versions. We add or update a
+version using the ``add_electronic`` method. It checks to see if
+there is already an electronic version of a book and, if there isn't,
+it creates one using ``ZODB.blob.Blob()``.
+The ``add_electronic`` method then opens the blob, passing the write
+flag, ``'w'``. No file name is passed to the open method. The blob
+object itself identifies the file to open. As with the built-in open
+function, the ``'w'`` flag causes any existing data to be
+overwritten. The open method returns a file object that can be used
+with anything that expects a Python file object [#filesubclass]_. The
+``add_electronic`` method simply calls its ``write`` method to write
+the data passed in and then closes it.
+The ``get_electronic`` method is used to access a binary version. It
+simply opens the blob for the given format and returns the resulting
+file. No mode is passed, so the default mode, ``'r'``, is used.
+Several modes are supported by the open method:
+``'r'``
+ Open for reading. The file returned includes any changes made in
+ the current transaction.
+``'c'``
+ Open committed data for reading. The file returned does *not*
+ reflect any changes made in the current transaction.
+``'w'``
+ Open for writing. Existing data are overwritten.
+``'a'``
+ Open for appending. Existing data are preserved and data are
+ written at the end of the file.
+``'r+'``
+ Open for reading and writing. Existing data are preserved when the
+ file is opened but, depending on the file position, new writes may
+ overwrite existing data.
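These modes parallel the modes of the built-in ``open`` function; a
stdlib sketch of how ``'w'``, ``'a'``, and ``'r+'`` treat existing data,
using an ordinary file rather than a blob:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:    # 'w' discards any existing data
    f.write('hello')
with open(path, 'a') as f:    # 'a' appends at the end
    f.write(' world')
with open(path, 'r+') as f:   # 'r+' keeps data; position starts at 0
    f.write('J')              # overwrites only the first character
with open(path) as f:
    content = f.read()
# content == 'Jello world'
```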
+As mentioned at the beginning of this section, a motivation for blobs
+is to avoid loading large amounts of binary data into memory. Our
+``add_electronic`` implementation requires that the entire file
+content be passed as a string. A better implementation would copy
+data from a file in blocks::
+    def add_electronic(self, format, source_file):
+        version = self.electronic.get(format)
+        if version is None:
+            version = self.electronic[format] = ZODB.blob.Blob()
+        f = version.open('w')
+        while 1:
+            data = source_file.read(4096)
+            if not data:
+                break
+            f.write(data)
+        f.close()
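The chunked-copy loop is the standard pattern for copying between any
two Python file objects; the standard library packages it as
``shutil.copyfileobj``, sketched here with in-memory files standing in
for the upload and the blob file:

```python
import io
import shutil

source = io.BytesIO(b'x' * 10000)   # stands in for an uploaded file
destination = io.BytesIO()          # stands in for the open blob file
shutil.copyfileobj(source, destination, 4096)  # copy in 4096-byte blocks
# destination now holds all 10000 bytes without ever building
# the whole payload as one string
```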
+If the source file is a temporary file [#temporaryfiletobeconsumed]_,
+we can pass its name to the blob ``consumeFile`` method::
+    def add_electronic(self, format, source_file_name):
+        version = self.electronic.get(format)
+        if version is None:
+            version = self.electronic[format] = ZODB.blob.Blob()
+        version.consumeFile(source_file_name)
+
+The advantage of ``consumeFile`` is that it can avoid copying the
+source file [#whencanitavoidcopying]_.
+
+To use blobs, you have to enable blobs when you configure your
+storage, generally by naming a blob directory::
+
+ >>> conn = ZODB.connection('data.fs', blob_dir='data.blobs')
+
+Database maintenance
+====================
+
+Packing
+-------
+
+Garbage Collection
+------------------
+
+Multiple databases
+==================
+
+Indexing
+========
+
+Object-oriented versus relational designs
+=========================================
+
+Time travel
+===========
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. [#root] The root attribute actually returns a convenience wrapper
+ around the root object. See "More on the root object".
+
.. [#c] Implementing objects in C requires a lot more care. It's
really hard. :) See "Implementing persistent objects in C" for more
details.
+.. [#whychangedbefore] We'll explain why this is important later,
+ when we talk about object lifecycles.
+
.. [#jar] The name ``_p_jar`` comes from early implementations of ZODB
in which databases were called "pickle jars", because objects were
stored using the Python pickle format. In those early versions,
@@ -649,9 +871,31 @@
.. [#itdidmore] It also arranged that when we closed the connection,
the underlying database was closed.
+.. [#exceptforblobs] If blobs are used, each blob is stored in a
+ separate file.
+
.. [#zconfig] ZODB uses the ``ZConfig`` configuration
system. Applications that use ``ZConfig`` can also merge the ZODB
configuration schemas with their own configuration schemas.
.. [#multipledbtags] You can define multiple databases, so there can
be multiple ``zodb`` tags. See "Using multiple databases."
+
+.. [#practicemorecomplicated] In fact, the object wouldn't be
+ deactivated unless some computation caused another object to be
+ loaded and maybe not even then, depending on other factors. Suffice
+ it to say that you don't want to have to think that hard.
+
+.. [#nosubclassingBlob] The Blob class can't be subclassed. To
+ associate behavior with blobs, use composition.
+
+.. [#filesubclass] The object returned is an instance of a subclass of
+ the standard Python file type.
+
+.. [#temporaryfiletobeconsumed] The file might be the result of a web
+ file upload. It must be named.
+
+.. [#whencanitavoidcopying] The blob will attempt to rename the file
+ to the blob directory. This is generally only possible when the
+ file is in the same disk partition as the blob directory.
+