[ZODB-Dev] Re: [Dev] ZODB is not a Storage Technology (Re: other formats )

Mike C. Fletcher mcfletch@rogers.com
Sat, 09 Nov 2002 03:42:35 -0500


Okay, here's a quick overview of the guts, presented as an outline. I've 
assumed you'll be reading the summaries with the source-code open in 
another window to see what's being described, so I've not gone into any 
details as to how anything is done.

The objects likely best to concentrate on for understanding the 
low-level guts are the FileStorage, the Connection, and the 
_defaultTransaction.  I've given you quick summaries of what you'll find 
in most of the files in the ZODB4 CVS packages (ZODB, Transaction and 
Persistence); the zLOG project is just logging facilities, nothing 
really close to the core of the ZODB.  The indentation primarily 
shows usage patterns (for instance, fsIndex is really only used by 
FileStorage AFAIK), though I've also used it to group items which can be 
considered sub-categories of the superior item.

I'll work on details tomorrow if I can get some more time; 
questions, or directions in which you'd like more coverage, are quite 
welcome.  BTW: I've copied the ZODB-dev list so that others can correct 
anything I've messed up, or add anything that they consider critical to 
understanding the system.

Enjoy,
Mike

ZODB:
    Storage (BaseStorage sub-classes):
        """Storages are responsible for maintaining object state records

        They can also maintain undo (transaction) and versional records.
        """
        FileStorage:
            """Default ZODB storage

            The FileStorage is a linear aggregate of all transactions,
            and transactions are aggregates of all changed objects.
           
            Transactions are added at the end of the file, with
            later changes to a particular object conceptually overwriting
            the earlier changes.

            Versions (personal views of the dbase) are just transactions
            which are declared to have version information.  The versions
            form linked lists (they point to the last transaction in the
            version).

            Storages which have undo support (such as FileStorage) have
            a pack method which basically copies all objects forward until
            there is a single current set, then discards anything not in
            the current set.
            """
            fsIndex:
                """Index from persistent OID -> file position

                The fsIndex provides an optimised index to individual objects
                within the data file of the FileStorage.  The index can
                be rebuilt merely by scanning through the entire data file.
                """
            TmpStore:
                """Storage for transaction save-points"""
        DBMStorage:
            """Simple storage based on GDBM/AnyDBM"""
        MappingStorage:
            """A demonstration of a volatile in-memory storage"""

        utility mechanisms:
            TimeStamp:
                """TimeStamp C extension type"""
            Serialize:
                """Pickle-like storage (cPickle plus some custom code)"""
                referencesf:
                    """finds object refs in pickle strings"""
            file_lock:
                """(small) wrapper to do cross-platform locking of files"""
            fsdump, fsrecover:
                """Debugging/utility code"""
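
To make the FileStorage/fsIndex description above a little more concrete,
here is a toy sketch of the same idea: an append-only file of records plus
an OID -> file-position index that can be thrown away and rebuilt by
scanning.  Everything in it (the ToyLogStorage class, the record layout,
the method names) is made up for illustration; it is not the real
FileStorage code or file format, and it ignores transactions, versions
and undo entirely.

```python
import io
import struct

# Toy record layout (an assumption for illustration, NOT the real
# FileStorage format): 8-byte oid, 4-byte big-endian length, payload.
HEADER = struct.Struct(">8sI")


class ToyLogStorage:
    """Append-only object log with an oid -> file-position index."""

    def __init__(self, file=None):
        self.file = file if file is not None else io.BytesIO()
        self.index = {}
        self.rebuild_index()

    def rebuild_index(self):
        """Recover the index by scanning the whole data file."""
        self.index.clear()
        self.file.seek(0)
        while True:
            pos = self.file.tell()
            header = self.file.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of file (or a truncated trailing record)
            oid, length = HEADER.unpack(header)
            self.file.seek(length, io.SEEK_CUR)  # skip the payload
            self.index[oid] = pos  # a later record replaces an earlier one

    def store(self, oid, data):
        """Append a new revision; the old one stays in the file."""
        self.file.seek(0, io.SEEK_END)
        pos = self.file.tell()
        self.file.write(HEADER.pack(oid, len(data)) + data)
        self.index[oid] = pos

    def load(self, oid):
        """Return the current revision of an object."""
        self.file.seek(self.index[oid])
        _, length = HEADER.unpack(self.file.read(HEADER.size))
        return self.file.read(length)

    def pack(self):
        """Copy only the current revisions into a fresh storage."""
        packed = ToyLogStorage()
        for oid in list(self.index):
            packed.store(oid, self.load(oid))
        return packed
```

Storing the same oid twice grows the file by two records, load() returns
only the newer one, and pack() gives back a storage holding a single
record per oid.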

    Connection:
        """Object-space in which application objects live

        Uses an in-memory object-cache (see below)

        Provides object-access (get root dict, get object by oid)
        though normal access is via getting root and then
        drilling down through the object references.

        Other than this, almost the entire class is support
        for the transaction and persistence mechanisms.
        """
        ExportImport:
            """Mix-in providing XML import/export"""
    DB:
        """Manages multiple Connections to a storage

        Provides a pool of connections
       
        Provides mechanisms for applying functions
        to all object caches in all connections
       
        Tracks object modifications for versions? (not
        sure about this, I've never used versions)

        Provides most of the primitives on which Connection and
        Transaction build the transaction mechanism.  (tpc_*)
        """
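
The DB/Connection relationship described above (a pool of connections,
each with its own cache, plus a way to run a function over every cache)
can be sketched roughly as follows.  ToyDB, ToyConnection and
apply_to_caches are hypothetical names for illustration only; the real
DB and Connection classes do a great deal more.

```python
class ToyConnection:
    """One object space with its own in-memory cache."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # oid -> loaded object

    def close(self):
        # Closing returns the connection to the pool; its cache is kept
        # so that the next user starts with warm objects.
        self.db._pool.append(self)


class ToyDB:
    """Hands out pooled connections to a single storage."""

    def __init__(self, storage):
        self.storage = storage
        self._pool = []             # idle connections, ready for reuse
        self._all_connections = []  # every connection ever created

    def open(self):
        """Reuse an idle connection if there is one, else make a new one."""
        if self._pool:
            return self._pool.pop()
        conn = ToyConnection(self)
        self._all_connections.append(conn)
        return conn

    def apply_to_caches(self, func):
        """Call func(cache) for the cache of every connection."""
        return [func(conn.cache) for conn in self._all_connections]
```

With this shape, something like db.apply_to_caches(len) reports the size
of every cache, and db.apply_to_caches(dict.clear) empties them all.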


Transaction:
    _defaultTransaction:
        """The default transaction machinery

        Combined with the connection object, this is most
        of the transaction-driving code in the system.  It
        is fairly tightly coupled to the Persistent module
        (e.g. it assumes _p_jar and the like on all registered
        objects).
        """
    Transaction:
        """Data-storage for the current transaction"""
    Manager:
        """Entry point for transaction APIs"""
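
As a rough picture of how the transaction machinery and the tpc_*
primitives fit together, here is a toy two-phase commit.  The method
names are modeled on the tpc_* family mentioned above, but the
signatures, the ToyJar/ToyObject/ToyTransaction classes and the
name/value attributes are all assumptions made up for this sketch, not
the actual Transaction package code.

```python
class ToyJar:
    """Stand-in for a Connection taking part in two-phase commit."""

    def __init__(self, fail_on_vote=False):
        self.fail_on_vote = fail_on_vote
        self.committed = {}  # durable state: name -> value
        self._pending = {}   # changes staged during the transaction
        self.log = []        # order in which the phases were called

    def tpc_begin(self, txn):
        self.log.append("begin")
        self._pending = {}

    def commit(self, obj, txn):
        self.log.append("commit")
        self._pending[obj.name] = obj.value

    def tpc_vote(self, txn):
        self.log.append("vote")
        if self.fail_on_vote:
            raise RuntimeError("jar cannot commit")

    def tpc_finish(self, txn):
        self.log.append("finish")
        self.committed.update(self._pending)
        self._pending = {}

    def tpc_abort(self, txn):
        self.log.append("abort")
        self._pending = {}


class ToyObject:
    """A registered object; the transaction only relies on _p_jar."""

    def __init__(self, jar, name, value):
        self._p_jar = jar
        self.name = name
        self.value = value


class ToyTransaction:
    def __init__(self):
        self._objects = []

    def register(self, obj):
        self._objects.append(obj)

    def commit(self):
        # Collect each distinct jar once, preserving registration order.
        jars = []
        for obj in self._objects:
            if obj._p_jar not in jars:
                jars.append(obj._p_jar)
        try:
            # Phase one: stage every change, then let each jar vote.
            for jar in jars:
                jar.tpc_begin(self)
            for obj in self._objects:
                obj._p_jar.commit(obj, self)
            for jar in jars:
                jar.tpc_vote(self)
        except Exception:
            # Any failure before the finish phase aborts every jar.
            for jar in jars:
                jar.tpc_abort(self)
            raise
        # Phase two: everyone voted yes, so make the changes durable.
        for jar in jars:
            jar.tpc_finish(self)
        self._objects = []
```

The point of the two phases is that no jar makes anything durable until
every jar has voted; one failing vote leaves all of them unchanged.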

Persistence:
    _persistent:
        """Python 2.2.2 implementation of IPersistent

        Basically, this is a pure-Python version of the cPersistence
        code that really gets used (I'm not sure if there's code
        anywhere to fall back to using this version if the cPersistence
        code isn't compiled).

        This is quite useful for figuring out what's going on,
        but (having used it for a few months), it seemed too slow
        to be of use in a real-world system (too much time spent in
        __getattribute__).
        """
    cPersistence:
        """Provides optimised IPersistent implementation"""

    Cache:
        """Provides an in-memory object cache to reduce reloads from disk

        Basically this is a high-level cache; it has a target size
        and a few methods implementing garbage collection.  The
        DB calls the connection's GC methods, then the connection calls
        its cache's GC methods.
        """

    particular data-types:
        PersistentDict, PersistentList:
            """Dictionary and List types which track their changes

            Basically allow you to use them as lists/dicts without
            needing to spend code tracking changes yourself.  These
            items, however, re-store the entire list/dict on each
            save, so see BTrees for large dicts.
            """
        BTrees:
            """BTree implementation using individually persistent nodes

            Allows large dictionaries to be stored so that only a small
            sub-set of the dictionary needs to be re-stored on modifications
            """
        Function, Module, Package:
            """References to these types w/ importing

            Never used these myself (I think they're new),
            they appear to store name-references, or actual
            code objects in the case of functions.
            """
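
To illustrate why PersistentDict/PersistentList exist at all, here is a
toy version of the change tracking.  The _p_changed flag follows the
_p_* naming used by the persistence machinery, but ToyPersistent,
ToyPersistentDict and Account are hypothetical classes written for this
sketch; they are not the _persistent or cPersistence implementations.

```python
class ToyPersistent:
    """Sets _p_changed whenever an ordinary attribute is assigned."""

    _p_changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_p_"):
            object.__setattr__(self, "_p_changed", True)


class ToyPersistentDict(ToyPersistent):
    """Dict wrapper that flags itself as changed on every mutation."""

    def __init__(self, initial=None):
        # Bypass the tracking __setattr__ so a fresh dict starts clean.
        object.__setattr__(self, "_data", dict(initial or {}))

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self._p_changed = True

    def __delitem__(self, key):
        del self._data[key]
        self._p_changed = True

    def __len__(self):
        return len(self._data)

    def __getstate__(self):
        # The saved state is the whole mapping, which is why a large
        # dict gets rewritten in full after a single-key change.
        return dict(self._data)


class Account(ToyPersistent):
    """Holds a plain list, whose mutations go unnoticed."""

    def __init__(self):
        self.history = []
        self._p_changed = False  # treat the new object as clean
```

Appending to Account().history leaves _p_changed False, because the
plain list never tells its owner anything; assigning into a
ToyPersistentDict sets the flag without any extra code in the caller.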



John Anderson wrote:

> I'd be interested in an overview of the guts. Start with a big 
> picture, then move into some details and describe what's in which 
> files. I'd like to eventually learn the code base so I can decide how 
> to improve it.
>
> John
>
> Mike C. Fletcher wrote:
>
>> At what level would you like the description (I've been using ZODB 
>> for years now, and have just released a calendaring application on 
>> it).  I assume you understand the basics, so are you looking for 
>> analysis of where/how it starts to fail/how to update it, or what the 
>> actual machinery inside is doing for any given action?
>>
>> I'll push some time around and try to get a description posted this 
>> weekend if you can tell me which area you need.
>>
>> Enjoy,
>> Mike
>>
...