[ZODB-Dev] Re: [Dev] ZODB is not a Storage Technology (Re: other formats )

John Anderson john@osafoundation.org
Sat, 09 Nov 2002 09:09:49 -0800


Thanks for the very nice overview. Makes lots of sense and it will help 
me as we jump into the code. I did have one question; see below.

John

Mike C. Fletcher wrote:

> Okay, here's a quick overview of the guts, presented as an outline. 
> I've assumed you'll be reading the summaries with the source-code open 
> in another window to see what's being described, so I've not gone into 
> any details as to how anything is done.
>
> The objects probably best to concentrate on for understanding the
> low-level guts are the FileStorage, the Connection, and the
> _defaultTransaction.  I've given you quick summaries of what you'll
> find in most of the files in the ZODB4 CVS packages (ZODB, Transaction
> and Persistence); the zLOG project is just logging facilities, nothing
> really close to the core of the ZODB.  The indentation primarily
> shows usage patterns (for instance, fsIndex is really only used by
> FileStorage AFAIK), though I've also used it to group items which can
> be considered sub-categories of the superior item.
>
> I'll work on details tomorrow if I can get some more time;
> questions, or directions in which you'd like more coverage, are quite welcome.
> BTW: I've copied the ZODB-dev list so that others can correct anything 
> I've messed up, or add anything that they consider critical to 
> understanding the system.
>
> Enjoy,
> Mike
>
> ZODB:
>    Storage (BaseStorage sub-classes):
>        """Storages are responsible for maintaining object state records
>
>        They can also maintain undo (transaction) and versional records.
>        """
>        FileStorage:
>            """Default ZODB storage
>
>            The FileStorage is a linear aggregate of all transactions,
>            and transactions are aggregates of all changed objects.
>
>            Transactions are added at the end of the file, with
>            later changes to a particular object conceptually overwriting
>            the earlier changes.
>
>            Versions (personal views of the dbase) are just transactions
>            which are declared to have version information.  The versions
>            form linked lists (they point to the last transaction in the
>            version).
>
>            Storages which have undo support (such as FileStorage) have
>            a pack method which basically copies all objects forward until
>            there is a single current set, then discards anything not in
>            the current set.

Does it copy "in place" so that if you pulled the plug while in pack 
your file is corrupted?
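
For reference, here is a minimal sketch of what a pack that is NOT "in place" could look like. It is a toy, not FileStorage's code: the record layout (8-byte oid, 4-byte length, payload), the function names and the ".pack" temp-file suffix are all assumptions made up for illustration. The newest record per oid is copied forward into a second file, and only once that file is safely on disk does it replace the original, so pulling the plug mid-pack leaves the old file untouched.

```python
import os
import struct

HEADER = struct.Struct(">QI")  # toy record header: 8-byte oid, 4-byte payload length


def append_record(path, oid, payload):
    """Append one (oid, payload) record to the end of the log file."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(oid, len(payload)) + payload)


def read_records(path):
    """Yield (oid, payload) for every record in the file, oldest first."""
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                return
            oid, length = HEADER.unpack(header)
            yield oid, f.read(length)


def pack(path):
    """Keep only the newest record per oid, without editing `path` in place."""
    current = {}
    for oid, payload in read_records(path):
        current[oid] = payload  # later records win over earlier ones
    tmp_path = path + ".pack"  # hypothetical temp-file name
    with open(tmp_path, "wb") as out:
        for oid, payload in current.items():
            out.write(HEADER.pack(oid, len(payload)) + payload)
        out.flush()
        os.fsync(out.fileno())  # make sure the copy is on disk first
    os.replace(tmp_path, path)  # swap the packed file in as a single step
```

Until the final os.replace call the original file is only ever read, which is the property the question is really about.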

>
>            """
>            fsIndex:
>                """Index from persistent OID -> file position
>
>                The fsIndex provides an optimised index to individual objects
>                within the data file of the FileStorage.  The index can
>                be rebuilt merely by scanning through the entire datafile.
>                """
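
The "rebuild by scanning" idea can be sketched in a few lines. This is a toy under stated assumptions, not the real fsIndex: the record layout (8-byte oid, 4-byte length, payload) and the names rebuild_index and load are invented for illustration. The point is only that an append-only file already contains everything needed to reconstruct an oid -> file position map.

```python
import io
import struct

HEADER = struct.Struct(">QI")  # toy header: 8-byte oid, 4-byte payload length


def rebuild_index(f):
    """Scan the whole file and return {oid: offset of its newest record}."""
    index = {}
    f.seek(0)
    while True:
        pos = f.tell()
        header = f.read(HEADER.size)
        if len(header) < HEADER.size:
            return index
        oid, length = HEADER.unpack(header)
        index[oid] = pos  # a later record for the same oid replaces the earlier one
        f.seek(length, io.SEEK_CUR)  # skip over the payload


def load(f, index, oid):
    """Fetch the current payload for `oid` with a single seek."""
    f.seek(index[oid])
    _, length = HEADER.unpack(f.read(HEADER.size))
    return f.read(length)
```

With the index in hand a lookup is one seek and one read; without it, every lookup would be a scan of the whole file.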
>            TmpStore:
>                """Storage for transaction save-points"""
>        DBMStorage:
>            """Simple storage based on GDBM/AnyDBM"""
>        MappingStorage:
>            """A demonstration of a volatile in-memory storage"""
>
>        utility mechanisms:
>            TimeStamp:
>                """TimeStamp C extension type"""
>            Serialize:
>                """Pickle-like storage (cPickle plus some custom code)"""
>                referencesf:
>                    """finds object refs in pickle strings"""
>            file_lock:
>                """(small) wrapper to do cross-platform locking of files"""
>            fsdump, fsrecover:
>                """Debugging/utility code"""
>
>    Connection:
>        """Object-space in which application objects live
>
>        Uses an in-memory object-cache (see below)
>
>        Provides object-access (get root dict, get object by oid)
>        though normal access is via getting root and then
>        drilling down through the object references.
>
>        Other than this, almost the entire class is support
>        for the transaction and persistence mechanisms.
>        """
>        ExportImport:
>            """Mix-in providing XML import/export"""
>    DB:
>        """Manages multiple Connections to a storage
>
>        Provides a pool of connections
>
>        Provides mechanisms for applying functions
>        to all object caches in all connections
>
>        Tracks object modifications for versions? (not
>        sure about this, I've never used versions)
>
>        Provides most of the primitives on which Connection and
>        Transaction build the transaction mechanism.  (tpc_*)
>        """
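
As a rough picture of what two-phase commit primitives of the tpc_* kind do, here is a toy sketch. Only the tpc_ prefix comes from the outline above; the ToyResource class, the method signatures and the commit() driver are simplified assumptions, not the actual ZODB interfaces. Every participant first stages its changes and votes, and nothing becomes visible unless all of them vote yes.

```python
class ToyResource:
    """Hypothetical participant holding committed and pending state."""

    def __init__(self, fail_on_vote=False):
        self.committed = {}
        self._pending = None
        self._fail_on_vote = fail_on_vote

    def tpc_begin(self):
        self._pending = {}

    def store(self, key, value):
        self._pending[key] = value  # staged only, not yet visible

    def tpc_vote(self):
        if self._fail_on_vote:
            raise RuntimeError("cannot commit")

    def tpc_finish(self):
        self.committed.update(self._pending)
        self._pending = None

    def tpc_abort(self):
        self._pending = None  # throw the staged changes away


def commit(resources, changes):
    """Phase one: stage and vote everywhere. Phase two: finish everywhere."""
    for r in resources:
        r.tpc_begin()
    try:
        for r in resources:
            for key, value in changes.items():
                r.store(key, value)
        for r in resources:
            r.tpc_vote()
    except Exception:
        for r in resources:
            r.tpc_abort()
        return False
    for r in resources:
        r.tpc_finish()
    return True
```

A single failing vote aborts every participant, so either all of them end up with the change or none do.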
>
>
> Transaction:
>    _defaultTransaction:
>        """The default transaction machinery
>
>        Combined with the connection object, this is most
>        of the transaction-driving code in the system.  It
>        is fairly tightly coupled to the Persistent module
>        (e.g. it assumes _p_jar and the like on all registered
>        objects).
>        """
>    Transaction:
>        """Data-storage for the current transaction"""
>    Manager:
>        """Entry point for transaction APIs"""
>
> Persistence:
>    _persistent:
>        """Python 2.2.2 implementation of IPersistent
>
>        Basically, this is a pure-Python version of the cPersistence
>        code that really gets used (I'm not sure if there's code
>        anywhere to fall back to using this version if the cPersistence
>        code isn't compiled).
>
>        This is quite useful for figuring out what's going on,
>        but (having used it for a few months), it seemed too slow
>        to be of use in a real-world system (too much time spent in
>        __getattribute__).
>        """
>    cPersistence:
>        """Provides optimised IPersistent implementation"""
>
>    Cache:
>        """Provides an in-memory object cache to reduce reloads from disk
>
>        Basically this is a high-level cache; it has a target size
>        and a few methods implementing garbage collection.  The
>        DB calls the connection's GC methods, then the connection calls
>        its cache's GC methods.
>        """
>
>    particular data-types:
>        PersistentDict, PersistentList:
>            """Dictionary and List types which track their changes
>
>            Basically allow you to use them as lists/dicts without
>            needing to spend code tracking changes yourself.  These
>            items, however, re-store the entire list/dict on each
>            save, so see BTrees for large dicts.
>            """
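
To make "track their changes" and "re-store the entire list" concrete, here is a toy sketch. It is not the PersistentList implementation: TrackedList and save() are hypothetical names, only a few mutating methods are covered, and the dirty flag is merely named _p_changed after the attribute ZODB's persistence machinery uses. Any mutation raises the flag, and a save pickles the whole list again no matter how small the change was.

```python
import pickle


class TrackedList(list):
    """Toy list that raises a dirty flag on mutation (not ZODB's class)."""

    def __init__(self, *args):
        super().__init__(*args)
        self._p_changed = False

    def append(self, item):
        super().append(item)
        self._p_changed = True

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._p_changed = True

    def __delitem__(self, index):
        super().__delitem__(index)
        self._p_changed = True


def save(obj, store, key):
    """Re-store the ENTIRE list if it changed; return True if it was written."""
    if not obj._p_changed:
        return False
    store[key] = pickle.dumps(list(obj))  # whole list, even for a one-item change
    obj._p_changed = False
    return True
```

The cost of save() grows with the length of the list rather than with the size of the change, which is why a structure built from many small, separately stored nodes suits large collections better.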
>        BTrees:
>            """BTree implementation using individually persistent nodes
>
>            Allows large dictionaries to be stored so that only a small
>            sub-set of the dictionary needs to be re-stored on
>            modifications
>            """
>        Function, Module, Package:
>            """References to these types w/ importing
>
>            Never used these myself (I think they're new),
>            they appear to store name-references, or actual
>            code objects in the case of functions.
>            """
>
>
>
> John Anderson wrote:
>
>> I'd be interested in an overview of the guts. Start with a big 
>> picture, then move into some details and describe what's in which 
>> files. I'd like to eventually learn the code base so I can decide how 
>> to improve it.
>>
>> John
>>
>> Mike C. Fletcher wrote:
>>
>>> At what level would you like the description (I've been using ZODB 
>>> for years now, and have just released a calendaring application on 
>>> it).  I assume you understand the basics, so are you looking for 
>>> analysis of where/how it starts to fail/how to update it, or what 
>>> the actual machinery inside is doing for any given action?
>>>
>>> I'll push some time around and try to get a description posted this 
>>> weekend if you can tell me which area you need.
>>>
>>> Enjoy,
>>> Mike
>>>
> ...
>