[ZODB-Dev] RFC: Attributes and Options for IndexedCatalog

Christian Reis kiko@async.com.br
Tue, 28 Jan 2003 17:25:10 -0200


I'd appreciate comments on the following proposal for field
specifications for IC. To make things simple, you can simply vote for
solutions A, B or C (see section 4 for a skim) - you can even do so
without reading <wink>.

1. Problem

Currently, IC specifies attribute options using "_ic_*" configuration
attributes. These attributes have been added ad-hoc and are a tad
confusing nowadays, since finding out what the default semantics are and
how you can change them is not trivial. All in all we feel it has become
a bit messy.

What we need is a flexible, yet clean way of specifying class attributes
and controlling options for each of them.

2. Class attributes

For indexing to make some sense, the objects that are included in the
catalog should have at least some similarity. Up to now we relied on the
fact that each Catalog was tied to a class; this has recently been
discussed [1], but that discussion is out of scope here.

Specifying fields for the catalog is currently done by using class
attributes, and options by using _ic_* attributes. A short example
follows:

    class Host(IndexedObject):
        name = StringType
        address = TupleType
        arch = StringType
        mhz = IntType
        os = OpSys # a class
        daemons = HostDaemonCollection
        _ic_unique = ('name', 'address')
        _ic_exclude = ('mhz',) # don't index MHz
        _ic_weak = ('daemons',)

In this example, we have four attributes with basic types (name,
address, arch, mhz), one foreign reference (os), and one subobject
reference (daemons). Subobjects are objects that have lifecycles tightly
coupled to the main instance (which would be a 'composite' object in
certain terminologies).

3. Extending class attributes

Apart from the basic attributes, we have in the last example 3
customizations. One indicates that name and address should be unique
(i.e., should raise UniqueError when setattr()ed/indexed), another
indicates that we should not index mhz, and the last indicates that
the daemons attribute points to a subobject (and that we should create a
new HostDaemonCollection upon startup, therefore). There could be other
uses for this, as discussed recently [2], and we can foresee at least the
following:

    - Uniqueness
    - Weakness
    - Exclude from Index
    - FloatsAreInts [3]
    - TimeStamps [4]
    - Default value
    - Autoincrement (yes, just like SQL auto incrementing fields)
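
To make the three existing customizations concrete, here is a rough
sketch of how the _ic_unique, _ic_exclude and _ic_weak tuples could be
folded into a single per-field mapping. The legacy_options() function is
hypothetical (it is not part of IC), and the Host class is trimmed down
to just the option tuples from the example in section 2.

```python
def legacy_options(cls):
    """Hypothetical: fold the _ic_* tuples into {field: {option: 1}}."""
    result = {}
    for option in ('unique', 'exclude', 'weak'):
        # A class that lacks a given _ic_* attribute contributes nothing.
        for field in getattr(cls, '_ic_' + option, ()):
            result.setdefault(field, {})[option] = 1
    return result


class Host:
    # Only the option tuples from the section 2 example are kept here.
    _ic_unique = ('name', 'address')
    _ic_exclude = ('mhz',)
    _ic_weak = ('daemons',)


options = legacy_options(Host)
```

Seen this way, each field carries its own small dictionary of options,
which is roughly the shape that the proposals in section 4 make explicit.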

4. Proposal for alternate formats

We have a number of options to make things slightly more consistent. 

    a. Deprecate the use of class attributes, and use _ic_fields to
    define how fields should be. There can be two approaches here: use a
    plain Python dictionary, and use a helper function. An example of
    dicts follows:

        class Host(IndexedObject):
            _ic_fields = [{'name': 'name', 'type': StringType, 'unique': 1},
                          {'name': 'address', 'type': TupleType, 'unique': 1},
                          {'name': 'arch', 'type': StringType}, 
                          {'name': 'mhz', 'type': IntType},
                          {'name': 'os', 'type': OpSys},
                          {'name': 'daemons', 'type': HostDaemonCollection, 
                           'weak': 1}]
            def __init__(self):
                IndexedObject.__init__(self)

    Ugly, huh? Well, we could use a nice helper:

        class Host(IndexedObject):
            _ic_fields = [ attr('name', StringType, unique=1),
                           attr('address', TupleType, unique=1),
                           attr('arch', StringType),
                           attr('mhz', IntType),
                           attr('os', OpSys),
                           attr('daemons', HostDaemonCollection, weak=1) ]

    which is slightly nicer. The problem with this approach is that it
    removes all "naturality" from object definition; you need to use our
    special format, which makes retrofitting classes harder.
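
    For what it's worth, the helper could be as thin as a function that
    builds the same dictionary as in the first variant. This is only a
    sketch: attr() does not exist yet, its signature is an assumption,
    and str stands in for StringType.

```python
def attr(name, field_type, **options):
    """Hypothetical helper: build one field specification as a plain dict."""
    spec = {'name': name, 'type': field_type}
    spec.update(options)  # e.g. unique=1 or weak=1
    return spec


# str stands in for StringType in this sketch.
name_field = attr('name', str, unique=1)
arch_field = attr('arch', str)
```

    Under that assumption both variants of (a) would produce identical
    _ic_fields contents, so the choice between them is purely cosmetic.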

    b. We could merge all existing _ic_* options into a single
    _ic_options field. I'm in favor of this because I consider it to be
    exceptionally clean:

        class Host(IndexedObject):
            name = StringType
            address = TupleType
            arch = StringType
            mhz = IntType
            os = OpSys # a class
            daemons = HostDaemonCollection
            _ic_options = [opt('name', unique=1),
                           opt('address', unique=1),
                           opt('daemons', weak=1)]
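
    As a sketch of how this could work internally, the catalog might
    merge the plain class attributes with _ic_options when it first sees
    the class. Both opt() and collect_fields() are hypothetical names,
    the built-in types stand in for StringType and friends, and the rule
    "every public non-function class attribute is a field" is an
    assumption of the sketch.

```python
import inspect


def opt(name, **options):
    """Hypothetical helper: pair a field name with its option dict."""
    return (name, options)


def collect_fields(cls):
    """Merge class-attribute field types with _ic_options into one mapping."""
    options = dict(getattr(cls, '_ic_options', []))
    fields = {}
    for name, value in vars(cls).items():
        # Assumption: private names and methods are not catalog fields.
        if name.startswith('_') or inspect.isfunction(value):
            continue
        spec = {'type': value}
        spec.update(options.get(name, {}))
        fields[name] = spec
    return fields


class Host:
    name = str       # stands in for StringType
    address = tuple  # stands in for TupleType
    mhz = int        # stands in for IntType
    _ic_options = [opt('name', unique=1),
                   opt('address', unique=1)]


fields = collect_fields(Host)
```

    Fields without an entry in _ic_options simply get the defaults, so a
    class with no special needs would not have to mention IC options at
    all.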

    c. We could bite the bullet and go for specifying a schema
    externally to the class, perhaps using XML. The drawbacks here are
    that information ends up being split up between the schema file and
    the domain object, it's very un-Pythonic, and it further complicates
    migrating from non-IC applications.

    Johan says something about "generating domain class code from XML"
    next to me and I feel worried.

5. Comments?

    self.feedback(appreciated=1)

[1] http://www.async.com.br/pipermail/indexedcatalog/2003-January/000036.html
[2] http://www.async.com.br/pipermail/indexedcatalog/2003-January/000041.html
[3] http://www.async.com.br/pipermail/indexedcatalog/2003-January/000037.html
[4] http://bugs.async.com.br/show_bug.cgi?id=526

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
