[Zope3-dev] Re: RFC: File-system synchronization proposal: ExternalFile?

Craeg K Strong cstrong@arielpartners.com
Wed, 28 Aug 2002 23:45:47 -0400


Hello:

Jim Fulton wrote:
> Craeg K Strong wrote:
> 
>> I do, however, believe that there will be overlap between the
>> pluggable persistence adaptor and the synchronization stuff.  Or
>> at least that you will want to use them together.
> 
> I don't think so, but I should explain a little bit why.
> The file-system representation takes a file-centric hierarchical view
> of the world.  The natural boundaries in this world are files and
> directories.
> 
> A persistent object system has a very different view of the world.
> Boundaries between objects may be much more fine-grained and may
> only have a loose connection with a file view.
> 
> For example, a large image is persisted as several persistent objects,
> but in a file-system representation, it should appear as one. Similarly, a
> folder may use a number of persistent objects in its implementation that
> don't appear in the file-system representation.

I understand what you said, but I still think it is a distinction without
a difference.

We are talking about the impedance mismatch between a hierarchical
file system representation and a native Python object representation.

Objects can be persisted in several different ways.  For example, an Image
could be stored as a single object or broken up into multiple objects.
That's all handled by the persistence adaptor.  A persistence adaptor is
designed to bridge an impedance mismatch between two media.  Other than
that, adaptors can vary radically based on the particular requirements
they are designed to fulfill.

The obvious example is the impedance mismatch between objects
and a relational database.  The relational paradigm is quite different
from an object paradigm, but there are still many different, equally correct
ways to represent objects in a relational schema.  The design of such a
persistence adaptor depends in part on the capabilities of the
particular RDBMS.  For example, does it support BLOBs?
If so, you may want to store an Image in a single BLOB field.  If not,
the Image may need to be split up or dealt with another way.
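
To make the BLOB point concrete, here is a minimal Python sketch. It assumes
nothing about any real adaptor: SingleBlobStore and ChunkedStore are
hypothetical names, and plain dictionaries stand in for database tables.
The calling code sees the same save/load interface either way; only the
stored representation differs.

```python
from typing import Dict, List


class SingleBlobStore:
    """Hypothetical store for a backend that supports BLOBs: one field per image."""

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def save(self, name: str, data: bytes) -> None:
        self._rows[name] = data

    def load(self, name: str) -> bytes:
        return self._rows[name]


class ChunkedStore:
    """Hypothetical store for a backend without BLOBs: the image is split up."""

    def __init__(self, chunk_size: int = 4) -> None:
        self._chunk_size = chunk_size
        self._rows: Dict[str, List[bytes]] = {}

    def save(self, name: str, data: bytes) -> None:
        size = self._chunk_size
        # Slice the payload into fixed-size pieces, one "row" per piece.
        self._rows[name] = [data[i:i + size] for i in range(0, len(data), size)]

    def load(self, name: str) -> bytes:
        # Reassemble the pieces in their original order.
        return b"".join(self._rows[name])

    def chunk_count(self, name: str) -> int:
        return len(self._rows[name])
```

Either store round-trips the same bytes, which is the sense in which both
mappings are "equally correct".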

More sophisticated OO-RDBMS persistence adaptors have flexible mapping
mechanisms where you can control how persistence is done.  For example, you
could make the relational representation more suited to reporting tools
and other SQL-enabled tools.  Some adaptors also provide support for schema
migration, two-way synchronization, etc.

Another example: Together/CC.  It provides automatic synchronization between
a file representation of Java (or C++, but not Python (yet)) and UML models.
It is automatic in the sense that the sync is performed often enough that the
chances of someone making an incompatible change to both the file system
representation of an object and the UML representation are small.  It has
some tools for
conflict detection and resolution, but largely avoids the issue via a
finely tuned synchronization mechanism.

Another example is JAXB.  The JAXB technology allows one to create a mapping
between XML and Java.  It is flexible enough that you can actually
create XML documents that are human readable and readily used by XML
processing tools, yet the Java classes look natural, as well.  Another way
to say this is that the mechanism allows for the simultaneous or asynchronous
evolution of both the XML and the Java.  Smart mapping technology bridges
the gap.

You are proposing a persistence mapping from Zope objects to a hierarchical
file system that includes (and is heavily driven by the requirements of) a
synchronization technology, but it is still a persistence mechanism.

I understand that the primary drivers for the synchronization
effort are different from the primary drivers for the Python persistence effort,
but I think it would be a mistake not to have some cross-pollination.
I believe you may find a significant amount of overlap,
especially in the lower layers of the design.

[ One fun way to do that might be to do some pair programming with team members
from each team :-) ]

>> In an ideal world, they would both be pluggable and I would
>> simply use them "together" so that I could influence the representation
>> of Zope objects in the filesystem and still get automatic
>> synchronization.
> 
> Note that the kind of synchronization I'm talking about is not as automatic
> as you might think. Synchronization is of the CVS or Subversion style.
> It only happens when you explicitly ask it to and it may require human
> intervention to succeed (to resolve conflicts).

No problem.  At the extreme end (such as during development) I can ratchet
that up to where, for all intents and purposes, it is real-time, no?
Not that I intend to do this, but there is no reason one couldn't do
so, right?

>> For example, I could store a single Zope object in two separate files,
>> one for "data" and one for "metadata." 
> 
> That is what is proposed in the subject proposal.  Note that in Zope 3,
> there are many kinds of meta-data (e.g. Dublin Core data, security
> assertions, etc.).  Each kind of meta-data is typically stored in a
> separate annotation, with a separate file (or directory) per annotation
> in the proposed file-system representation.

I read the proposal more carefully.  I like the layout you proposed.
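
As I read it, the idea is roughly the following.  This is only a sketch
under my own assumptions: the ".annotations" directory suffix, the JSON
encoding, and the function names write_object and read_object are all
hypothetical and are not taken from the proposal.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def write_object(root: Path, name: str, data: bytes,
                 annotations: Dict[str, dict]) -> List[Path]:
    """Write the object's data to one file and each annotation to its own file."""
    root.mkdir(parents=True, exist_ok=True)
    data_path = root / name
    data_path.write_bytes(data)

    # Hypothetical convention: annotations live in a sibling directory.
    meta_dir = root / (name + ".annotations")
    meta_dir.mkdir(exist_ok=True)

    written = [data_path]
    for key, value in sorted(annotations.items()):
        path = meta_dir / (key + ".json")
        path.write_text(json.dumps(value, indent=2, sort_keys=True))
        written.append(path)
    return written


def read_object(root: Path, name: str) -> Tuple[bytes, Dict[str, dict]]:
    """Knit the data file and the annotation files back together."""
    data = (root / name).read_bytes()
    meta_dir = root / (name + ".annotations")
    annotations = {p.stem: json.loads(p.read_text())
                   for p in meta_dir.glob("*.json")}
    return data, annotations
```

The data file stays in whatever form is natural for outside tools, and the
meta-data can be read or ignored independently.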

>> That way the "data" file
>> could be structured in a way that was natural and easy for humans
>> to read and tools to process.  I can always combine them easily with
>> XSLT.
> 
> Or, as the proposal describes, the data are knit back together as part
> of the synchronization process.

Right.  It is a simple matter to point my XSLT at a group of files
that together represent (the data and metadata for) a single
Zope object and produce reports or whatever.

>> In any event, I don't see any _one_ answer to this problem.  That is
>> why I would like to see flexibility in how objects are represented.
> 
> Well, given the APIs, one could pretty simply adapt the provided
> synchronization tool to provide different layout rules. For example,
> naming conventions could be used to provide a shallower (but wider)
> file-system layout.

Excellent. That's what I wanted.  However, I may also want to control the
schema for the files that are produced.  Did you mean this, too?
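
To illustrate what I mean by pluggable layout rules, here is a small
Python sketch.  It is not the API of the synchronization tool: the
function names nested_layout, flat_layout and plan_export, and the "--"
separator, are assumptions made up for this example.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, Tuple

ObjectPath = Tuple[str, ...]
LayoutRule = Callable[[ObjectPath], PurePosixPath]


def nested_layout(object_path: ObjectPath) -> PurePosixPath:
    """Deep layout: one directory level per container in the object path."""
    return PurePosixPath(*object_path)


def flat_layout(object_path: ObjectPath, sep: str = "--") -> PurePosixPath:
    """Shallower but wider layout: encode the hierarchy in the file name."""
    return PurePosixPath(sep.join(object_path))


def plan_export(object_paths: Iterable[ObjectPath],
                layout: LayoutRule) -> Dict[str, str]:
    """Map each object path to the file path the chosen rule would produce."""
    return {"/".join(path): str(layout(path)) for path in object_paths}
```

Swapping the rule changes where files land without touching the rest of
the export.  A flat rule like this one would need an escaping scheme if
object names can themselves contain the separator.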

> The whole point of the proposal is to provide a representation that is
> convenient for these tools, as opposed to a representation that is
> convenient for the Zope application.  OTOH, a persistence system's job
> is to provide a data representation that's useful for direct use by the
> application.

See above.  I think we violently agree on everything save a (possibly
obscure :) philosophical point.  I would call this a persistence
mechanism, but call it what you will, it will certainly be useful.

--Craeg

> 
> Jim