[Zope3-dev] Re: RFC: File-system synchronization proposal: ExternalFile?

Thu, 29 Aug 2002 08:46:31 -0400

Craeg K Strong wrote:
> Hello:
> 
> Jim Fulton wrote:
> 
>> Craeg K Strong wrote:
>>
>>> I do, however, believe that there will be overlap between the
>>> pluggable persistence adaptor and the synchronization stuff.  Or
>>> at least that you will want to use them together.
>>
>>
>> I don't think so, but I should explain a little bit why.
>> The file-system representation takes a file-centric hierarchical view
>> of the world.  The natural boundaries in this world are files and
>> directories.
>>
>> A persistent object system has a very different view of the world.
>> Boundaries between objects may be much finer-grained and may have
>> only a loose connection with a file view.
>>
>> For example, a large image may be stored as several persistent objects,
>> but in a file-system representation, it should appear as one.
>> Similarly, a folder may use a number of persistent objects in its
>> implementation that don't appear in the file-system representation.
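To make the image example concrete, here is a minimal Python sketch. The class names, the chunk size and the methods are hypothetical and are not Zope's actual Image implementation. The point is only that the persistent layout (many chunk objects) and the file-system view (one file) can differ.

```python
from dataclasses import dataclass, field
from typing import List

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen for illustration


@dataclass
class Chunk:
    """Stands in for one separately stored persistent object."""
    data: bytes


@dataclass
class ChunkedImage:
    """Hypothetical image whose bytes are spread over many chunk objects."""
    content_type: str
    chunks: List[Chunk] = field(default_factory=list)

    @classmethod
    def from_bytes(cls, content_type: str, data: bytes,
                   chunk_size: int = CHUNK_SIZE) -> "ChunkedImage":
        # Persistence view: split the payload into fixed-size chunks,
        # each of which could be loaded and stored independently.
        chunks = [Chunk(data[i:i + chunk_size])
                  for i in range(0, len(data), chunk_size)]
        return cls(content_type, chunks)

    def to_file_bytes(self) -> bytes:
        # File-system view: the chunks are an implementation detail,
        # so they are concatenated into a single file's contents.
        return b"".join(chunk.data for chunk in self.chunks)


payload = bytes(range(256)) * 1000  # 256,000 bytes of sample data
image = ChunkedImage.from_bytes("image/png", payload)
print(len(image.chunks))  # 4 chunks of at most 64 KiB each
```

The serializer never exposes the chunk boundaries, which is the kind of mismatch between object boundaries and file boundaries described above.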
> 
> 
> I understand what you said, but I still think it is a distinction without
> a difference.
> 
> We are talking about the impedance mismatch between a hierarchical
> file-system representation and a native Python object representation.
> 
> Objects can be persisted in several different ways; for example, an
> Image could be stored as a single object or broken up into multiple
> objects.  That's all handled by the persistence adapter.  A persistence
> adaptor is designed to bridge an impedance mismatch between two media.
> Other than that, adaptors can vary radically based on the particular
> requirements they are designed to fulfill.
> 
> The obvious example is the impedance mismatch between objects
> and a relational database.  The relational paradigm is quite different
> from an object paradigm, but there are still many different, equally correct
> ways to represent objects in a relational schema.  The design of such a
> persistence adaptor depends in part on the
> capabilities of the particular RDBMS.  For example, does it support BLOBs?
> If so, you may want to store an Image in a single BLOB field.  If not,
> it may need to be split up or dealt with in another way.
> 
> More sophisticated OO-RDBMS persistence adaptors have flexible mapping
> mechanisms where you can control how persistence is done.  For example, you
> could make the relational representation more suited to reporting tools
> and other SQL-enabled tools.  Some adaptors also provide support for schema
> migration, two-way synchronization, etc.
> 
> Another example: Together/CC.  It provides automatic synchronization
> between a file representation of Java (or C++, but not Python (yet)) and
> UML models.  It is automatic in the sense that the sync is performed
> often enough that the chances of someone making incompatible changes to
> both the file-system representation of an object and its UML
> representation are small.  It has some tools for conflict detection and
> resolution, but largely avoids the issue via a finely tuned
> synchronization mechanism.
> 
> Another example is JAXB.  The JAXB technology allows one to create a
> mapping between XML and Java.  It is flexible enough that you can
> create XML documents that are human-readable and readily used by XML
> processing tools, yet the Java classes look natural as well.  Put another
> way, the mechanism allows for the simultaneous or asynchronous evolution
> of both the XML and the Java.  Smart mapping technology bridges the gap.

You make a good point; however, I wouldn't design a persistence system like
this if I could avoid it. If I were designing a persistence system for a relational
database, I would make the persistence boundaries coincide with the boundaries
in the RDBMS (i.e. records). Any impedance matching would be done at a higher level.

> You are proposing a persistence mapping from Zope objects to a
> hierarchical file system that includes (and is heavily driven by the
> requirements of) a synchronization technology, but it is still a
> persistence mechanism.

I don't agree, but that's probably because I choose a much more specific definition
of persistence than the one that you are using.

> I understand that the primary drivers for the synchronization effort are
> different from the primary drivers for the Python persistence effort,
> but I think it would be a mistake not to have some cross-pollination.

Perhaps. OTOH, I think there is a risk of making both efforts fail
through excessive, or at least premature, generalization.

> I believe you may find a significant amount of overlap,
> especially in the lower layers of the design.

I really don't agree. The forces affecting the design of a persistence
system (e.g. very efficient update and retrieval, atomic updates,
concurrency control, etc.) are just very different from the forces
driving the design of the synchronization system.

>>> In an ideal world, they would both be pluggable and I would
>>> simply use them "together" so that I could influence the representation
>>> of Zope objects in the filesystem and still get automatic
>>> synchronization.
>>
>>
>> Note that the kind of synchronization I'm talking about is not as
>> automatic as you might think. Synchronization is of the CVS or Subversion
>> style. It only happens when you explicitly ask for it, and it may require
>> human intervention to succeed (to resolve conflicts).
> 
> 
> No problem.  At the extreme end (such as during development) I can ratchet
> that up to where, for all intents and purposes, it is real-time, no?

I don't know what you mean by "real time".

But think about this: would you "cvs up" and "cvs commit" in real time?

....

>> Well, given the APIs, one could pretty simply adapt the provided
>> synchronization tool to provide different layout rules. For example,
>> naming conventions could be used to provide a shallower (but wider)
>> file-system layout.
> 
> 
> Excellent. That's what I wanted.  However, I may also want to control the
> schema for the files that are produced.  Did you mean this, too?

You have control over the content schema and over how it is broken up into
files and directories, including "extra" data.

You also have control over how individual annotations are serialized.
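For instance, a naming convention that trades depth for width might look like the minimal Python sketch below. The separator, the `max_depth` parameter and both function names are hypothetical, chosen only to illustrate the idea; they are not part of the proposed synchronization API.

```python
from pathlib import PurePosixPath
from typing import Sequence, Tuple

SEP = "__"  # hypothetical separator; object names must not contain it


def flatten(path: Sequence[str], max_depth: int = 1) -> PurePosixPath:
    """Map an object path to a shallower (but wider) file-system path.

    The first `max_depth` components stay as directories; the remaining
    components are joined with SEP into a single file name.
    """
    if not path:
        raise ValueError("empty path")
    for name in path:
        if not name or name in (".", "..") or "/" in name or SEP in name:
            raise ValueError(f"name {name!r} cannot be encoded")
    dirs = list(path[:max_depth])
    rest = list(path[max_depth:])
    if not rest:
        # Path is no deeper than max_depth: its last part becomes the file.
        dirs, rest = dirs[:-1], dirs[-1:]
    return PurePosixPath(*dirs, SEP.join(rest))


def unflatten(fs_path: PurePosixPath) -> Tuple[str, ...]:
    """Inverse of flatten(): recover the original object path."""
    *dirs, filename = fs_path.parts
    return tuple(dirs) + tuple(filename.split(SEP))


print(flatten(("site", "docs", "2002", "notes.txt")))
# site/docs__2002__notes.txt
```

Rejecting names that contain the separator keeps the mapping reversible; a real convention would more likely escape such names instead.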

...

> See above.  I think we violently agree on everything save a (possibly
> obscure :) philosophical point.  I would call this a persistence
> mechanism, but call it what you will,

:)

> it will certainly be useful.

Thanks. I think so too.

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org