[Zope3-dev] Axe DTML Document

Martijn Faassen faassen@vet.uu.nl
Wed, 19 Dec 2001 04:30:20 +0100


Casey Duncan wrote:
[snip]
> To me this brings me back to what is a file and what is a document and should 
> we draw that distinction at the metatype level as we currently do, and create 
> a different metatype for every conceivable format?
> 
> This is convenient at first, but a highly unscalable solution. So the 
> suggestion is that we have 1 or 2 content objects. If we have two, they would 
> be:
> 
> 1. Binary files
> 2. Textual Files that can be edited directly "in place"

Well, I'm currently developing a document type that uses ParsedXML
as its content object, and those are XML trees inside the ZODB, so that
doesn't fit either 1 or 2.

So we have content objects for documents, and on top of that we have
document objects that adapt the content objects to a set of document
interfaces. They may also support a few interfaces of their own,
specific to what type of document they are. The types of content object
I can see being useful for documents are binary files, plaintext storage,
and DOM tree storage, though I imagine more would arise in the future.
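
To make the layering concrete, here is a minimal sketch in plain Python.
It is not ParsedXML or actual Zope3 component code; every name in it
(IDocument, PlainTextContent, TreeContent and the two document classes)
is hypothetical, and the standard library's ElementTree only stands in
for a DOM tree. It shows two differently stored content objects adapted
to the same document interface:

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class IDocument(ABC):
    """Hypothetical document interface (not an actual Zope3 interface)."""

    @abstractmethod
    def title(self):
        """Return a short title for the document."""

    @abstractmethod
    def as_text(self):
        """Return the textual content of the document."""


class PlainTextContent:
    """Content object: stores the document as one string."""

    def __init__(self, text):
        self.text = text


class TreeContent:
    """Content object: stores the document as an element tree."""

    def __init__(self, root):
        self.root = root


class PlainTextDocument(IDocument):
    """Adapts PlainTextContent to the document interface."""

    def __init__(self, content):
        self.content = content

    def title(self):
        # Assumption: the first non-empty line serves as the title.
        for line in self.content.text.splitlines():
            if line.strip():
                return line.strip()
        return ""

    def as_text(self):
        return self.content.text


class TreeDocument(IDocument):
    """Adapts TreeContent to the document interface."""

    def __init__(self, content):
        self.content = content

    def title(self):
        # Assumption: the tree has a <title> child of the root element.
        return self.content.root.findtext("title", default="")

    def as_text(self):
        # Concatenate all text nodes in document order.
        return "".join(self.content.root.itertext())
```

Code written against IDocument does not need to know which kind of
content object sits underneath, which is the point of the split.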

> Now, since #2 is a subset of #1, is that a useful distinction? Zope 2.5 has 
> blurred the line here by making textual files editable through the web. So 
> perhaps we need only have files?

I think that's going slightly too far; 'textual content' carries with it
the assumption that it is human viewable/editable, while 'binary file'
does not. That's not to say they couldn't share some interfaces, though.
They may in fact be quite similar.

> A distinction that was pointed out is that Documents have different views, 
> and can be in some cases treated like a DOM tree and manipulated as such. 
> But, I think you could argue the same for certain binary types there.

> I'm still a ComponentArchitecture newbie here, so maybe it's not the panacea I 
> imagine where you have "pure" content data in one object, with some generic 
> functionality. And then specialized functionality (such as XML parsers, PDF 
> writers, HTML mungers, Structured Text Renderers, MSWord converters, ad 
> nauseam) is in these separate "adapter" objects/classes.

This would be too inefficient for many purposes, though possible in 
some cases (StructuredText builds a DOMish tree on the fly, for instance).
Often the content is not in text form but in tree form, for instance
as an XML DOM tree. The document content could even be retrieved from
a relational database somehow, and the content object may only have
some references to ids in the RDBMS. As long as it can support the
Document interfaces that'd be fine.
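
As a rough illustration of the relational case, the sketch below uses
an in-memory SQLite database. The table layout and all names
(make_demo_connection, RelationalContent, RelationalDocument, as_text)
are assumptions made up for this example, not anything that exists in
Zope. The content object holds only a connection and a row id, and the
text is fetched when the document is asked for it:

```python
import sqlite3


def make_demo_connection():
    """Create an in-memory database with one hypothetical documents table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)"
    )
    connection.execute(
        "INSERT INTO documents (id, body) VALUES (?, ?)",
        (1, "Stored in a table"),
    )
    return connection


class RelationalContent:
    """Content object: holds only a reference (row id) into the database."""

    def __init__(self, connection, row_id):
        self.connection = connection
        self.row_id = row_id


class RelationalDocument:
    """Exposes the referenced row through a document-style method."""

    def __init__(self, content):
        self.content = content

    def as_text(self):
        # Look the body up on demand; nothing is cached on the object.
        row = self.content.connection.execute(
            "SELECT body FROM documents WHERE id = ?",
            (self.content.row_id,),
        ).fetchone()
        if row is None:
            raise KeyError(self.content.row_id)
        return row[0]
```

A caller that only uses as_text() cannot tell that the content lives
outside the object database.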

> I'm sure it will come to me as I learn more.
> 
> So to distill it down to its bare essence:
> 
> A document is a container for a blob of content, with associated metadata and 
> a facility to associate it with template "views" for presentation (if 
> necessary) and adapters for manipulation.

I think an 'editor view' is an important one that needs framework support
(at least the ability to query for one). Search facilities are another
important type of thing that needs its own interfaces (though in part
it'd depend on the form of the content; a DOM tree can have all kinds
of paths into it to specific nodes, while plaintext would not).

> These associations may be automated 
> based on the content-type of the data.

> A document should map cleanly to a single file so that FTP/WebDAV 
> manipulation makes sense. Adapters could be used to include and extract 
> metadata from content on the fly for certain formats (such as HTML or XML, or 
> even an extended STX). Or alternately, metadata could be serialized as a 
> "parallel" file in the FTP/WebDAV point of view.
> 
> Editing or manipulation of a document TTW should be facilitated by a view. A 
> texteditor view would work much like the current "Edit" view of DTML objects. 
> A DOM view could be used to inspect and work with tree-like data, such as 
> XML. Views could be associated by content-type as well. 
> 
> In my mind, none of these requirements point to a definite need to have 
> separate document and file objects at the base level. Is there a requirement 
> that I have missed that does?

Yes, the requirement to have a tree structured document represented 
in the ZODB directly. Tree structures are extremely common for documents
and it makes sense to store such structure directly (offering perhaps
a plaintext representation, possibly roundtrip as for ParsedXML).

What you possibly *could* say is that the document _interface_ supports
asking for a plaintext representation of the document, even though it
may not be stored as such. If you added as a further requirement that
the document should also be able to accept plaintext for storage (and
again may not store it as such, but may parse it and store it as a 
tree, say), it'd be easy to use these interfaces to create a 
set of editing components exploiting this, along with a download/upload
facility.
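
A small sketch of that idea, again in plain Python with hypothetical
names (get_text, set_text, edit and both classes are invented for
illustration, and ElementTree stands in for ParsedXML). One document
stores a string, the other parses the text into a tree, and the same
editing helper works on both because it only uses the plaintext pair:

```python
import xml.etree.ElementTree as ET


class PlainStoredDocument:
    """Stores the plaintext exactly as it was handed in."""

    def __init__(self, text=""):
        self._text = text

    def get_text(self):
        return self._text

    def set_text(self, text):
        self._text = text


class TreeStoredDocument:
    """Accepts and returns plaintext, but stores a parsed tree."""

    def __init__(self, text="<doc/>"):
        self._root = ET.fromstring(text)

    def get_text(self):
        # Serialize the tree back to text on request.
        return ET.tostring(self._root, encoding="unicode")

    def set_text(self, text):
        # Parsing raises ET.ParseError for text that is not well-formed,
        # so the stored tree is only replaced by valid input.
        self._root = ET.fromstring(text)


def edit(document, transform):
    """Generic editing component: relies only on get_text/set_text."""
    document.set_text(transform(document.get_text()))
```

An upload handler would call set_text() with the uploaded text and a
download handler would return get_text(); neither needs to know how the
document is stored.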

Perhaps that's what you meant all along. But let's make clear that
we need to focus on the requirements for the interface here, not on the
requirements on the specific representation of the content. I think
that is good Zope3 thinking, but who am I to say? :)

Regards,

Martijn