[Zope3-dev] Axe DTML Document

Martijn Faassen faassen@vet.uu.nl
Wed, 19 Dec 2001 04:15:42 +0100


Lalo Martins wrote:
> On Tue, Dec 18, 2001 at 09:29:31PM +0100, Martijn Faassen wrote:
> > Lalo Martins wrote:
> > [snip]
> > > Ideally, this Ideal Document component would store its text in one
> > > single, simple format - StructuredText or some kind of XML.
> > 
> > This is completely impossible, however -- I may want to use one type
> > of XML and you may want to use another, while the guy next door wants
> > to use reStructuredText.
> 
> Why? And why not just convert (adapt) it?

Because if it isn't present in the underlying representation there is
simply nothing to adapt. I'm not saying adapters play no role, but you
can't adapt information into being that wasn't there before.

> We're talking about "documents" here - meaning, any content body
> not complex enough to justify its own component class.

Oh, you're discussing the 90% catchall document that will be delivered
with the document _framework_ then. That's fine with me, but I think
a framework is important. I want to do more with documents than simply
read them.

> So, I
> advocate that, yes, it should be a very simple and single
> format, because if you can't represent your data in, say,
> StructuredText, then it's *NOT* a Document, but a component
> deserving its own class.

So you're saying DocBook documents aren't documents? What about PDF
documents? What about something almost representable by StructuredText
*except* you want to mark up persons for some reason?

You'd be confusing people if you only accepted things that 'can be
represented as structured text' as documents.

> One of the goals of Zope3 is to make it easier for you to have
> your own component classes (like ZClasses should have done for
> Zope2, only better).

Sure, but I think documents are of such central importance that we
shouldn't dismiss the myriad use cases right from the start and focus
only on structured text.

> > (try searching for, say, persons, if you have no way to indicate what a 
> > person is in your underlying representation. Or think of advanced 
> > hyperlinking) 
> 
> A "Person" is not a "Document".

A person can occur inside a document. Like this:

<header>Chapter Foo, the Marking Up</header>
   
<p>Hello, this is a nice paragraph containing some text. 
<person>Martijn</person> is a person.</p>

..

It depends on what the problem domain finds important. Perhaps I'm interested
in dates in a document, for instance, so I can find stuff again. Perhaps
I have part numbers in my documents, so I can create hyperlinks to
my parts database.
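
As a rough sketch of why the markup matters (Python, standard library
only): once persons are marked up, finding them is a simple query. The
<document> wrapper element and the find_persons name are made up for
this illustration; they are not part of any Zope API.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the fragment shown earlier, wrapped in a made-up
# <document> root element so that it parses as well-formed XML.
SOURCE = """<document>
<header>Chapter Foo, the Marking Up</header>
<p>Hello, this is a nice paragraph containing some text.
<person>Martijn</person> is a person.</p>
</document>"""


def find_persons(xml_text):
    """Return the text of every <person> element, in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("person")]


print(find_persons(SOURCE))  # prints ['Martijn']
```

The same query against a plain-text version of the paragraph has nothing
to work with, since the text never says which words name a person.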

I'm claiming that those are important use cases; I don't agree that italic
and bold (say) are sufficient markup for anything you can call a 'document'.

> > > Most situations where this is not OK are really mixing content
> > > and presentation.
> > 
> > That is in my opinion not correct. In many cases you deal with types of
> > documents that have domain-specific requirements, and this has nothing to 
> > do with mixing content and presentation. For instance, currently I'm
> > dealing with various university publications, but also with summaries of the
> > Dutch republic's State General meetings in 1625, and am also looking into
> > biographical descriptions. 
> 
> And what is the problem? Do they *require* more complex markup?
> Or do they just have some of their own metadata?

They require markup, so as not to lose information. I want to be able to
click on a historical person and get their biography, or something like
that. The university publications need references to subcontent, as the
publications can get quite complicated (also there's indexing to
consider; you need to be able to mark up indexable words). Biographical
descriptions tend to contain dates, and I want to look for dates of
birth and so on.

> I think a very complex Document component leads to abuse. This
> simple "Document" component should *NOT*, I repeat, be used for
> things that should instead be their own component classes.

That's why I'm advocating a document *framework*. Many documents need
the same metadata. Many documents can share some interfaces ('present
me as HTML', for instance, or 'show my management screen'). By focusing
on one type of document too soon we'd lose all the benefits of developing
a framework, which is what Zope 3 should be about; it should be a framework
for constructing frameworks.

Obviously you have different requirements than I do, but that doesn't
mean my requirements are invalid or that you wouldn't benefit if some
of these requirements were implemented. :)

> > I agree that for a lot of documents a fairly simple representation such 
> > as StructuredText is enough. But I also believe that trying to fit
> > *everything* into one single representation wouldn't work at all.
> 
> Not everything. Just everything that is simple enough.
> 
> > Why not build a toolset to work with different types of representation?
> 
> Because (re)presentation is outside the document domain.

*Representation* is not outside the document domain at all! Presentation
may be, but that's an entirely different concept.

A document contains textual content that is generally read by humans. A
document can however also be processed by computers, which is why we
put them inside a machine in the first place. By having a good
computer-accessible representation of your document content you can
improve things like annotation, linking and searching abilities; you
can probably also increase the ways in which you present the document to
users.

Documents are also generally written and edited by humans. Documents are
commonly published, and therefore need a review procedure. Generally 
documents have some common metadata as well. Documents do not necessarily
have the same internal representation, however.

Why not capture the commonalities in a framework? It doesn't have to be
perfect and extensible the first time around, but I suggest we at least
try. :)
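
One possible shape for those commonalities, as a minimal Python sketch:
metadata and review state live beside the body, and the body can be any
internal representation. All names here (Metadata, ReviewState,
submit_for_review) and the three review states are assumptions made for
illustration, not an existing Zope3 design.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewState(Enum):
    """Hypothetical review states; a real workflow may need more."""
    DRAFT = "draft"
    IN_REVIEW = "in review"
    PUBLISHED = "published"


@dataclass
class Metadata:
    """Metadata shared by all documents, whatever their representation."""
    title: str
    author: str
    state: ReviewState = ReviewState.DRAFT


@dataclass
class Document:
    metadata: Metadata
    body: object  # plain text, a DOM tree, ...; the framework does not care


def submit_for_review(document):
    """Move a draft into review; refuse anything that is not a draft."""
    if document.metadata.state is not ReviewState.DRAFT:
        raise ValueError("only drafts can be submitted for review")
    document.metadata.state = ReviewState.IN_REVIEW


summary = Document(Metadata("Meeting summary", "Martijn"), body="Some text.")
submit_for_review(summary)
print(summary.metadata.state.value)
```

Nothing in submit_for_review looks at the body, which is the point: the
review procedure works the same for every representation.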

> > good enough for many purposes, but at least nobody would be forced to
> > work around it (or with it while they shouldn't as their content model is
> > too different).
> 
> I'm not proposing a single component that will from now on be
> used for all Zope3 content. By all means. Content components are
> even their own category and treated with special care in the
> documentation. I *WANT* people to develop hundreds of these
> classes.
> 
> So, what is the purpose and place of a generic "Document"
> component in this world?
> 
> The way I see it, a component for data that is too simple, or
> a temporary holder for data you're beginning to work with - you
> store your meeting summaries in a Folderful of Documents, then
> you work with them for some days and gather notes on the
> requirements for their own class - and then you go and develop
> that class.
> 
> Perhaps Document could even have some facility to convert itself
> to some other class?

Well, rename Document to StructuredTextDocument (or whatever is decided
it can contain) and have it share an interface IDocument with all Documents,
and I'd be happier. We could offer a PlainTextDocument and even an HTMLDocument
as well (even though I'd generally not use the latter for anything serious).
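
A minimal sketch of what sharing an interface could look like, in plain
Python with an abstract base class standing in for a real interface
mechanism. The method names source() and as_html() are assumptions made
for this example; only the class names come from the discussion above.

```python
from abc import ABC, abstractmethod
from html import escape


class IDocument(ABC):
    """Hypothetical shared interface; method names are illustrative."""

    @abstractmethod
    def source(self):
        """Return the document in its own underlying representation."""

    @abstractmethod
    def as_html(self):
        """Return an HTML presentation of the document."""


class PlainTextDocument(IDocument):
    def __init__(self, text):
        self._text = text

    def source(self):
        return self._text

    def as_html(self):
        # Plain text has no structure, so escape it and preserve layout.
        return "<pre>%s</pre>" % escape(self._text)


class HTMLDocument(IDocument):
    def __init__(self, html):
        self._html = html

    def source(self):
        return self._html

    def as_html(self):
        # The representation already is the presentation.
        return self._html


documents = [PlainTextDocument("a < b"), HTMLDocument("<p>hi</p>")]
for document in documents:
    print(document.as_html())
```

A StructuredTextDocument would be a third implementation of the same
interface, with as_html() delegating to a StructuredText renderer.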

> [verging off-topic]
> 
> Also, don't underestimate StructuredText. StxNG is more
> "structured" than "text" and can hold up pretty much anything -
> it's almost a "very very very light, human-readable XML".

I'm not underestimating structured text at all, and I have a simple
slideshow component for Zope on my harddrive that exploits the DOM-like
properties of StructuredText. I think there are some limitations to
structured text, however. reStructuredText is a project to fix what in
their view are such limitations:

http://structuredtext.sourceforge.net/

See also this for a description of what they consider problems with
StructuredText:

http://structuredtext.sourceforge.net/spec/problems.txt

I'm suggesting that any Zope3 document framework should be able to
accommodate such developments -- even for the 90% fit people have
different ideas, and it'd be a shame if we'd pass up the opportunity to
make integrating these different styles of document easier in Zope3.

> The
> fact that we use it with some very defined rules to render to
> HTML is an historical artifact of habit, but remember that the
> Zope Book is also written in stx and converted to a lot of
> formats using different rules.

Sure, StructuredText is very neat, but it isn't the be-all and end-all
of documents. Even the new structured text still has some problems 
(last I checked) with paragraphs fitting on a single line (they
turn into a heading, also in the content model), etc. Many end users
would struggle with things like this if they had to use structured text.

You and I like plain text (and I use emacs instead of a word processor to
edit my documents), but not everybody can do this. This is one reason
I'm suggesting a Zope3 document framework should contain interfaces
supporting custom editors ('give me the default editor for this
document' and such).
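
As a sketch of the 'give me the default editor for this document' idea,
assuming a simple registry keyed on document class (Python; every name
here, including the editor names, is hypothetical and not an existing
Zope3 API):

```python
class PlainTextDocument:
    """Stand-in document class for this example."""


class DocBookDocument:
    """Stand-in document class for this example."""


class MeetingSummary(PlainTextDocument):
    """A more specific document that registers no editor of its own."""


_default_editors = {}


def register_editor(document_class, editor_name):
    """Declare the default editor for one class of documents."""
    _default_editors[document_class] = editor_name


def default_editor(document):
    """Find the editor for a document, falling back to its base classes."""
    for klass in type(document).__mro__:
        if klass in _default_editors:
            return _default_editors[klass]
    raise LookupError("no editor registered for %s" % type(document).__name__)


register_editor(PlainTextDocument, "textarea")
register_editor(DocBookDocument, "xml-tree-editor")

print(default_editor(MeetingSummary()))  # prints textarea
```

An end user would then get whatever editor suits the document type,
without having to know what the underlying representation is.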

Regards,

Martijn