[Zope3-dev] Axe DTML Document

Martijn Faassen faassen@vet.uu.nl
Thu, 20 Dec 2001 01:38:54 +0100


Lalo Martins wrote:
[snip]
> > So you're saying DocBook documents aren't documents? What about PDF
> > documents?
> 
> Yes. DocBook and PDF are representations.

I'm not sure I comprehend the 'yes' or the way you use 'representation'.
Perhaps I use the latter in a weird way? I mean, I'm talking about
representing information; such as the way you encode or mark up
data. DocBook is focused on representation of certain types of document
content. PDF documents, however, are more focused on presentation (as
something printable).

I use the words representation and presentation quite differently, and
you seem to be able to interchange them more easily, so perhaps this gives rise
to some confusion?

> > What about something almost representable by StructuredText
> > *except* you want to mark up persons for some reason?
> 
> That's another point - the one I hadn't seen before your message.

That was the point I was trying to make; you'll run into quite a few
of such 'exceptions' in many problem domains. Which is why we have
things like XML. XML is not friendly to edit by hand (though it could
be worse), but it's pretty good at dealing with semi-structured information.

> > You'd be confusing people if you only accepted the 'can be represented
> > as structured text' as a document.
> 
> I'm talking about an hypothetical StructuredText which makes
> better use of the DOM capabilities - perhaps
> reStructuredText. To avoid confusion, let's switch it for an
> hypothetical ZDocXML format.

I still think you'd be confusing people if ZDocXML was the only valid
'Document' in Zope. Let's call a ZDocXML document a ZDocXML document. :)
Other people will want other styles of document, and they can make
legitimate claims to the Document word. If you assume Documents == 
ZDocXML Documents you'd confuse people, including me. :)

Not that I'm against such a thing, but I'd like to widen the scope
before we focus it again on a few common cases (such as StructuredText).

> > > A "Person" is not a "Document".
> > 
> > A person can occur inside a document. Like such:
> > 
> > <header>Chapter Foo, the Marking Up</header>
> >    
> > <p>Hello, this is a nice paragraph containing some text. 
> > <person>Martijn</person> is a person.</p>
> 
> All right. I have a proposal. But wait, you have more points :-)
> 
> > > > Why not build a toolset to work with different types of representation?
> > > 
> > > Because (re)presentation is outside the document domain.
> > 
> > *representation* is not outside the document domain at all! Presentation
> > may be, but that's an entirely different concept.
> 
> No. Representation is, in this aspect, a special case of
> presentation, as far as the component architecture is concerned.

I disagree -- you can never devise a document type that can somehow
magically include all the possible things you'd want to include. 
Now we have <person>, but tomorrow I may want <date> and the next day
I may want <starship>, whatever. You can't get from a common representation
what you don't put in in the first place; that'd take either a human to
guess, a human-equivalent AI to do the same, or magic. :) (I can implement
magic in Python if we have a timetravel module, btw; unfortunately
that's only expected for Python3k.)

Anyway, this is why we should have a framework that does take care
of the many commonalities, but doesn't fix on one or a few types of
representation.

> Ideally, you should be able to change your Document component
> for another one that has a different internal representation and
> not notice it in your code - perhaps I should have said that
> representation is outside the domain of IDocument, to be precise.

Yes, we should be able to change the content object for document
content, and still adapt it (or otherwise extend it) with some interfaces
and perhaps other content objects (for metadata, for instance) to be a
full fledged document that the user interacts with.

Anyway, here we seem to be in agreement; I must've gotten the wrong
impression before.

> > A document contains textual content that is generally read by humans. A
> > document can however also be processed by computers, which is why we
> > put them inside a machine in the first place. By having a good
> > computer-accessible representation of your document content you can
> > improve things like annotation, linking and searching abilities; you
> > can probably also increase the ways in which you present the document to
> > users.
> 
> And you can also, by choosing a good representation, provide the
> capability of rendering it in a myriad of other representations,
> so that annotating, linking, searching etc. components that work
> only with html, docbook, pdf, stx, or whatever can parse it.

Yes, you could transform it to another type of representation that
other systems may know how to deal with. But it's frequently hard
to do so without losing information; if I render a document containing
marked up persons to any of the formats you named, I'd lose the information.
I need to go to the source to get it back again. So depending on some
common intermediate representation is not enough either, just like
mandating a single base representation isn't.
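
To make the information loss concrete, here is a minimal sketch in Python.
It assumes the <header>/<p>/<person> example from earlier in this thread,
wrapped in a hypothetical <doc> root element so that it parses; the
function names are made up for illustration and are not part of any
Zope API.

```python
import xml.etree.ElementTree as ET

# The example document from this thread, wrapped in a hypothetical
# <doc> root element so that it is well-formed XML.
SOURCE = (
    "<doc><header>Chapter Foo, the Marking Up</header>"
    "<p>Hello, this is a nice paragraph containing some text. "
    "<person>Martijn</person> is a person.</p></doc>"
)


def to_plain_text(xml_source):
    """Render the document as plain text, one line per top-level element."""
    root = ET.fromstring(xml_source)
    return "\n".join("".join(child.itertext()) for child in root)


def persons(xml_source):
    """Collect the marked-up persons; this only works on the source."""
    root = ET.fromstring(xml_source)
    return [element.text for element in root.iter("person")]


plain = to_plain_text(SOURCE)
print(plain)            # the words survive, the <person> markup does not
print(persons(SOURCE))  # the markup is still available in the source
```

Once the plain-text rendering is all you have, nothing in it says that
"Martijn" is a person; only the source representation can answer that.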

That isn't to say we can't standardize on a lot of interfaces that have
to do with documents, and even more if content is DOMish.

> > Documents are also generally written and edited by humans. Documents are
> > commonly published, and therefore need a review procedure. Generally 
> > documents have some common metadata as well. Documents do not necessarily
> > have the same internal representation, however.
> > 
> > Why not capture the commonalities in a framework? It doesn't have to be
> > perfect and extensible the first time around, but I suggest we at least
> > try. :)
> 
> Perhaps. I still think a transparent internal representation is
> the best foundation.

Well, you'd lose me right away if you picked that as the foundation; your
document would be useless to many of the applications I'm developing,
which would be a shame. Of course you may consider this worth it
from your perspective, and that's fine. :)

> > Well, rename Document to StructuredTextDocument (or whatever is decided
> > it can contain) and have it share an interface IDocument with all Documents,
> > and I'd be happier. We could offer a PlainTextDocument and even a HTMLDocument
> > as well (even though I'd generally not use the latter for anything serious).
> 
> No. Please. Don't expose the internals.

So what do you do if people want to place HTML content? Or
plain text for that matter? They still need versioning, viewers, 
uploaders, metadata, and perhaps also editors. You just tell them to
go and learn structured text? What if I have the complete works of
Shakespeare in another format already? Will you tell me I need to convert
it and lose valuable information? I can't use the document framework
of Zope?

In order to do many useful things with documents, you need access to the
representation. This doesn't mean we can't still build layers of
abstraction; we should. But the internal representation is in many
cases not dirty or ugly or 'internal' at all; it's semantic information
that can be used by applications. It's in fact exactly what documents are
all about -- their contents!
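
A minimal sketch of what I mean by layering, in Python. The class names
(IDocument, PlainTextDocument, XMLDocument) and their methods are
assumptions made up for this example, not an existing Zope interface:
generic tools go through the shared interface, while applications that
care about the markup can still reach the representation.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class IDocument(ABC):
    """Hypothetical common interface shared by all document types."""

    def __init__(self, title):
        self.title = title  # common metadata every document carries

    @abstractmethod
    def as_text(self):
        """Return the content as plain text for generic tools."""


class PlainTextDocument(IDocument):
    def __init__(self, title, text):
        super().__init__(title)
        self.text = text

    def as_text(self):
        return self.text


class XMLDocument(IDocument):
    def __init__(self, title, source):
        super().__init__(title)
        self.tree = ET.fromstring(source)  # representation stays accessible

    def as_text(self):
        return "".join(self.tree.itertext())

    def elements(self, tag):
        """Representation-specific access, e.g. all <person> elements."""
        return [element.text for element in self.tree.iter(tag)]


def search(documents, word):
    """A generic tool that only relies on the shared interface."""
    return [doc.title for doc in documents if word in doc.as_text()]
```

For instance, search() works on a mix of PlainTextDocument and XMLDocument
objects, while only the XMLDocument can answer elements("person").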

Storing HTML directly is something to be avoided, but not all
representations of documents are as useless semantically as HTML
(some are even more useless, such as PDF, or worse still, Word :).

[snip reStructuredText]
> Right, sorry. Some obscure part of my mind probably thought
> those features were already implemented.

I don't know how up to date that document is; perhaps quite a few of these
criticisms are outdated by now, though some seem more fundamental.

> > You and I like plain text (and I use emacs instead of a word processor to
> > edit my documents), but not everybody can do this. This is one reason
> > I'm suggesting a Zope3 document framework should contain interfaces
> > supporting custom editors ('give me the default editor for this
> > document' and such).
> 
> This is (IMHO) exposing implementation.

How is 'give me the default editor' exposing implementation? I'm not
following you here.

> I'd prefer "give me an
> editor for this document, that doesn't lose any metadata" for
> the picky,

I don't know what you mean by not losing any metadata; why would
an editor for document content affect its metadata?
If you mean you lose useful markup (like <person> tags), sure, I agree
editors shouldn't do that, ideally.

> and "give me this document in format FOO" for the
> tolerant. (Of course, this raises the problem of dealing with
> lossy conversions, but I believe we're smart enough to find a
> solution for this.)

Lossy editors are just tricky; I will be trying to avoid such a thing
myself. I mean, what if I ask for the document in HTML format, edit it,
and then some magic needs to convert it back into whatever XML I
was using?

But of course there needs to be a way to ask for 'render this document
in a certain format'.
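
One way this could look, as a rough Python sketch. Everything here (the
RENDERERS registry, the renderer decorator, the render function, and the
"plaintext" type name) is hypothetical and only meant to illustrate the
idea of asking for a format while being told whether the conversion
loses information.

```python
from html import escape

# Hypothetical registry: (document type, target format) -> (function, lossy)
RENDERERS = {}


def renderer(doc_type, fmt, lossy):
    """Register a rendering function for a document type and format."""
    def register(func):
        RENDERERS[(doc_type, fmt)] = (func, lossy)
        return func
    return register


@renderer("plaintext", "html", lossy=False)
def plaintext_to_html(content):
    # Plain text carries no markup, so nothing is lost by wrapping it.
    return "<pre>" + escape(content) + "</pre>"


def render(doc_type, content, fmt):
    """Return (rendered output, lossy flag); fail if no renderer is known."""
    try:
        func, lossy = RENDERERS[(doc_type, fmt)]
    except KeyError:
        raise LookupError("no renderer from %s to %s" % (doc_type, fmt))
    return func(content), lossy
```

A caller that wants to edit the result and convert it back could refuse
any rendering whose lossy flag is set, and fall back to the source instead.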

> Oops. This is too big. I'll post the proposal in a different message.

Okay. :)

Regards,

Martijn