[Zope3-dev] Axe DTML Document

Jeffrey P Shell jeffrey@cuemedia.com
Tue, 18 Dec 2001 13:44:14 -0700


On Tuesday, December 18, 2001, at 12:43  PM, Casey Duncan wrote:

> There have been good points raised here by all. I think I will draft a
> Fishbowl proposal on this sometime soon. Here is my take on what a
> "Document" object would be:
>
>   - The data/content would be textual and static. I doubt you could create a
> single XML format to efficiently and effectively handle all cases, so likely
> it would need to support whatever textual format you desired.

It wouldn't need to be an XML format.  However, it *could* be some 
DOM like structure.  Or it could not (depending on the input 
stream).

>   - It would have the capability to add metadata/properties. Would the
> ability to specify a common/default schema (Dublin Core?) be beneficial?

Yes.  At the very least there's Title.  And then there's the other 
common HTML ones (Keywords ('Subject' in DublinCore), 
Description).  Many Word documents have these fields too, although 
they're not always filled in.

I don't know if Zope 3 proper needs to care as much about the 
metadata if there's no real content management capabilities 
installed, but a proper Document Handler should try to at least 
parse out the metadata and offer it to whoever may ask for it.

Actually, this could evolve/devolve into a whole class/interface 
structure for Content Handlers in general.  Other forms of content 
such as MP3s and some image formats also contain extra properties 
about the content.
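
As a rough illustration of the "parse out the metadata and offer it" 
idea, here is a minimal sketch in present-day Python.  It is not 
anything from Zope 3 itself: HTMLMetadataParser, 
extract_html_metadata and the DUBLIN_CORE_NAMES mapping are 
hypothetical names, and only the standard library's html.parser is 
real.  It covers HTML only; a Word or MP3 handler would presumably 
need its own parser behind the same kind of call.

```python
from html.parser import HTMLParser

# Hypothetical mapping from common HTML names to Dublin Core element names.
DUBLIN_CORE_NAMES = {
    "title": "Title",
    "keywords": "Subject",
    "description": "Description",
}


class HTMLMetadataParser(HTMLParser):
    """Collect the <title> text and known <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content")
            if name in DUBLIN_CORE_NAMES and content is not None:
                self.metadata[DUBLIN_CORE_NAMES[name]] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self._in_title:
            self.metadata["Title"] = self.metadata.get("Title", "") + data


def extract_html_metadata(text):
    """Return a dict of Dublin Core style metadata found in an HTML string."""
    parser = HTMLMetadataParser()
    parser.feed(text)
    parser.close()
    return parser.metadata
```

Anything that wants the metadata (a catalog, a content management 
layer) would call extract_html_metadata(text) and get back a plain 
dict, without caring how the parsing was done.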

>   - It would have History/diff capabilities perhaps like what DTML objects
> have now, maybe better.

Diffing textual content is hard.  Source code tends to be easier 
since it's line oriented.  I would *love* to see a good diffing 
algorithm for content.  It's a feature of Word that I especially 
like.  I don't know if I'd expect this of Zope proper, but if 
there was a ChangeTracking component that could be replaced, I'd be 
happy.
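
To make the line-oriented versus content-oriented distinction a bit 
more concrete, here is a small sketch that diffs by word instead of 
by line, using difflib from the Python standard library.  The 
word_diff function is a hypothetical helper, not a proposed 
ChangeTracking interface, and splitting on whitespace is a 
simplifying assumption (it ignores markup, punctuation and 
paragraph moves, which is where the hard part lives).

```python
import difflib


def word_diff(old, new):
    """Return a list of (operation, text) pairs for a word-level diff."""
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            changes.append(("equal", " ".join(old_words[i1:i2])))
        # A "replace" is reported as a delete followed by an insert.
        if tag in ("delete", "replace"):
            changes.append(("delete", " ".join(old_words[i1:i2])))
        if tag in ("insert", "replace"):
            changes.append(("insert", " ".join(new_words[j1:j2])))
    return changes
```

A line-based diff would flag a whole reflowed paragraph as changed; 
this version only reports the words that differ, which is closer to 
what Word's change tracking shows.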

>   - Supporting adapters would be created to do things such as HTML
> decapitation, format conversion (STX -> HTML, HTML -> Raw Text, LaTeX ->
> DocBook, whatever), metadata extraction, etc.
>
>   - It would have some form of catalog or index awareness.

I'm hoping that cataloging and/or indexes will be built into Zope 3 
deeply enough that this is a given.  I'm also hoping that a real 
event model will eliminate (or at least significantly cut down on) 
the need for the notion of "catalog awareness".
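
Here is roughly what I mean, as a toy sketch in Python.  None of 
this is a Zope 3 API: EventHub, ObjectModified, Catalog and Document 
are all made-up names, and the "index" is just a word list per 
path.  The point is only that the document knows nothing about the 
catalog; the catalog subscribes to events instead.

```python
class EventHub:
    """Hypothetical event channel: delivers each event to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def notify(self, event):
        for callback in list(self._subscribers):
            callback(event)


class ObjectModified:
    """Hypothetical event saying the object at `path` has changed."""

    def __init__(self, path, obj):
        self.path = path
        self.obj = obj


class Document:
    """A plain content object with no catalog-related code at all."""

    def __init__(self, text):
        self.text = text


class Catalog:
    """Toy catalog that reindexes whenever it hears about a modification."""

    def __init__(self):
        self.index = {}

    def handle(self, event):
        if isinstance(event, ObjectModified):
            self.index[event.path] = event.obj.text.lower().split()


hub = EventHub()
catalog = Catalog()
hub.subscribe(catalog.handle)

doc = Document("Axe DTML Document")
# In a real system the framework, not the caller, would send this event.
hub.notify(ObjectModified("/docs/axe", doc))
```

After the notify call, catalog.index holds an entry for "/docs/axe" 
even though Document never imported or called the catalog.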

>   - It would be available for 3 easy payments of $19.95 (No CODs, please)!

Only 3 easy payments of $19.95?!?!  That's unbelievable!

> The document object itself will probably be dead simple and likely dumb as a
> stump. Not much there beyond a file object. Actually, what IS the difference
> between a document and a file, once you have ComponentArchitecture to
> componentize the behavior?

Heh.  This last sentence sounds like a really bad geek joke that 
could be used to start out a conference presentation.  :)

The answer is - probably not much.  I would think that the 
difference would be that what we consider a Document is something 
for which the default View/Presentation is HTML and is made of 
textual content.  But what really differentiates a PDF Document 
from a Word Document from an HTML Document?  Is PDF a "dumb 
document" until you install a "PDF Reader" that can parse data 
out?  Or is it just a file?
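
One way to picture the "just a file until a reader is installed" 
idea is a lookup keyed on content type.  This is only a sketch 
under my own assumptions, not how the ComponentArchitecture 
actually does adapter lookup: File, register_reader, query_reader 
and PlainTextReader are hypothetical names.

```python
class File:
    """Dumb container: raw bytes plus a content type, nothing else."""

    def __init__(self, data, content_type):
        self.data = data
        self.content_type = content_type


# Hypothetical registry mapping a content type to a reader factory.
_readers = {}


def register_reader(content_type, factory):
    _readers[content_type] = factory


def query_reader(file_obj):
    """Return a reader wrapping file_obj, or None if none is installed."""
    factory = _readers.get(file_obj.content_type)
    return factory(file_obj) if factory is not None else None


class PlainTextReader:
    """Adapter that knows how to get text out of a text/plain File."""

    def __init__(self, context):
        self.context = context

    def text(self):
        return self.context.data.decode("utf-8")


register_reader("text/plain", PlainTextReader)

note = File(b"hello", "text/plain")
pdf = File(b"%PDF-1.3 ...", "application/pdf")
```

With this setup query_reader(note) gives back something that can 
produce text, while query_reader(pdf) gives None, so the PDF stays 
"just a file" until somebody registers a reader for 
application/pdf.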


Jeffrey P Shell, jeffrey@cuemedia.com