[Zope3-dev] Pre-proposal: IDocument and friends

Thu, 20 Dec 2001 10:32:06 -0200

Pre-proposal: IDocument and IDocumentJargon components

Abstract

  This is a proposal to address the requirements of a flexible,
  generic Document component for Zope3, raised in the "Axe DTML
  Document" thread in the Zope3-Dev mailing list.

  All proposed names are, of course, subject to discussion.

Problem

  A lot of data in a Zope site is too complex for an opaque blob
  (such as the current "File" object) and too simple to demand
  its own component class.

  Of the "kinds" of data (using the term broadly, to mean
  something that might or might not become a class) that have
  enough structure to be a class, many share common structure
  and features - a tree-like structure and the need for at least
  one textual, human-editable representation, plus automatic
  rendering to the web. This suggests these classes should
  either use some common utilities or subclass a single base
  class.

  Requirements

    1. Documents are primarily text, with a tree-like structure.

    2. The default presentation should render the document
       automatically to whatever format the user wants. This
       should take into account the Zope3 idea of "default
       presentation", which already depends on medium (HTTP, for
       instance) and accepted formats (in the case of HTTP, the
       "Accept:" header). In the canonical case, this will mean
       filtering through a standard page template somewhere and
       producing XHTML.

    3. At least one human-editable, lossless format should be
       available, for WebDAV and ZMI textareas.

    4. Renderers for new formats (for 2 and 3) and the
       respective parsers (for 3) should be transparently
       pluggable, leveraging the Component Framework.

    5. There should be a "basic" structure (see "Flexibility of
       the tree" below).

    6. It should be easy to tag (mark up) specialized content in
       the text - for example, mark up all references to people
       or email addresses.

    7. Documents that don't follow the "basic" structure at all
       should be allowed to use a different one (see "Flexibility
       of the tree" below).

  Flexibility of the tree

    Solving this problem with one single interface would lead to
    excessive or insufficient rigidity. A significant portion
    of the documents this proposal addresses conform more or
    less to a general tree structure, with multiple levels of
    containment (which can be book/chapter/section/subsection or
    not) and some basic meta-layout (such as emphasis, bulleted
    lists and "foot"-notes).

    (For simplicity, we'll assume that this basic structure is
    more or less the one supported by the current version of
    StructuredText, as of this writing in December 2001. This
    might or might not be the case.)

    The necessity of additional tagging, as defined in
    requirement 6, can be addressed by adding markup to the
    basic structure.

    However, some documents don't share this basic structure;
    -FIXME- no examples come to mind. So it is necessary to
    allow a different basic structure to be used.

Solution

  IDocument

    Define one component interface, IDocument. Different
    implementations could exist, and they could conceivably
    implement the interface using different internal document
    formats. But the difference between these should be only a
    matter of performance, never functionality.

    This interface doesn't need to do a lot; converting from one
    interface (IDocument) to some presentation is already
    handled by the Component Architecture. IDocument just needs
    to provide a way back, for requirement 4.

    So, the IDocument interface only specifies one method:
    'updateFrom(object)', which uses the component framework to
    find out what kind of content is in 'object' and, if a
    converter for that kind is available, updates the document's
    content from it; otherwise it raises an exception.
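
    A minimal Python sketch of 'updateFrom()', for illustration
    only. It assumes a plain dictionary keyed by Python type stands
    in for the component framework's converter lookup;
    'ConverterNotFound', 'register_converter' and the toy
    nested-list tree are hypothetical names, not Zope3 APIs.

```python
class ConverterNotFound(Exception):
    """Raised when no converter is registered for the source's kind."""


# Hypothetical stand-in for the component framework: maps a content
# kind (here simply a Python type) to a function building a tree.
_converters = {}


def register_converter(kind, converter):
    _converters[kind] = converter


class Document:
    """Toy IDocument implementation; the tree is a list of [tag, text]."""

    def __init__(self):
        self.tree = []

    def updateFrom(self, source):
        # Find out what kind of content 'source' is (here: its type),
        # then update from it if a converter exists; otherwise raise.
        converter = _converters.get(type(source))
        if converter is None:
            raise ConverterNotFound(type(source).__name__)
        self.tree = converter(source)


def text_to_tree(text):
    """Example converter: one paragraph per blank-line-separated block."""
    return [["para", block.strip()] for block in text.split("\n\n") if block.strip()]


register_converter(str, text_to_tree)
```

    Calling 'Document().updateFrom("Hello\n\nWorld")' yields two
    paragraphs, while 'updateFrom(42)' raises ConverterNotFound
    because nothing is registered for integers.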

  ZDoc

    This proposal introduces the hypothetical format "ZDoc",
    used to describe an IDocument. An implementation of this
    proposal might use ZDoc as its internal representation, or
    even as an exchange format, but it need not. For now, ZDoc is
    only an abstract tool to help us communicate among ourselves
    and get an idea of what can be in an IDocument.

    Let's imagine ZDoc as an empty XML schema, on top of which
    we'll use XML namespaces to introduce semantics.

    In a hypothetical implementation of IDocument, the document
    content would be converted into the corresponding ZDoc and
    stored in the ZODB in this format.

    Each ZDoc element has one default namespace, specified in
    the usual XML namespace notation
    (see http://www.w3.org/TR/1999/REC-xml-names-19990114/).

    Let's also imagine a standard namespace for ZDoc -
    ZStructuredDoc. This namespace defines a set of tags and
    attributes that matches requirement 5 (this would probably
    be more or less the feature set of StructuredText as of this
    writing). So, with the correct xmlns attribute in the
    top-level <Document> element, the whole document gets
    formatted by ZStructuredDoc.
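
    To make the idea concrete, here is a sketch using Python's
    standard xml.etree.ElementTree to read the namespace that
    formats a ZDoc's top-level <Document>. The URI
    'http://www.zope.org/xmlns/ZStructuredDoc' and the <para> and
    <emphasis> tags are invented for illustration, since ZDoc
    itself is only hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for ZStructuredDoc.
ZSTRUCT = "http://www.zope.org/xmlns/ZStructuredDoc"

SAMPLE = """<Document xmlns="http://www.zope.org/xmlns/ZStructuredDoc">
  <para>Some <emphasis>structured</emphasis> text.</para>
</Document>"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag notation into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def document_namespace(xml_text):
    """Return the namespace URI of the top-level <Document> element."""
    root = ET.fromstring(xml_text)
    uri, local = split_tag(root.tag)
    if local != "Document":
        raise ValueError("not a ZDoc document: <%s>" % local)
    return uri
```

    Because the xmlns attribute on <Document> is a default
    namespace, ElementTree also places every child element (such
    as <para>) in ZStructuredDoc unless a child declares otherwise.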

  IDocumentJargon

    Now we need to address requirements 6 and 7.

    For this, one writes a component that implements
    IDocumentJargon. This component provides additional markup
    and/or an alternative structure.

    This interface defines two methods: 'makeElement(source)'
    and 'processElement(element)'.

    The method 'makeElement', given an element from the tree
    (-FIXME-: DOM object or XML string?) returns an
    IDocumentElement object.
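
    A sketch of the two IDocumentJargon methods as a Python
    abstract base class. Zope3 would declare this with its own
    interface machinery; 'abc' is used only to keep the example
    self-contained. The 'Element' class standing in for
    IDocumentElement, representing 'source' as a (tag, attributes,
    text) tuple, and having processElement() return the element
    are all assumptions, since the proposal leaves them open.

```python
from abc import ABC, abstractmethod


class Element:
    """Hypothetical minimal IDocumentElement."""

    def __init__(self, tag, attributes=None, text="", children=None):
        self.tag = tag
        self.attributes = dict(attributes or {})
        self.text = text
        self.children = list(children or [])


class DocumentJargon(ABC):
    """Sketch of IDocumentJargon."""

    @abstractmethod
    def makeElement(self, source):
        """Build an Element from a node in this jargon's namespace."""

    @abstractmethod
    def processElement(self, element):
        """Handle this jargon's attributes on a foreign element."""


class PyDoc(DocumentJargon):
    """Toy jargon for Python documentation markup."""

    def makeElement(self, source):
        tag, attributes, text = source
        return Element(tag, attributes, text)

    def processElement(self, element):
        # This toy jargon attaches no behaviour to foreign elements.
        return element
```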

    The Document framework will include a utility to register
    IDocumentJargon components. When an IDocument instance is
    being parsed, the last path component of each namespace name
    is used to look up the corresponding jargon. For example::

      <Document
          xmlns:py="http://www.zope.org/Members/lalo/xmlns/PyDoc">
        ...blah blah blah <py:module>time</py:module> blah...
      </Document>

    During parsing, the jargon registry would look for a jargon
    named 'PyDoc'. This component would be used to turn the
    py:module tag into an IDocumentElement object.
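
    The registry lookup can be sketched in Python as follows;
    'JargonRegistry' and 'JargonNotFound' are hypothetical names,
    and a dictionary stands in for whatever registration utility
    the framework actually provides.

```python
class JargonNotFound(LookupError):
    """No jargon is registered under the requested name."""


class JargonRegistry:
    """Hypothetical utility mapping jargon names to jargon components."""

    def __init__(self):
        self._jargons = {}

    def register(self, name, jargon):
        self._jargons[name] = jargon

    @staticmethod
    def jargon_name(namespace):
        """The last path component of the namespace name, e.g. 'PyDoc'."""
        return namespace.rstrip("/").rsplit("/", 1)[-1]

    def lookup(self, namespace):
        name = self.jargon_name(namespace)
        try:
            return self._jargons[name]
        except KeyError:
            raise JargonNotFound(name) from None
```

    One consequence of keying on the last path component alone is
    that two unrelated namespaces ending in, say, '/PyDoc' would
    resolve to the same jargon.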

    Element attributes with a namespace different from that of
    the element itself are processed by 'processElement'.

    Order of processElement calls

      More specific namespaces (those declared in inner
      elements) are looked up first. Among namespaces declared
      in the same element, the order of the xmlns declarations
      applies. Example::

        <Document xmlns:py="http://www.zope.org/Members/lalo/xmlns/PyDoc"
                  xmlns:zope="http://www.zope.org/xmlns/ZopeAPIDoc">
        ...blah blah blah
        <py:class name="DateTime"
                  xmlns:iso="http://www.zope.org/Members/lalo/xmlns/IsoStuff">
          <py:method url="http://www.python.org/doc"
                     iso:std="8601"
                     zope:name="ZopeTime">ISO</py:method>
        </py:class>
        blah...
        </Document>

      The element py:method would first be built by calling
      PyDoc.makeElement(); then this element would be fed to
      IsoStuff.processElement(), and finally
      ZopeAPIDoc.processElement().
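
      The lookup order can be sketched as a small Python function.
      It assumes the parser hands over the xmlns declarations in
      scope as one list per element, outermost element first, each
      in declaration order; that representation is an assumption
      made for illustration.

```python
def process_order(scopes, attribute_prefixes, own_prefix):
    """Return namespace URIs in the order processElement() is called.

    scopes: one list of (prefix, uri) xmlns declarations per element,
            from the outermost element down to the element itself,
            each list in declaration order.
    attribute_prefixes: prefixes of the element's attributes
            (None for unprefixed ones, which no jargon handles).
    own_prefix: the element's own prefix; its jargon already ran
            makeElement(), so it is skipped here.
    """
    wanted = set(attribute_prefixes) - {own_prefix}
    order = []
    # Innermost declarations first, so an inner redeclaration of a
    # prefix shadows an outer one; within one element, keep xmlns order.
    for declarations in reversed(scopes):
        for prefix, uri in declarations:
            if prefix in wanted:
                wanted.discard(prefix)
                if uri not in order:
                    order.append(uri)
    return order


# The example above: <Document>, then <py:class>, then <py:method>.
PYDOC = "http://www.zope.org/Members/lalo/xmlns/PyDoc"
ZOPE = "http://www.zope.org/xmlns/ZopeAPIDoc"
ISO = "http://www.zope.org/Members/lalo/xmlns/IsoStuff"
scopes = [[("py", PYDOC), ("zope", ZOPE)], [("iso", ISO)], []]
print(process_order(scopes, [None, "iso", "zope"], "py"))  # IsoStuff, then ZopeAPIDoc
```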

  IDocumentElement

    Basic API not yet defined. This interface can be DOMish or
    Zopeish (e.g. 'objectValues'), or have its own API
    (e.g. 'elementValues()'), or any combination of these.

    Besides tree navigation, searching and modification, this
    interface has two methods that specialize rendering.

    method 'present()' --
      used for generating lossy presentations. Should return an
      IDocumentElement that corresponds to a reasonable
      representation of this element in ZStructuredDoc.

      For example: '<faa:person>Lalo</faa:person>' could become
      '<extlink url="http://www.laranja.org/">Lalo</extlink>',
      perhaps by looking up a person-to-URL database somewhere.
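
      A sketch of 'present()' for the '<faa:person>' example. It
      assumes a plain dictionary plays the person-to-URL database
      and a tiny 'Element' class stands in for IDocumentElement;
      both are hypothetical. Falling back to bare text for an
      unknown person is a choice made by this sketch.

```python
class Element:
    """Hypothetical minimal IDocumentElement."""

    def __init__(self, tag, attributes=None, text=""):
        self.tag = tag
        self.attributes = dict(attributes or {})
        self.text = text


# Hypothetical person-to-URL database.
PEOPLE = {"Lalo": "http://www.laranja.org/"}


class PersonElement(Element):
    """A <faa:person> element."""

    def __init__(self, name):
        super().__init__("faa:person", text=name)

    def present(self):
        """Return a ZStructuredDoc element approximating this one."""
        url = PEOPLE.get(self.text)
        if url is None:
            # Unknown person: degrade to plain text.
            return Element("text", text=self.text)
        return Element("extlink", {"url": url}, self.text)
```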

    method 'render(interface)' --
      used when, for a given target, something better than the
      "reasonable" lossy presentation is available.

      For example: '<mm:sound>fascinating</mm:sound>', when
      rendering to HTML, could generate an '<object>' tag to
      embed an audio player.

      This method should raise a standard exception when it is
      not applicable - let's call it 'NothingAppropriate'. The
      rendering adapter will catch this exception and handle it
      by calling 'present()', then rendering the resulting
      ZStructuredDoc.
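
      The fallback can be sketched in Python. Here the target is
      a MIME-type string standing in for the target interface,
      'present()' returns a (tag, text) pair standing in for a
      ZStructuredDoc element, and 'render_structured()' is a
      hypothetical renderer for that stand-in; none of these are
      defined by the proposal. The exception is called
      'NothingAppropriate' in this sketch.

```python
import html


class NothingAppropriate(Exception):
    """Standard 'not applicable' exception raised by render()."""


class SoundElement:
    """Hypothetical <mm:sound> element that can embed itself in HTML."""

    def __init__(self, src):
        self.src = src

    def render(self, target):
        if target == "text/html":
            return '<object data="%s"></object>' % html.escape(self.src, quote=True)
        raise NothingAppropriate(target)

    def present(self):
        # Lossy ZStructuredDoc stand-in: a (tag, text) pair.
        return ("emphasis", self.src)


def render_structured(node, target):
    """Hypothetical renderer for the (tag, text) stand-in."""
    tag, text = node
    if target != "text/html":
        return text
    if tag == "emphasis":
        return "<em>%s</em>" % html.escape(text)
    return html.escape(text)


def render(element, target):
    """The rendering adapter: try render(), fall back to present()."""
    try:
        return element.render(target)
    except NothingAppropriate:
        return render_structured(element.present(), target)
```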

Risks

  These points have been raised by Paul Everitt in the thread:

  1. If the class that the pickle is an instance of is overly
     rich, then you might find yourself always writing
     converters for every Zope upgrade.  Also, the data may be
     less usable outside of Zope.

  2. I'm someone that uses CMF Documents quite aggressively.
     I'm constantly trying to find some damn tool on some
     operating system that can replace a TEXTAREA w/o
     disappointing me.  Thus, in some ways I have expectations
     of my Zope3 folder appearing to be a fileserver, albeit one
     that can give me a bunch of extras when I look at it
     through a web browser.  This is an important usage.  It
     covers the majority of content currently being authored by
     the majority of average users.

  3. In some ways, the smarts of CMF Document leads to
     unexpected behavior for (2).  For instance, say I'm using
     WebDrive/DavFS to edit a text document.  I save it.  It's
     immediately out-of-date, because the CMF sticks Dublin Core
     headers into it.  If you pick some neutral DOM
     representation, you're by definition changing the original
     in a perhaps lossy way.  Users might not expect that and
     give up when things don't work as expected.

  4. OTOH, most of the value an organization gets is in turning
     raw data into repeatable, standardized content with rich
     services.  Pulling this off without alienating users (see
     (3)) is the trick.

  Other risks:

  5. A component not flexible enough, or not specialized enough,
     or too hard to use (UI), could alienate users.

  6. If it's too hard to write and register parsers (converters
     from some format to, say, ZDoc), they won't be written, and
     the framework will therefore be less useful.

--

I don't know, I feel something is missing. I sense the presence
of holes, but I can't pinpoint them. So, here it is, for peer
review ;-)

[]s,
                                               |alo
                                               +----
--
http://www.laranja.org/                mailto:lalo@laranja.org
         pgp key: http://www.laranja.org/pessoal/pgp
