[Zope-PTK] DublinCore revisited

Tres Seaver tseaver@digicool.com
Mon, 14 Aug 2000 17:51:18 -0400 (EDT)


I have been reviewing the DublinCore metadata specs
(http://purl.org/dc) and see several changes we should make to
support the DublinCore within the PTK (I'd have made this
proposal on the PTK wiki page for DublinCore, but www.zope.org
is sick tonight ): ).

First, we should be clear that the Dublin Core Initiative (DCI)
is about the *kinds* of meta-data which are accessible, and does
not mandate any particular "spelling" within an application (the
specific DCI names become more important at the "boundaries" of
an application, where other applications interact with it).

That being said, we should continue to provide facilities for
querying individual metadata elements, while recognizing that the
most widespread consumers will want *all available* metadata for
a given piece of content (RDF syndication, <meta...> tags, etc.).

Here is my quick rundown on the DCI elements and the qualifiers
appropriate to them:

  Title -- the standard Zope 'title' attribute; we should look
    at making it mandatory for all PortalContent derivatives.

  Creator -- where possible, this should be one or more full
    names, of either persons or organizations.  The current
    implementation finds the first user in the list returned
    by 'get_local_roles' who has the "Owner" role;  userids are
    not considered appropriate for this field by the DCI.

  Subject -- this is supposed to be drawn from a "controlled"
    list of keywords (e.g., selected from a multi-select list
    used across the whole site).

  Description -- a short summary, an abstract, or a
    table-of-contents are all considered acceptable.  We might
    look at making this required, as well, at least for some
    kinds of content.

  Publisher -- a site-wide property, which should be obtained
    through acquisition (do I smell a 'portal_metadata' tool about
    to appear?)  Again, this is supposed to be a "formal" name.

  Contributor -- used to convey others *besides* the Creator who
    have contributed to the document (the current implementation
    aliases 'Creator', which is not what DCI intends).

  Date -- this one has "modifiers", of which the approved set is:
    'Created', 'Valid', 'Available', 'Issued', and 'Modified'.
    I propose extending the interface to include CreationDate(),
    EffectiveDate(), ExpirationDate(), and ModificationDate().
    The current Date() could just return the CreationDate(), while
    the DCI 'Valid' and 'Available' would be ranges derived from
    EffectiveDate() and ExpirationDate().

  Type -- like the Zope 'meta_type', this is the main "conceptual"
    classification; 'meta_type' is often spelled identically to
    the class, which makes it less appropriate for the DCI usage.

  Format -- the kind of physical representation, e.g., "text/html".

  Identifier -- should be the fully-qualified URL of the document
    (the current implementation returns the object's id, which is
    only required to be unique within its container).

  Language -- "en-us", "pt-br", "de", etc.  Should be set at
    creation, with an appropriate default (and a picklist of
    values).

  Source -- the "original" from which a piece of content is
    derived.  I'd like to ignore this one.

  Relation -- more "relationships" to other documents.  Again,
    I'd like to ignore it (ZopeStudio and other such tools need
    this, however, to build site maps).

  Coverage -- geographic/chronological/jurisdictional scope.
    Again, ignore.

  Rights -- copyright and other IP information related to the
    document.  Most portals *should* care about this:  witness
    the brouhaha on Slashdot over the compilation of the
    "Hellmouth" postings into a book.
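
The date handling proposed under 'Date' above could be sketched
roughly as follows.  This is a minimal illustration in modern
Python, not PTK code:  the 'DatedContent' class, its constructor
arguments, and the 'valid_range' helper are hypothetical names;
only Date() and the four accessor names come from the proposal.

```python
from datetime import datetime


class DatedContent:
    """Hypothetical content class; storage attributes are illustrative."""

    def __init__(self, created, effective=None, expires=None):
        self._created = created
        self._effective = effective
        self._expires = expires

    def CreationDate(self):
        return self._created

    def EffectiveDate(self):
        # Fall back to the creation date when none was supplied.
        return self._effective or self._created

    def ExpirationDate(self):
        # None means the content never expires.
        return self._expires

    def Date(self):
        # Per the proposal, the plain Date() returns the creation date.
        return self.CreationDate()

    def valid_range(self):
        # DCI 'Valid' / 'Available' as a (start, end) pair; the end
        # may be None (assumed here to mean an open-ended range).
        return (self.EffectiveDate(), self.ExpirationDate())


doc = DatedContent(datetime(2000, 8, 14), expires=datetime(2001, 1, 1))
start, end = doc.valid_range()
```

Treating an open-ended range as '(start, None)' is just one
possible convention; a real implementation might prefer a
sentinel "far future" date instead.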

I can see the following interface as being useful for a new
'portal_metadata' tool:

  def validateMetadata( self, content ):
      """
          Enforce portal-wide policies about DCI, e.g., requiring
          non-empty title/description, etc.  This method would be
          called by the framework immediately before adding a
          piece of content to a folder;  it could also perform
          some other housekeeping (setting unsupplied values to
          defaults, etc.)
      """

  def getFullName( self, userid ):
      """
          Convert an internal userid to a "formal" name, if
          possible, perhaps using the 'portal_membership' tool.
      """

  def listAllowedSubjects( self, meta_type=None ):
      """
          List allowed keywords for a given meta_type, or all
          possible keywords if none supplied.
      """

  def getPublisher( self ):
      """
          Return the "formal" name of the publisher of the
          portal.
      """

  def getType( self, meta_type ):
      """
          Map the Zope 'meta_type' to a DCI-appropriate one (the
          default implementation could just return meta_type.)
      """

  def listFormats( self, meta_type=None ):
      """
          List the allowed 'Content-type' values for a particular
          meta_type, or all possible formats if none supplied.
      """

  def listLanguages( self, meta_type=None ):
      """
          List the allowed language values for a particular
          meta_type, or all possible languages if none supplied.
      """

  def listCopyrightTypes( self, meta_type=None ):
      """
          List the allowed values for a "Copyright type:"
          selection list;  this gets especially important where
          syndication is involved.
      """
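
To make the interface above a bit more concrete, here is a
minimal in-memory sketch of part of such a tool, in modern
Python.  Everything beyond the method names is an assumption:
the 'MetadataTool' class name, its constructor arguments, the
dictionary-based storage, and the "title and description must be
non-empty" policy are hypothetical, and a real tool would live
in the ZODB and consult 'portal_membership'.

```python
from types import SimpleNamespace


class MetadataTool:
    """Hypothetical stand-in for the proposed 'portal_metadata' tool."""

    def __init__(self, publisher, full_names=None, subjects=None):
        self._publisher = publisher
        self._full_names = full_names or {}  # userid -> formal name
        self._subjects = subjects or {}      # meta_type -> keywords

    def getPublisher(self):
        return self._publisher

    def getFullName(self, userid):
        # Fall back to the userid when no formal name is known.
        return self._full_names.get(userid, userid)

    def listAllowedSubjects(self, meta_type=None):
        if meta_type is not None:
            return list(self._subjects.get(meta_type, []))
        # No meta_type given: union of all keywords, sorted so the
        # result is stable.
        all_keywords = set()
        for keywords in self._subjects.values():
            all_keywords.update(keywords)
        return sorted(all_keywords)

    def validateMetadata(self, content):
        # Example policy only: require non-empty title and description.
        for name in ("title", "description"):
            if not getattr(content, name, ""):
                raise ValueError("missing required metadata: %s" % name)


tool = MetadataTool(
    publisher="Digital Creations",
    full_names={"tseaver": "Tres Seaver"},
    subjects={"Document": ["zope", "python"],
              "News Item": ["python", "press"]},
)
good = SimpleNamespace(title="DublinCore revisited",
                       description="Proposed metadata changes.")
tool.validateMetadata(good)  # passes silently
```

Raising an exception from validateMetadata() is one choice among
several; returning a list of problems would let the framework
report them all at once.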

I would also like to modify the existing DublinCore.py to make
the class a pure interface, as follows (the current
implementation would be folded into a new DemoPortalContent
base class in the PTKDemo product):

  class DublinCore:

      #
      #   Existing DublinCore methods, converted to interface.
      #
      def Title( self ):
          """
              E.g., 'self.title'.
          """

      def Creator( self ):
          """
              E.g.,
                'portal_metadata.getFullName( self.getOwner() )'.
          """

      def Contributor( self ):
          """
              Return any additional contributors to the content
              (default could be '[]').
          """

      def Date( self ):
          """
              Normally the "publication date", e.g.,
               'self.EffectiveDate()'.
          """

      def Description( self ):
          """
              Return an appropriate summary/abstract/table of
              contents.
          """

      def Subject( self ):
          """
              Return the keywords assigned by the user.
          """

      def Type( self ):
          """
              E.g., 'portal_metadata.getType( self.meta_type )'
          """

      def Identifier( self ):
          """
              E.g., 'self.absolute_url()'.
          """

      #
      #   New DublinCore methods, designed to support full
      #   queries, as well as more useful date handling.
      #
      def CreationDate( self ):
          """
              Return the date the content was created.
          """

      def EffectiveDate( self ):
          """
              Return a user-supplied effective date, or
              creation date if none supplied.
          """

      def ExpirationDate( self ):
          """
              Return a user-supplied expiration, or None.
          """

      def ModificationDate( self ):
          """
              E.g., 'self.bobobase_modification_time()'
          """
    
      def asRDF( self ):
          """
              Render the content object's DCI metadata as RDF.
          """
    
      def asMetaTags( self ):
          """
              Render the content object's DCI metadata as HTML
              <meta...> tags.
          """
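
As a rough idea of what asMetaTags() might produce, here is a
standalone sketch in modern Python.  The 'as_meta_tags' function
and its list-of-pairs input are hypothetical, and the 'DC.'
prefix on the tag names is an assumed naming convention rather
than anything the interface above prescribes.

```python
from html import escape


def as_meta_tags(metadata):
    """Render (element, value) pairs as HTML <meta> tags, one per line.

    A value may be a single item or a list/tuple of items; empty and
    None values are skipped.
    """
    lines = []
    for element, value in metadata:
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if item is None or item == "":
                continue
            # Escape quotes and angle brackets so that the value is
            # safe inside a double-quoted attribute.
            content = escape(str(item), quote=True)
            lines.append('<meta name="DC.%s" content="%s">'
                         % (element, content))
    return "\n".join(lines)


tags = as_meta_tags([
    ("Title", 'A "quoted" title'),
    ("Subject", ["zope", "ptk"]),
    ("Source", None),
])
print(tags)
```

Multi-valued elements such as Subject are emitted as repeated
tags here; joining them into a single comma-separated value
would be an equally defensible choice.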

-- 
===============================================================
Tres Seaver                                tseaver@digicool.com
Digital Creations     "Zope Dealers"       http://www.zope.org