[Zope-CMF] Dublin Core Subject Qualifier Implementation

sean.upton@uniontrib.com
Tue, 19 Feb 2002 12:46:11 -0800


Hey everybody,
I am looking at implementing Dublin Core Qualifiers for Subject metadata as
a means of expressing subjects within multiple controlled and standardized
vocabularies (namely, IPTC subjects for news and sports stuff, and NAICS or
SIC codes for business information), while still supporting plain-text
subject vocabularies.  Is there any established pattern or syntax for
dealing with subject codes this way in the CMF?  I haven't found anything,
so I have been thinking about a solution... my thoughts are below.

The Dublin Core Qualifiers spec has several recommended element encoding
schemes for LC and medical subjects, but nothing excludes other
industry-standard subject vocabularies, such as IPTC (news/media, worldwide)
or NAICS (used by North American governments, business/economic news, and
yellow pages), or market names (stock tickers).
	http://www.dublincore.org/documents/dcmes-qualifiers/#subject
	http://www.iptc.org/
	http://www.census.gov/epcd/www/naics.html

My first hunch is that the best way to convey a namespace/qualifier for a
subject code system is with a colon in the text, separating the vocabulary
"NAME" (in dcmes-qualifiers terms) from the code itself.  My second hunch
is that I need to create a subject lookup tool that performs lookups for
"human-readable" counterparts for codes, so that codes with qualifiers can
get a description that makes sense to a content user.  I also think such a
framework might be useful for content creators if the user interface for
metadata entry enabled efficient lookup with these codes (the biggest UI
issue is that the number of these codes may be on the order of thousands,
so something like a popup search/browse dialog might be appropriate).

Example lookup/translation input/output:

	NAICS:511110  --> "Newspaper Publishers"
	IPTC:01016000 --> "Television"
	NASDAQ:MSFT --> "Microsoft Corporation"
	Media Companies --> "Media Companies" (verbatim translation of unqualified text)
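
As a rough sketch of the colon convention and the translations above (the
names split_subject, describe_subject, and the sample table are hypothetical
and not part of any existing CMF API; the table holds only the example
entries from this message):

```python
# Hypothetical sample data; a real tool would load full vocabulary tables.
SAMPLE_DESCRIPTIONS = {
    ('NAICS', '511110'): 'Newspaper Publishers',
    ('IPTC', '01016000'): 'Television',
    ('NASDAQ', 'MSFT'): 'Microsoft Corporation',
}
KNOWN_VOCABULARIES = ('NAICS', 'IPTC', 'NASDAQ')


def split_subject(subject):
    """Split 'VOCAB:code' into (vocabulary, code).

    Unqualified text comes back as (None, text).
    """
    prefix, sep, rest = subject.partition(':')
    # Only treat the prefix as a qualifier if it names a known vocabulary,
    # so plain-text subjects that happen to contain a colon pass through.
    if sep and prefix.strip() in KNOWN_VOCABULARIES:
        return prefix.strip(), rest.strip()
    return None, subject.strip()


def describe_subject(subject):
    """Return a human-readable description for a subject string."""
    vocabulary, code = split_subject(subject)
    if vocabulary is None:
        return code  # verbatim translation of unqualified text
    # None signals that the code is not in the lookup table.
    return SAMPLE_DESCRIPTIONS.get((vocabulary, code))
```

Checking the prefix against registered vocabulary names is one possible way
to keep a plain-text subject such as "Star Trek: Voyager" from being
mistaken for a qualified code.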

This tool should support internationalization (or is it localization?) of
description lookup, because these vocabularies are often defined by
multi-national organizations, so multi-lingual lookup tables might exist
(for example, IPTC supports most European languages, Turkish, and Arabic).
This isn't to say that one need implement every language a vocabulary
supports to satisfy this, but the interface for this tool should support a
language parameter for this purpose, so that a multi-lingual site can
support multiple languages with one vocabulary (SignOnSanDiego publishes
content in English and some Spanish).
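
One way the language parameter might behave, as a minimal sketch only: the
TABLES layout, the lookup() name, and the Spanish entry are assumptions made
up for illustration, not real IPTC data.  It tries the requested language,
then its primary subtag, then a default language.

```python
DEFAULT_LANGUAGE = 'en-US'

# Hypothetical layout: vocabulary -> language -> code -> description.
TABLES = {
    'IPTC': {
        'en-US': {'01016000': 'Television'},
        'es': {'01016000': 'Televisi\u00f3n'},  # illustrative entry only
    },
}


def lookup(code, vocabulary, language=DEFAULT_LANGUAGE):
    """Return a description for code, falling back to a default language."""
    by_language = TABLES.get(vocabulary, {})
    # Try the exact tag, then the primary subtag ('es-MX' -> 'es'),
    # then the default language.
    candidates = [language, language.split('-')[0], DEFAULT_LANGUAGE]
    for lang in candidates:
        description = by_language.get(lang, {}).get(code)
        if description is not None:
            return description
    return None  # no viable option found
```

With this sample table, lookup('01016000', 'IPTC', 'es-MX') finds the 'es'
entry, while a request for a language with no table at all falls back to the
'en-US' description.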

In using this tool, there would still be interfacing issues to make it
work with the metadata tool and content types, both in terms of supporting a
user interface for massive amounts of subject codes, and in determining
when to display the code and when to display the lookup description...

I'd be interested to see what people think about this.  I wrote some
interface documentation, pasted below, that might help in explaining
my idea.  Thoughts?

Thanks,
Sean

#####################################
##################################### 

import Interface

class portal_subjectlookup(Interface.Base):
      """
        Interface for registry of subject code qualifier
        vocabularies.  Among other things, a tool
        implementing this interface should provide the
        ability to query with a code, language, and
        vocabulary, and get descriptions.
      """

      def getDescriptionFromCode(code, vocabulary=None, language='en-US'):
          """
            Lookup code in registry specified by vocabulary for language 
            specified in language.

            Pre-condition:  code is a string object and is not None
            Post-condition: a string is returned with a human-readable
                            text description (string) for a code in
                            the language specified, if available.

                            If a registry implementation in the tool is not
                            available in the language specified, a default 
                            language should be used.

                            If no viable option can be found in lookup, 
                            method should return None.

            Notes: sorry about the ethnocentrism in the language default.
          """

      def findCodeByKeyword(query, vocabulary=None, language='en-US'):
          """
            Used primarily by content producers, or agents on their behalf.

            This method is used to find a correct code for a piece of
            content when the code is unknown, but the subject matter is.
            This allows a query, which can be either a single string
            keyword, or a sequence of keyword strings.  The query is an
            "or" query, so that if query == ['foo','bar'], topic codes
            with descriptions matching either term should be returned.

            Pre-condition:  query is a string or a sequence of strings and
                            is not None.  If query is a sequence, a query
                            will be performed for all terms as specified
                            above.  If vocabulary is specified, only search
                            that vocabulary, otherwise a 'search all' is
                            assumed.

            Assumptions:    it is assumed that the query that is passed to
                            this method should match with a wildcard on the
                            end of each keyword, so that a query of
                            ['bio','tech','medi'] would find biotechnology,
                            technology, medicine, technical, medical, etc...

            Post-condition: a sequence of matches is returned, where a match
                            is a tuple of vocabulary, code, and description
                            in the language of choice.

                            If a registry implementation in the tool is not
                            available in the language specified, a default 
                            language should be used.

                            If no viable option can be found in lookup, 
                            method should return None.           
          """

      def listAllCodes(vocabulary=None, language='en-US'):
          """
            This method lists all entries in lookup tables for subject
            vocabulary codes, either globally, or within a particular
            vocabulary.  Output is similar to findCodeByKeyword()...

            Assumptions:    If vocabulary is None, then list entries
                            globally across all vocabularies present
                            in this tool.
                           
            Post-Condition: a sequence of entries is returned, where an
                            entry is a tuple of vocabulary, code, and
                            description in the language of choice.

                            If a registry implementation in the tool is not
                            available in the language specified, a default 
                            language should be used.

                            If no viable option can be found in lookup, 
                            method should return None.
          """

      def getIconPathForSubject(code, vocabulary=None):
          """
            Attempts to find an icon path registered for a code/vocabulary
            combo.  Since vocab is optional, this could potentially need
            to look through the registry for entries in all vocabularies.

            Returns a list of "wrapped-icons" where a wrapped-icon is a 
            tuple containing the icon width, icon height, and icon path
            as a list; example: 
            [ (32,32,['path','to','images','subj32.png']),
              (16,16,['path','to','images','tiny','subj16.png']) ]
          """

#####################################
##################################### 
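
To make the findCodeByKeyword() contract a bit more concrete, here is a
minimal in-memory sketch of the "or" query with a trailing wildcard on each
keyword.  It is not a CMF tool: the function name find_code_by_keyword, the
ENTRIES list, and the word-prefix matching rule are all assumptions for
illustration, and language handling is left out.

```python
# Hypothetical registry: (vocabulary, code, description) tuples.
ENTRIES = [
    ('NAICS', '511110', 'Newspaper Publishers'),
    ('IPTC', '01016000', 'Television'),
    ('NASDAQ', 'MSFT', 'Microsoft Corporation'),
]


def find_code_by_keyword(query, vocabulary=None, entries=ENTRIES):
    """Return (vocabulary, code, description) matches, or None if none."""
    if isinstance(query, str):
        query = [query]  # accept a single keyword as well as a sequence
    prefixes = [term.lower() for term in query]
    matches = []
    for vocab, code, description in entries:
        if vocabulary is not None and vocab != vocabulary:
            continue  # restrict the search to one vocabulary
        words = description.lower().split()
        # "or" query: any keyword matching the start of any word is a hit.
        if any(w.startswith(p) for w in words for p in prefixes):
            matches.append((vocab, code, description))
    return matches or None
```

With the sample entries, find_code_by_keyword(['tele', 'micro']) returns the
IPTC and NASDAQ tuples, and find_code_by_keyword('tele', vocabulary='NAICS')
returns None.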


=========================
Sean Upton
Site Technology Supervisor
Development & Integration
SignOnSanDiego.com
The San Diego Union-Tribune
619.718.5241
sean.upton@uniontrib.com
=========================