[Zope-CMF] A modest proposal to add a Unique ID to all content/folder objects in CMF

sean.upton@uniontrib.com sean.upton@uniontrib.com
Thu, 22 Aug 2002 08:45:44 -0700


Hopefully I can clarify a bit: when I say 'references', I don't mean object
references so much as stored id strings in other items (for example, I
get XML news feeds from AP that reference IDs of related images).  For
example, my content types have an extended Metadata mixin that provides the
following function:

queryRelations(**filter) : tuple containing (id,type,model,label)

where:
- 'id' is the unique identifier string, be it a hash, slug, path, or
whatever
- 'type' is the CMF Type of the object
- 'model' is the refinement model (in a Dublin Core Relation sense).  My
setup has 4 possible refinements:
	- "references" (conceptual relationship)
		- Inverse: "is referenced by"
	- "has part" (compositional relationship)
		- Inverse: "is part of"
	(Note that the inverse relationships are the result of a reverse
	lookup via the Catalog, while the direct ones are stored, either
	in the content type or in a registry.)
- 'label' is a unique descriptive string for a class of relationship:
"Yesterday's News Coverage" or "Statistics" or "Related Location"

With this system, my API does not care where I get the relation data from:
it could be looked up in a tool or registry; it could come from the content
items themselves, but it is the job of the content item to provide the
interface to look them up.  But no matter how I store the relation data, I
need consistent and unique ids.  
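
A minimal sketch of the queryRelations() interface described above, in plain
Python rather than real CMF code.  The Relation record, the RelationsMixin
class, the addRelation() helper on the content item, and the sample ids and
type names are all assumptions made for illustration; only the
(id, type, model, label) shape comes from the description.

```python
from collections import namedtuple

# Hypothetical record type mirroring the (id, type, model, label) tuple.
Relation = namedtuple("Relation", ["id", "type", "model", "label"])


class RelationsMixin:
    """Illustrative stand-in for the extended Metadata mixin (not CMF code)."""

    def __init__(self):
        self._relations = []  # direct (stored) relations only

    def addRelation(self, id, type, model, label):
        # Hypothetical helper used here only to populate the example.
        self._relations.append(Relation(id, type, model, label))

    def queryRelations(self, **filter):
        """Return a tuple of relations whose fields match every keyword."""
        return tuple(
            rel for rel in self._relations
            if all(getattr(rel, key) == value for key, value in filter.items())
        )


story = RelationsMixin()
story.addRelation("20020821-jury-verdict-in-at-11AM.1", "News Item",
                  "references", "Yesterday's News Coverage")
story.addRelation("courthouse.jpg", "Image", "has part", "Related Location")

matches = story.queryRelations(model="references")
```

Callers only see the tuples that come back; whether the content item stores
them itself or asks a tool or registry is hidden behind the method.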

I have to be able to support an id change for an object, partially because I
work with data that comes from systems or people that have already created
an ID for the content before it is imported into my system.  Of course,
measures have to be taken to account for the slim chance of duplication and
dealing with a copy.  Because some of the relationships between objects are
not simply metadata but part of the core structure of a document, I have to
be able to store textual references to a globally unique id.  In order to be
flexible on where the data is stored, I have to be flexible on how the
relationship is maintained if an object changes id, however.  If the item is
moved or renamed from 'foo' to 'bar' (and the path obviously changes), I
need a tool that can go out and change other objects' references to
'foo' in their metadata or document structure to 'bar'.  I know in the ideal
world, this is the job of a notify/subscribe event channel system, but it
seems easier when you might be dealing with half a dozen references and
generally few writes to simply have a tool do this work on behalf of the
content formerly known as 'foo.'  convertReferences('foo','bar') seems like
the right way to do things (though, "references" are simply unique strings,
nothing more).  I can't think of any alternative to a little housekeeping,
unless it is assumed it is the job of a content author to clean up dead
links (in a related sense, the www has HTTP 302 and 303 messages as an
approach for dealing with the problem of moving content, dead links, and
changing paths; an approach like this could potentially solve the problem,
but I would rather solve the problem once at production time than millions
of times on delivery of the object; either way, you need to do housekeeping,
or have dead links/relations).
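
As a rough sketch of the housekeeping step, assuming references really are
nothing more than id strings: the ReferenceTool class and its dict of
references below are hypothetical stand-ins (a real tool would presumably
find the referring objects through the catalog), and only the
convertReferences(oldid, newid) call shape comes from the text above.

```python
class ReferenceTool:
    """Hypothetical tool; a plain dict stands in for a catalog query."""

    def __init__(self):
        # referring item id -> list of referenced id strings
        self.references = {}

    def convertReferences(self, oldid, newid):
        """Rewrite every stored reference to oldid so that it reads newid.

        Returns the ids of the referring items that were changed.
        """
        changed = []
        for item_id, refs in list(self.references.items()):
            if oldid in refs:
                self.references[item_id] = [
                    newid if ref == oldid else ref for ref in refs
                ]
                changed.append(item_id)
        return changed


tool = ReferenceTool()
tool.references = {"story-a": ["foo", "baz"], "story-b": ["baz"]}
changed = tool.convertReferences("foo", "bar")
```

With only a handful of referring items and few writes, one pass like this at
rename time is cheap compared with resolving a stale id on every delivery.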

I cannot see any alternative to dealing with the absolute requirement
that I be able to change an item's id.  So, in theory, this takes care of
moving and renaming; what about copying?  My take is that on copy, there
will be an absolute need to auto-generate a hash or append either a
sequential or random number to the end of the existing id string, and then,
of course, use the same machinery used to fix other objects' links to you
via a single call to a tool.  The problem with this, however, is the
question "shouldn't the items that point a relation/reference at you still
point to the old copy?"  I'm not sure what the answer is at this point, but
I am likely to say: it's related to both, and it is up to the system
designer to change the workflow state on the old object so it is not
visible, and up to the skins/tools that use the references to filter out
non-visible items.

So suppose that item '20020821-guilty-on-all-counts.1' points to
'20020821-jury-verdict-in-at-11AM.1', and you decide to clone/copy
'20020821-jury-verdict-in-at-11AM.1' to another folder (or part of your
workflow does this for you).  In this case, it would need to have a new id;
I think I would likely generate a 3-digit sequence number to append to the
end, so the copy would be called '20020821-jury-verdict-in-at-11AM.1-001'
instead.  The next thing to do is to make sure that
'20020821-guilty-on-all-counts.1' and all other items that refer to the
original point to both the original and the new.  I think I would add
another method to the said tool to complement convertReferences(oldid,newid);
it would be called as
cloneReferences('20020821-jury-verdict-in-at-11AM.1',
                '20020821-jury-verdict-in-at-11AM.1-001'),
and there would need to be a hook within the content
items to allow the tool to do this to them as well, especially since my
current thinking is that content items should manage their own
relations (whether they store them in their own attributes or defer
to some other object or registry to help).
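
The copy scenario above might look roughly like this.  The make_copy_id()
helper, the existing_ids set, and the references dict are assumptions made
for the sake of a runnable example; the 3-digit suffix and the
cloneReferences(oldid, newid) idea are the only parts taken from the text.

```python
def make_copy_id(original_id, existing_ids):
    """Append the first free 3-digit sequence number to original_id."""
    for n in range(1, 1000):
        candidate = "%s-%03d" % (original_id, n)
        if candidate not in existing_ids:
            return candidate
    raise ValueError("no free sequence number for %r" % original_id)


def cloneReferences(references, oldid, newid):
    """Make every item that refers to oldid refer to newid as well."""
    for refs in references.values():
        if oldid in refs and newid not in refs:
            refs.append(newid)


original = "20020821-jury-verdict-in-at-11AM.1"
# referring item id -> list of referenced id strings
references = {"20020821-guilty-on-all-counts.1": [original]}

copy_id = make_copy_id(original, existing_ids={original})
cloneReferences(references, original, copy_id)
```

After the call, the referring story lists both the original and the copy, and
it is left to workflow state and the skins to decide which one is shown.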

I would like to be able to support the use of a path as a textual
'reference' - but only when I have to; I would only support this when
absolutely necessary, for example, in the case of pointing to non-CMF Zope
objects, or storing a specific path to a page template to provide an
item-by-item override of a skin method, and simply for backward
compatibility.  Paths would therefore be rarely used, and only when
there was no other alternative.

I'm still working on trying to formulate these ideas, so forgive me if this
seems like a mess at this point.

Sean

-----Original Message-----
From: Tim Hoffman [mailto:timhoffman@cams.wa.gov.au]
Sent: Wednesday, August 21, 2002 6:46 PM
To: sean.upton@uniontrib.com
Cc: Zope-CMF@zope.org
Subject: RE: [Zope-CMF] A modest proposal to add a Unique ID to all
content/folder objects in CMF


Hi Sean


On Thu, 2002-08-22 at 02:16, sean.upton@uniontrib.com wrote:
> A few thoughts:
> 
> When storing a relation to content by storing an ID, one should be flexible
> in storing IDs generated from the following sources:

Yeah, I think you're right. I do like the ability to pass an externally
generated UniqueID, but what happens if the object is cloned?

If, for instance, we have created an object with the externally provided
id '20020821-jury-verdict-in-at-11AM.1' and I copy the resultant object,
what happens to the id? It will either

no longer be unique

or bear little or no resemblance to the originally passed id.

I would suggest that if the unique id is supplied rather than
generated, you would then have to pass as an argument a method or
class to be called on clone; otherwise, raise an exception and do not
allow the object to be copied.
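
One way that suggestion could be sketched in plain Python (not the UniqueZid
class posted earlier in the thread): UniqueIdHolder, CloneNotAllowed, the
on_clone argument, and after_clone() are all hypothetical names, and
hashlib.sha1 is used here only as a convenient digest.

```python
import copy
import hashlib
import time


class CloneNotAllowed(Exception):
    """Raised when an object with a supplied id has no clone handler."""


class UniqueIdHolder:
    """Hypothetical sketch of a supplied-or-generated unique id."""

    def __init__(self, supplied_id=None, on_clone=None):
        self._supplied = supplied_id is not None
        self._on_clone = on_clone  # callable: old id -> id for the copy
        self._uid = supplied_id if self._supplied else self._generate()

    def _generate(self):
        seed = "%r-%r" % (time.time(), id(self))
        return hashlib.sha1(seed.encode("utf-8")).hexdigest()

    def getUniqueId(self):
        return self._uid

    def after_clone(self):
        """Call on the copy once it has been made."""
        if not self._supplied:
            self._uid = self._generate()
        elif self._on_clone is not None:
            self._uid = self._on_clone(self._uid)
        else:
            raise CloneNotAllowed(self._uid)


story = UniqueIdHolder("20020821-jury-verdict-in-at-11AM.1",
                       on_clone=lambda uid: uid + "-001")
clone = copy.copy(story)
clone.after_clone()
```

A generated id is simply regenerated on the copy; a supplied id is either
passed through the caller's handler or the copy is refused.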

> 
> - Slug (Manually named unique ID string)
> 	- Example: '20020821-jury-verdict-in-at-11AM.1'
> 	- Likely to be unique no matter what path it is in, unless multiple
> versions of same object from paste and/or workflow move
> - Path To Object (not great, but should be supported)
> 	- Both inside Zope (i.e. '/cmfsite/myPortalFolder/foo-bar-123.jpg')
> and outside Zope (i.e. 'c:\My Documents\foo-bar-123.jpg')
> 	- Brittle, but common; should be supported in case alternate id
> generator is not available/appropriate

What's the real difference between a slug and a path to an object? I view
them both as instances of an externally supplied unique id.

> - Hash or Digest (likely guaranteed to be unique)
> 	- If no slug exists, and we are within a system that can manage
> translation of a hash to an object reference, this is good
> 	- likely doesn't move beyond Zope
> 

Agreed

> My current thinking in a customized CMF system implementation that I am
> currently working on: Identifier() should output the path, but a method
> called, say, getUniqueId() should output one of these; which doesn't
> matter;

I am not sure I am keen on this. My own goal was to make sure I had a
guaranteed unique id. If you want to store the path to some external file,
I would add another property to the object. If you were to rely on the
unique id being a path pointer, I feel you introduce all sorts of
dependencies on synchronisation with the outside world, which aren't
part of uniquely identifying a content object inside Zope.


> a hash should only be generated at content creation if a slug doesn't
> exist, or at content copy if there is another item on system with the
> same slug; alternately, the unique ID should be editable, so that if a
> hash is generated, but the content author doesn't like it (they want to
> use a more descriptive, but likely still unique, slug), they can.  This
> presents the challenge of finding and fixing references to the old id
> referenced in other objects, but this could be done by simply having a
> mechanism to find and fix this at the time of a change of id.

This is what I would like to avoid. For my own part I want the solution
to be as simple as possible and not require any housekeeping, data
cleansing, or system integrity checking.

My proposal doesn't actually include references. I would like to keep
them out of it, as they may or may not be used, and in many different
ways that we can't foresee. For instance, they could be used in the body
text of a wiki (how about a new syntax for the wiki:
"text displayed":{someobjects uniqueid}).  This gives you a site-wide
link to an object which can move, but the reference is in body text.
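
To make the wiki idea concrete, here is a small sketch that expands that
syntax at render time.  The regular expression, the render_links() function,
and the dict mapping unique ids to URLs are assumptions for illustration (a
real site would presumably resolve the id through the catalog), and the
sketch does no HTML escaping.

```python
import re

# Matches "text displayed":{uniqueid}
LINK = re.compile(r'"([^"]+)":\{([^}]+)\}')


def render_links(text, url_for_uid):
    """Replace "label":{uid} with an HTML link; leave unknown ids untouched."""
    def replace(match):
        label, uid = match.group(1), match.group(2)
        url = url_for_uid.get(uid)
        if url is None:
            return match.group(0)
        return '<a href="%s">%s</a>' % (url, label)
    return LINK.sub(replace, text)


# Hypothetical lookup table: unique id -> current URL of the object.
urls = {"abc123": "/cmfsite/news/verdict"}
html = render_links('See "the verdict":{abc123} for details.', urls)
```

Because the body text stores only the unique id, moving the object changes
the lookup table and not the text.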

> A mixin class for content should
> provide:
> - An attribute to store a unique id string
> - A getUniqueId() method to get it
> - A setUniqueId() method to set it, and call a tool to fix references in
> referring objects
> - convertReferencesTo(oldid,newid)
> 

convertReferences is exactly what I would like to avoid at this point.
KISS is my own view.

> A lookup mechanism/tool should determine if the id is a path, hash, or
> slug, and broker an object reference for an object passed an id.  This
> tool should also provide hooks for a content object to ask it to assist
> in changing other objects' references to it:
> - convertAllReferences(oldid,newid)
> 	"""
> 	query to find all objects, then call
> 	obj.convertReferencesTo(oldid,newid) for each object
> 	"""
> - getObjectById(id)
> 	"""get obj ref and return, passed a slug/hash/path"""
> 
> My thoughts on relations are that some content objects should be able to
> store their own relations, but only in one direction (i.e. a DCMES
> refinement sense of "references" vs. the indirect "is referenced by").
> Also, relations should be able to be externally stored in the system
> instead of in the content item; perhaps a tool that manages unique ids
> should also assist with relations.
> - getIndirectReferencesTo(id)
> 	"""get a list of all objects referring to the one passed here; uses
> 	catalog"""
> - addRelation(id1,id2,'Text Label For Relation','Relation Model (in the
> DCMES relation element refinement sense)')
> - delRelation(id1,id2,'Text Label For Relation')
> - queryManagedRelations(id)
> 
> Sorry if my ideas are all over the place here... Thoughts?
> 

I quite like many of the things you propose in theory, but I would like
to see those sorts of things in Zope 3. I would actually just like to
see UniqueIDs in core CMF, and soon. My code does work (it still needs
some unit tests ;-) and could go into the next version of CMF with
little or no impact, and with no big new complex tools or complexities
added to CMF.

Regards

Tim


> Sean
> 
> -----Original Message-----
> From: Tim Hoffman [mailto:timhoffman@cams.wa.gov.au]
> Sent: Tuesday, August 20, 2002 7:32 PM
> To: Zope-CMF@zope.org
> Subject: [Zope-CMF] A modest proposal to add a Unique ID to all
> content/folder objects in CMF
> 
> 
> Hi
> 
> I would like to elicit some discussion on the possibility of adding some
> new functionality to CMFCore.
> 
> I would like to call it a UniqueZid for want of a better name. Basically
> it would be a new mixin class (see below) which would be added to
> CMFCore.PortalFolder and CMFCore.PortalContent. It would require all
> classes that subclass PortalContent to call PortalContent.__init__(self)
> in their __init__ method to initialize the unique id. In addition we would
> need to add to the __init__ method in PortalContent (and add an __init__
> method to PortalFolder) a call to UniqueZid.__init__(self).
> 
> This would give all content objects (folders, etc.) a unique id that
> would be at a minimum unique within a CMF site and possibly unique
> across sites, and that would be guaranteed to remain the same for the
> life of the object. If the object is cloned then a new unique id would
> be generated for the new object.
> 
> There are some real advantages to this (many discussions on the Zope-CMF
> list have talked about how to/not to use data_record_id_, paths, etc. as
> unique identifiers); however, all of these are transitory in nature and
> can't be relied upon: data_record_id_ changes all the time, and the path
> of an object will change the minute you move it, though it is the same
> object.
> 
> By putting a new index and metadata column in the portal_catalog you
> can retrieve any specific object without needing to know its location,
> or having to worry that its location might change.
> 
> I have used this capability to perform a similar function to the new
> CMFWorkspaces package (which is basically like a collection of
> favourites); however, links/relationships created by UniqueZid will still
> be valid if the object moves (not the case with CMFWorkspaces or
> favourites).
> 
> In addition, if you create a property on an object such as
> "related_objects" and it contains a list of UniqueZids, you can also do
> fairly efficient reverse lookups, i.e. if you add related_objects to the
> portal_catalog, you can then easily find out "what objects relate/point
> to this object".
> 
> I am probably missing something major, but I have been using this
> approach extensively on a couple of live sites to really good effect.
> 
> My approach before however was to monkey patch DefaultDublinCoreImpl
> but I see a lot of value in this being added to the core of CMF.
> 
> What do people think?
> 
> Regards
> 
> Tim 
> 
> P.S. Below is a first cut at the UniqueZid mixin class, plus a simple
> Python script to retrieve an object by its UniqueZid
> 
> UniqueZid.py
> 
> from Globals import InitializeClass,Persistent
> import sha
> from time import asctime,gmtime,clock
> from Acquisition import aq_base
> 
> class UniqueZid(Persistent):
>     """
>         Mix-in class which provides a unique id for the object,
>         and will remain Unique if this object is cloned
>         if you are concerned about how unique the hash digest
>         will be, add some additional information by 
>         way of the hash_string argument. If you want to ensure
>         Uniqueness across sites include a prefix (maybe)
>         The prefix is preserved
>     """
>     
>     def __init__( self,hash_string='',prefix='' ):
>         self._zid = self._generateId(hash_string,prefix)
>         self._prefix = prefix
> 
>     def _generateId(self,hash_string,prefix):
>         seed_string = (prefix + hash_string + asctime(gmtime()) +
>                        str(clock()))
>         return prefix+sha.new(seed_string).hexdigest()
>         
> 
>     def getZid(self):
>         ''' return unique id '''
>         return self._zid
> 
>     def ZID(self):
>         ''' return unique id named nicely for 
>             portal_catalog index names
>         '''
>         return self.getZid()
> 
>     def manage_afterClone(self, item):
>         self._zid = self._generateId(self.ZID(),self._prefix)
>         for object in item.objectValues():
>             if hasattr(object, 'manage_afterClone'):
>                 object.manage_afterClone(object)
>                 
> InitializeClass(UniqueZid)
> 
> 
> 
> 
> getObjectByZid  python script
> 
> #parameter = zid
> 
> result=context.portal_catalog(ZID=zid)
> if len(result):
>     if len(result) > 1:
>         raise LookupError,"More than one object has the same ZID!"
>     result = result[0]
>     object = result.getObject(result.data_record_id_)
>     return object.view()
> else:
>     return None