[Zope-CMF] Pythonish Questions

Tres Seaver tseaver@digicool.com
Tue, 01 May 2001 07:50:58 -0400


Jon Edwards wrote:
> 
> Hi all, I've started digging into Python, and I have a couple
> of questions from looking at the CMF source code -
> 
> (N.B. I plumped for Boa Constructor in the end -
> http://boa-constructor.sourceforge.net/ - seems an excellent
> open-source Python IDE and wxPython GUI Builder, written in
> Python, with a lot of Zope functionality already built-in!)
> 
> 1. SearchableText method (of documents, news-items, etc) - If
> the text_format is HTML, could this be made to strip out the
> HTML tags? Something along the lines of...
> 
>  def SearchableText(self):
>      "text for indexing"
>      text = self.text
>      if self.text_format == 'html':
>          text = strip_htmltags(text)
>      return "%s %s %s" % (self.title, self.description,
>                           text)
> 
> (my syntax is probably wrong, but you get the idea?) Is there
> a function somewhere equivalent to 'strip_htmltags'?

Chris Withers has recently posted a method he uses for stripping
HTML in Squishdot;  I would search the list archives for it.
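
For illustration only, here is a minimal sketch of such a tag
stripper. It is not the Squishdot method mentioned above (which is
not reproduced here); it assumes a modern Python with the standard
html.parser module, and the names _TextExtractor and strip_htmltags
are made up for this example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text between tags (including script/style bodies).
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent blocks do not run together,
        # then collapse the whitespace left behind by the removed tags.
        return " ".join(" ".join(self._chunks).split())


def strip_htmltags(html):
    """Return the text content of `html` with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

Joining with a space can split a word that has inline markup in the
middle of it, which is usually acceptable for catalog indexing but
not for display.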

> I guess this is something DC would need to do, as if I change
> the code myself it will be overwritten when I upgrade?

One way to accomplish this would be to derive a new class/ZClass
from CMFDefault.Document, and override just the SearchableText method.
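
A rough sketch of that subclassing approach follows. The Document
class below is a hypothetical stand-in so the example runs on its
own; in a real site the base class would be the one shipped in
CMFDefault. PlainSearchDocument and the regex-based strip_htmltags
are likewise invented for illustration, and any tag stripper could
be substituted.

```python
import re


class Document:
    """Hypothetical stand-in for the CMFDefault Document class."""

    def __init__(self, title, description, text, text_format="html"):
        self.title = title
        self.description = description
        self.text = text
        self.text_format = text_format

    def SearchableText(self):
        "text for indexing"
        return "%s %s %s" % (self.title, self.description, self.text)


def strip_htmltags(html):
    # Crude placeholder (an assumption, not a CMF function):
    # replaces anything that looks like a tag with a space.
    return re.sub(r"<[^>]*>", " ", html)


class PlainSearchDocument(Document):
    """Document whose indexed text has HTML tags removed."""

    def SearchableText(self):
        "text for indexing, without HTML markup"
        text = self.text  # work on a copy; never modify the stored body
        if self.text_format == "html":
            text = strip_htmltags(text)
        return "%s %s %s" % (self.title, self.description, text)
```

Because only the subclass is changed, an upgrade of the stock
product leaves the override in place.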


> This would keep the Catalog tidier (no HTML bits to confuse
> search results),

That is a reasonable goal.

> and would mean SearchableText could be inserted into a
> document's HTML metadata headers (to help search-engine
> optimisation), without the risk of breaking things by
> including HTML tags!

Putting SearchableText in the '<meta>' tags doesn't make much
sense to me -- the headers are supposed to be for "meaningful
categorization", since the search engine can already index the
body of the page.  Imagine doubling a 100k page by replicating
its contents in the headers!
 
> 2. On a related note, I noticed the 'getMetaDataHeaders'
> method in DublinCore, which would seem to be ideal for this
> - just append SearchableText to the 'Description'. Would this
> break anything else? Is there a way I can patch this change
> in my copy, without it being overwritten when I upgrade?
> (Sorry for the newbie question, this is probably covered in
> documentation somewhere, but I couldn't find it!)

See above (but again, I don't advise the specific change
you are contemplating).

> 3. Also in DublinCore, there is a 'Contributors' property.
> This would seem very useful for CompositeContent objects made
> up of docs contributed by several different people - the
> Creator would be the editor/reviewer responsible for the
> CompositeContent object, the Contributors would be a list of
> the individual authors of the documents. But next to it there
> is a comment saying "# XXX: Fixme!". Is it safe to use this
> property?

The comment is a fossil, I think, from a time when the underlying
attribute was not being populated.  The methods are fine to use,
and your proposal is a natural extension of them.

> 4. I'm starting to wrap my head round the CompositeContent
> issue, does anybody have any code they wouldn't mind "sharing
> with the group" to get me started? Or pointers to existing
> code that does similar things? Is there a
> SIG (or Zope equivalent) that's working on this?

Composites are one of the major features we plan to add for
the next release of the CMF.  This list is the main discussion
point for them, so far;  currently, DC plans to generalize from
the composites we have built for several consulting gigs.

Tres.
-- 
===============================================================
Tres Seaver                                tseaver@digicool.com
Digital Creations     "Zope Dealers"       http://www.zope.org