[Zope] Summarization and Zope

Chris Beaumont cbeaumon@msri.org
Fri, 04 May 2001 11:55:33 -0700


Hi Ausum,

	It's 'HTML::Summary'


http://search.cpan.org/doc/TGROSE/HTML-Summary-0.017/lib/HTML/Summary.pm


(I'm enclosing the info for it at the bottom of the page) 

What I was thinking is that it might be possible to parse a given
rendered page just before actually delivering it in the RESPONSE object,
and insert a short summary in the meta description tag on the fly. I
also wouldn't be surprised if the Perl summarization module could be
coaxed into giving us keywords (the 'fork words' in the 'tree').
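
To make the insert step concrete, here is a rough Python sketch. It assumes the rendered page is already available as a string and that the summary text has been produced somewhere else. The function name insert_meta_description is made up for illustration, and how it would be hooked into RESPONSE is not shown.

```python
import html
import re


def insert_meta_description(page, summary):
    """Return page with a <meta name="description"> tag placed right after <head>.

    Hypothetical helper: 'page' is the rendered HTML as a string,
    'summary' is plain text produced by some summarizer.
    """
    # Escape quotes and angle brackets so the summary is safe inside an attribute.
    tag = '<meta name="description" content="%s">' % html.escape(summary, quote=True)
    # Find the opening <head ...> tag, case-insensitively (does not match <header>).
    match = re.search(r"<head\b[^>]*>", page, flags=re.IGNORECASE)
    if match is None:
        return page  # no <head> found; leave the page untouched
    return page[:match.end()] + "\n" + tag + page[match.end():]
```

A regular expression is used only to keep the sketch short; a real HTML parser would be more robust on messy pages.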

There are two guys at a company called Glucose who make a similar
tree-summarization product for the Mac (it works amazingly well). I
suggested a similar thing to them around two years ago (to be used with
static pages), and they actually turned my idea into a commercial
product. Unfortunately, it is Mac-only and can't be used with Zope. I
didn't ask them for any money, but they did put my name in the about box
*laugh*. Of course, now that I'm building sites in Zope, it's not really
practical for me to use anymore, but I really miss that functionality.
I work on a scientific site, and as we move more and more of it to Zope,
the ability to generate a good meta-tag summary automatically would be a godsend.

This insert doesn't sound like it would be difficult. I'm doing a
similar thing with titles for some of the pages on the Zope part of my
site.

I'm just not enough of a Perl hacker to know the best way to go about
creating the hooks between Zope and Perl.
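
One possible hook, sketched here only as an assumption and not as the approach Andy describes, is to run the Perl code as a separate process and pipe the page to it. The script name summarize.pl below is hypothetical: it stands for a small wrapper around HTML::Summary that reads HTML on standard input and prints the summary on standard output.

```python
import subprocess


def summarize_with_command(page, command, timeout=10):
    """Pipe page to an external summarizer on stdin and return its stdout, stripped.

    'command' is an argument list for the external program; nothing here
    is specific to Perl or to HTML::Summary.
    """
    result = subprocess.run(
        command,
        input=page,           # sent to the child process on stdin
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str instead of bytes
        check=True,           # raise CalledProcessError on a non-zero exit
        timeout=timeout,      # do not let a stuck child hang the request
    )
    return result.stdout.strip()


# Hypothetical usage; summarize.pl is an assumed wrapper script:
# summary = summarize_with_command(html_text, ["perl", "summarize.pl"])
```

Starting a new process for every page is slow, so this would probably only make sense if the summary is cached or computed when the page content changes.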

-Chris

Ausum wrote:
> 
> Chris, would you please specify which summarization module you know
> about? I've been to CPAN, but a keyword search for "summarization" gives no
> results. I'm glad you're as interested as I am.
> 
> It's cool to know we can count on you, Andy. I haven't tried Perl for Zope, but
> I will this very afternoon.
> 
> Ausum
> 
> Andy McKay wrote:
> >
> > > Maybe you could hook the Perl module up to Zope somehow.
> >
> > That's easy: call it straight from Zope or wrap it in a product. If you need
> > help with this, drop me a line.
> >
> > Cheers.
> > --
> >   Andy McKay.

_________________________cut here____________________________

> NAME 
> 
> HTML::Summary - module for generating a summary from a web page. 
> 
> 
> 
> SYNOPSIS 
> 
>     use HTML::Summary;
>     use HTML::TreeBuilder;
> 
> 
>     my $tree = new HTML::TreeBuilder;
>     $tree->parse( $document );
> 
> 
>     my $summarizer = new HTML::Summary(
>         LENGTH      => 200,
>         USE_META    => 1,
>     );
> 
> 
>     my $summary = $summarizer->generate( $tree );
>     $summarizer->option( 'USE_META' => 1 );
>     my $length = $summarizer->option( 'LENGTH' );
>     if ( $summarizer->meta_used( ) )
>     {
>         # do something ...
>     }
> 
> 
> 
> 
> DESCRIPTION 
> 
> The HTML::Summary module produces summaries from the textual content of web pages. It does so using the location heuristic, which determines the value of a given sentence based on its
> position and status within the document; for example, headings, section titles and opening paragraph sentences may be favoured over other textual content. A LENGTH option can be used
> to restrict the length of the summary produced. 
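
To illustrate what a location heuristic can look like, here is a toy Python sketch. It is not the algorithm HTML::Summary actually uses: the scoring weights, the function names, and the naive sentence splitter are all assumptions chosen for illustration. Headings score highest, the first sentence of each paragraph scores above the rest, and earlier paragraphs score above later ones.

```python
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def location_summary(headings, paragraphs, max_len=200):
    """Pick the highest-scoring sentences that fit in max_len characters."""
    candidates = []  # (score, document_order, sentence)
    order = 0
    for heading in headings:
        candidates.append((3.0, order, heading.strip()))
        order += 1
    for p_index, paragraph in enumerate(paragraphs):
        for s_index, sentence in enumerate(split_sentences(paragraph)):
            # Opening sentences count double; later paragraphs count less.
            score = (2.0 if s_index == 0 else 1.0) / (1 + p_index)
            candidates.append((score, order, sentence))
            order += 1
    chosen = []
    used = 0
    # Greedily take the best-scoring sentences that still fit.
    for score, position, sentence in sorted(candidates, key=lambda c: (-c[0], c[1])):
        cost = len(sentence) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_len:
            chosen.append((position, sentence))
            used += cost
    # Emit the chosen sentences in their original document order.
    return " ".join(sentence for _, sentence in sorted(chosen))
```

Here max_len plays the role of the LENGTH option, counted in characters rather than bytes.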
> 
> 
> 
> CONSTRUCTOR 
> 
> new( $attr1 => $value1 [, $attr2 => $value2 ] ) 
> 
> Possible attributes are: 
> 
> VERBOSE 
>       Generate verbose messages to STDERR. 
> 
> LENGTH 
>       Maximum length of summary (in bytes). Default is 500. 
> 
> USE_META 
>       Flag to tell summarizer whether to use the content of the <META> tag in the page header, if one is present, instead of generating a summary from the body text. Note that if the
>       USE_META flag is set, this overrides the LENGTH flag - in other words, the summary provided by the <META> tag is returned in full, even if it is greater than LENGTH bytes.
>       Default is 0 (no). 
> 
>     my $summarizer = new HTML::Summary LENGTH => 200;
> 
> 
> 
> 
> METHODS 
> 
> option( ) 
> 
> Get / set HTML::Summary configuration options. 
> 
>     my $length = $summarizer->option( 'LENGTH' );
>     $summarizer->option( 'USE_META' => 1 );
> 
> 
> generate( $tree ) 
> 
> Takes an HTML::Element object, and generates a summary from it. 
> 
>     my $tree = new HTML::TreeBuilder;
>     $tree->parse( $document );
>     my $summary = $summarizer->generate( $tree );
> 
> 
> meta_used( ) 
> 
> Returns 1 if the META tag description was used to generate the summary. 
> 
>     if ( $summarizer->meta_used() )
>     {
>         # do something ...
>     }
> 
> 
> 
> 
> SEE ALSO 
> 
>     HTML::TreeBuilder
>     Text::Sentence
>     Lingua::JA::Jcode
>     Lingua::JA::Jtruncate
> 
> 
> 
> 
> AUTHORS 
> 
>     Ave Wrigley <wrigley@cre.canon.co.uk>
>     Tony Rose <tgr@cre.canon.co.uk>
>     Neil Bowers <neilb@cre.canon.co.uk>
> 
> 
> 
> 
> COPYRIGHT 
> 
> Copyright (c) 1997 Canon Research Centre Europe (CRE). All rights reserved. This script and any associated documentation or files cannot be distributed outside of CRE without express
> prior permission from CRE. 
>