[Checkins] SVN: topia.termextract/trunk/s * Add example.txt to the documentation.

Stephan Richter srichter at gmail.com
Sat May 30 12:10:45 EDT 2009


Log message for revision 100560:
  * Add example.txt to the documentation.
  
  * Improve text in example.txt.
  

Changed:
  U   topia.termextract/trunk/setup.py
  U   topia.termextract/trunk/src/topia/termextract/example.txt

-=-
Modified: topia.termextract/trunk/setup.py
===================================================================
--- topia.termextract/trunk/setup.py	2009-05-30 15:55:46 UTC (rev 100559)
+++ topia.termextract/trunk/setup.py	2009-05-30 16:10:45 UTC (rev 100560)
@@ -35,6 +35,8 @@
         + '\n' +
         read('src', 'topia', 'termextract', 'README.txt')
         + '\n\n' +
+        read('src', 'topia', 'termextract', 'example.txt')
+        + '\n\n' +
         read('CHANGES.txt')
         ),
     license = "ZPL 2.1",

Modified: topia.termextract/trunk/src/topia/termextract/example.txt
===================================================================
--- topia.termextract/trunk/src/topia/termextract/example.txt	2009-05-30 15:55:46 UTC (rev 100559)
+++ topia.termextract/trunk/src/topia/termextract/example.txt	2009-05-30 16:10:45 UTC (rev 100560)
@@ -1,6 +1,6 @@
-==============
-A News Article
-==============
+===========================
+An Example - A News Article
+===========================
 
 This document provides a simple example of extracting the terms of a BBC
 article from May 29, 2009. We will use several term extraction tools to
@@ -348,15 +348,21 @@
   area            NN        area
   .               SENT      .
 
+As you can see, TreeTagger identifies parts of speech quite well, but the
+output would need further analysis to produce a useful set of terms. Furthermore,
+TreeTagger is not free for commercial use.
 
-Topia POS Tag
--------------
+Topia's Term Extractor
+----------------------
 
-Topia POS Tag tries to produce results somewhere between a simple tagger like
-TreeTagger and Yahoo Keyword Extraction. We try to achieve that by first using
-a POS Tagger followed by applying a simple term constructor and relevance
-calculation,
+Topia's Term Extractor tries to produce results somewhere between a POS
+tagger like TreeTagger and Yahoo Keyword Extraction.
 
+Since we are only interested in nouns, a very simple POS tagging algorithm can
+be deployed, which will provide good results most of the time. We then use
+some simple statistics and linguistics to produce a narrow but strong list of
+terms for the content.
+
   >>> from topia.termextract import extract
   >>> extractor = extract.TermExtractor()
 
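The approach the new example.txt text describes (a very simple noun-oriented
tagging step followed by simple statistics) can be sketched roughly as below.
This is only an illustration, not the code in topia.termextract: the
``NON_NOUNS`` stop list, the ``naive_terms`` function and its ``min_count``
parameter are all hypothetical names, and treating every word outside the
stop list as a noun is a deliberately crude stand-in for a real POS tagger.

```python
import re
from collections import Counter

# Hypothetical stop list standing in for a real POS tagger: any word that is
# NOT listed here is assumed to be a noun. This is an assumption made for the
# sketch, not how topia.termextract tags words.
NON_NOUNS = frozenset({
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "near", "is", "was", "were", "are", "be", "said", "started", "has",
    "had", "have", "it", "that", "this", "by", "for", "with", "as",
})


def naive_terms(text, min_count=2):
    """Return {term: count} for probable nouns seen at least min_count times."""
    # Step 1 (crude "tagging"): split into lowercase words and keep only
    # those the stop list does not rule out as non-nouns.
    words = re.findall(r"[A-Za-z]+", text.lower())
    candidates = [word for word in words if word not in NON_NOUNS]
    # Step 2 (simple statistics): count occurrences and drop rare candidates
    # so that only a narrow list of terms remains.
    counts = Counter(candidates)
    return {term: n for term, n in counts.items() if n >= min_count}


sample = "The police said the fire started near the police station."
terms = naive_terms(sample)
all_candidates = naive_terms(sample, min_count=1)
```

Raising ``min_count`` narrows the list to terms that recur in the text, while
``min_count=1`` keeps every candidate the crude tagging step lets through.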


