[Zope] [ANN] TextIndexNG 1.05final released

Andreas Jung andreas@andreas-jung.com
Sun, 13 Oct 2002 11:10:27 +0200


I am pleased to announce the release of

                          TextIndexNG 1.05 FINAL


TextIndexNG is a new pluggable index for the ZCatalog and is the most
feature-complete solution for fulltext indexing under Zope. TextIndexNG
enhances the fulltext indexing capabilities of Zope by providing the
following features:

    * support for document converters (HTML, PDF, WinWord, PowerPoint,
      PostScript). Custom converters can be added easily

    * stemmer support for 12 languages

    * optional support for right truncation

    * similarity search for English (soundex and metaphone support)

    * NEAR search

    * phrase search

    * pluggable query parsers (two parsers included)

    * stop words support

    * new test tab for interactive testing

    * faster than Zope's old TextIndex

    * full unicode support (new)

    * normalization support (new)

    * new similarity algorithm: double metaphone (new)

    * new TXNGSplitter

    * new vocabulary browser

Changes:

    * added full wildcard support for CLLexicon and StandardLexicon

    * rewrote Stemmer module (now fully unicode compliant)

    * unittests code cleanup

    * query evaluation refactored

    * Parser API changed to return a parse tree instead of a Python
      expression

    * new parse tree evaluator added

    * PyQueryParser: now accepts a minus sign as prefix of a word to
      indicate NOT. Searching for "foo -bar" will be recognized as
      "foo AND NOT bar". In addition, the syntax for "ANDNOT" has been
      changed to "AND NOT".

    * stopword handling through registry

    * added double metaphone algorithm for similarity search

    * Splitter handling changed: The new TXNGSplitter has been added. It
      supports both strings and unicode strings and supersedes the
      functionality of all other existing splitters for Zope. TXNGSplitter
      is the only splitter that will be used by TextIndexNG. The
      "index numbers" option has been removed both from the splitter and
      the ZMI. In addition, the splitter now accepts an optional set of
      characters that are recognized as valid inside words. This allows
      you to index common words like "C++" or "python-22.lib" when you
      specify "+.-" as valid word characters.

    * Python C extensions now compile under Windows (a binary
      distribution will be available for Windows)

    * normalizer support added

    * full unicode support

    * the add form for TextIndexNG now uses the registries to obtain
      information about registered components instead of hardcoded
      values.

    * fixed problem with changed API of the Interface packages
      (backport from Zope 3 to Zope 2.6)

    * added vocabulary browser

    * lots of code cleanup

    * bug fixes...

    * added statistics tab to ZMI

    * fixed serious bug in TXNGSplitter due to missing
      encoding parameter

    * minor ZMI adjustments

    * using converters no longer raises an exception when a converter
      could not be found for the mime-type of a document

    * fixed document converters, which did not work due to a changed
      API call

    * added Finnish stemmer

    * improved CMF support: TextIndexNG is now able to index foreign file
      formats stored as "Portal File" using the DocumentConverters.
      "Portal File" objects are indexed if the index name is
      "SearchableText". This is a big improvement since you can now
      search through text objects and Word, PDF etc. files inside your
      CMF site with the "SearchableText" index.

    * added stopword files for ten languages

    * minor fixes inside the TXNGSplitter

    * changed default encodings from iso-8859-1 to iso-8859-15
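
To illustrate two of the changes above, the minus-sign prefix accepted by
PyQueryParser and the extra word characters accepted by TXNGSplitter, here
is a small standalone sketch. It is not TextIndexNG code and does not use
its API: the function names expand_minus_prefix and split_words are
hypothetical, and lowercasing the words is an assumption made only for
this example.

```python
import re


def expand_minus_prefix(query):
    """Rewrite a leading minus on a word as NOT (hypothetical helper).

    A "-word" term becomes "AND NOT word", or just "NOT word" when it is
    the first term of the query. All other terms are passed through.
    """
    out = []
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            if out:
                out.append("AND")
            out.extend(["NOT", term[1:]])
        else:
            out.append(term)
    return " ".join(out)


def split_words(text, extra_chars=""):
    """Split text into words (hypothetical helper, not TXNGSplitter).

    Characters in extra_chars are treated as valid inside words in
    addition to letters, digits and the underscore. Lowercasing is an
    assumption of this sketch.
    """
    pattern = "[\\w" + re.escape(extra_chars) + "]+"
    return [word.lower() for word in re.findall(pattern, text)]


print(expand_minus_prefix("foo -bar"))
print(split_words("Link python-22.lib with C++"))
print(split_words("Link python-22.lib with C++", "+.-"))
```

With "+.-" passed as extra word characters, "python-22.lib" and "C++"
survive as single words; without them the same text is broken up at every
"+", "." and "-". Note that in this simple sketch a sentence-final period
would also be kept as part of the last word.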



Requirements:

    * Zope 2.5 or Zope CVS trunk checkout

Documentation:

    * http://www.zope.org/Members/ajung/TextIndexNG/wiki

Download:

    * http://www.zope.org/Members/ajung/TextIndexNG/ or

    * http://sourceforge.net/project/showfiles.php?group_id=50052