[Zope-dev] Searching/Indexing/ZODB/SQL/BerkeleyDB

Matt Hamilton matth@netsight.co.uk
Thu, 29 Nov 2001 11:03:42 +0000 (GMT)


On Thu, 29 Nov 2001, Chris Withers wrote:

> > I would rather avoid having to use a relational database unless I have to.
> > Perhaps the index pluggability could be made to support different backends
> > (like FileStorage et al does).
>
> Yeah, unfortunately, the difficult bit is combining queries:
> gimme the results where index1=='fish' and index2 is between 2 and 5kg.
>
> if index1 is in SQL and index2 is in ZODB, for example, how would you
> go about efficiently combining results?
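
A minimal sketch of one way to combine such results, assuming every backend (SQL, ZODB or anything else) can hand back the set of matching document ids drawn from one shared id space. The index contents and function names below are hypothetical, and the code uses Python's built-in `set`, which was added to the language after this thread was written:

```python
# Hypothetical example data: each "index" maps an indexed value to the
# set of document ids (docids) that have that value.
species_index = {
    "fish": {1, 2, 5, 8},
    "bird": {3, 4},
}
weight_index = {  # weight in kg -> docids
    1.5: {1, 3},
    2.0: {2},
    3.7: {4, 5},
    6.0: {8},
}

def exact_match(index, value):
    """Docids whose indexed value equals `value` (e.g. answered by SQL)."""
    return set(index.get(value, ()))

def range_match(index, low, high):
    """Docids whose indexed value lies in [low, high] (e.g. a ZODB index)."""
    result = set()
    for key, docids in index.items():
        if low <= key <= high:
            result |= docids
    return result

def combine_and(*result_sets):
    """AND the partial results by intersecting them, smallest set first."""
    ordered = sorted(result_sets, key=len)
    combined = set(ordered[0])
    for other in ordered[1:]:
        combined &= other
        if not combined:  # nothing can match any more, stop early
            break
    return combined

hits = combine_and(exact_match(species_index, "fish"),
                   range_match(weight_index, 2, 5))
print(sorted(hits))  # [2, 5]
```

Intersecting smallest-first keeps the intermediate sets small; the hard part the quoted question points at remains getting each backend to return ids in a common form cheaply.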

Is there not a set datatype in Python that could be used?  Admittedly,
most of the stuff in MG is about textual searches rather than exact-match
searches (it can do boolean searches too, but the book is mainly about
ranking).  It uses an algorithm called the 'Cosine Ranking Algorithm'.
Basically, imagine an N-dimensional space, where N is the number of
terms in your vocabulary, and represent a document as a vector in that
space whose direction is determined by the terms that appear in it (one
component per term).  You then represent a query string as a vector in
the same space.  The similarity between the document and the query is
the angle between the two vectors: the smaller the angle, the greater
the similarity (in practice you compute the cosine of that angle, hence
the name).
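
A minimal sketch of cosine ranking, assuming plain whitespace tokenisation and raw term counts as vector components (real ranking systems typically weight terms, e.g. by TF-IDF, rather than using raw counts). It computes cos(theta) = (d . q) / (|d| |q|), so a score of 1.0 means the vectors point the same way and 0.0 means no terms in common. The function names and sample documents are illustrative only:

```python
import math
from collections import Counter

def term_vector(text):
    """Sparse vector of raw term counts: one dimension per vocabulary term."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    # Counter returns 0 for missing terms, so absent terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """(score, document) pairs, smallest angle (highest cosine) first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = ["fish and chips", "big fish small fish", "zope catalog indexes"]
results = rank("fish", docs)
print(results[0][1])  # big fish small fish
```

Working with cosines rather than the angles themselves avoids calling `acos`, and ordering by descending cosine gives the same ranking as ordering by ascending angle.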

Still with me? :)

-Matt

-- 
Matt Hamilton                                         matth@netsight.co.uk
Netsight Internet Solutions, Ltd.          Business Vision on the Internet
http://www.netsight.co.uk                               +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration