[Zope] Catalog search problem

sean.upton@uniontrib.com
Thu, 06 Sep 2001 16:07:30 -0700


Paul,
One suggestion: consider rewriting your queries dynamically, so that they
behave the way you want, before passing them to ZCatalog.  As an example,
the Python method below is a utility that I use to rewrite queries with
the re regex module.  It is called via <dtml-call> to rewrite the query in
REQUEST right before <dtml-in Catalog> is called.  You could very likely do
something similar to write in "default" boolean operators or quotes, as
Dieter suggests, supposing you wanted to make that behavior an optional
default.  Of course, for some setups this wouldn't be good default
behavior, so that decision is left to you as an application design choice.
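
For the default-operator or quoting variant, a minimal sketch might look
like the following.  The function names quote_as_phrase and
add_default_operator are made up for illustration, and the operator list is
an assumption about what your text index understands.  Parentheses are not
handled.  Treat it as a starting point rather than tested ZCatalog code.

```python
import re

# Operators assumed to be recognized by the text index (an assumption;
# adjust this to whatever your index actually supports).
OPERATORS = ('and', 'or', 'andnot', 'near')


def quote_as_phrase(query):
    """Wrap a multi-word query in double quotes, unless the user already
    supplied quotes, wildcards, or explicit operators."""
    query = query.strip()
    words = query.split()
    if len(words) < 2:
        return query
    if '"' in query or '*' in query or '?' in query:
        return query
    for word in words:
        if word.lower() in OPERATORS:
            return query
    return '"%s"' % query


def add_default_operator(query, default='and'):
    """Insert `default` between adjacent terms that have no explicit
    operator between them.  Quoted phrases count as single terms."""
    tokens = re.findall(r'"[^"]*"|\S+', query)
    result = []
    for token in tokens:
        explicit = token.lower() in OPERATORS
        if result and not explicit and result[-1].lower() not in OPERATORS:
            result.append(default)
        result.append(token)
    return ' '.join(result)


print(quote_as_phrase('paul dunbar'))
print(add_default_operator('paul dunbar'))
```

Either function could be called on the query string in REQUEST the same
way as the method below, right before the catalog is searched.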

Sean

    # Requires "import re" at the top of the module defining this class.
    def queryExtender(self, query):
        """
        Takes, as input, a query for a Text index of ZCatalog and
        makes it more intelligent by parsing it and rewriting it
        to include wildcards at the end of words so that we can
        search sub-words; in other words, a search for something
        like "engineer" should yield results for "engineer*" so
        that terms like "engineers" and "engineering" are also
        considered matches.

        Obviously, we have to be careful not to incorrectly
        parse the query, and we don't want to mess with words
        that already have wildcards at the end, because you
        don't want to end up with something like "engineer**".
        """

        ### Define character pattern to strip out and split upon
        everythingButSearchTerms = '[^A-Za-z0-9*]+'

        ### Create the word list, dropping empty strings
        result = [w for w in re.split(everythingButSearchTerms, query) if w]

        ### Get rid of boolean operators.  The single group makes ^
        ### and $ apply to every alternative, so "director" is not
        ### dropped just because it contains "or".
        booleanops = '^(and|or|andnot|near)$'
        result = [w for w in result
                  if not re.search(booleanops, w, re.IGNORECASE)]

        ### Now, result is a list of just the words that are
        ### meaningful to the search, but we need to eliminate
        ### any entries that have wildcards at the start or end,
        ### because they are likely more specific than our rewrite.
        ### (Filtering into a new list avoids popping from a list
        ### while iterating over it, which skips elements.)
        asteriskinterm = '^[*]|[*]$'
        result = [w for w in result if not re.search(asteriskinterm, w)]

        ### Now the list of words we need to modify is final, so we
        ### can rewrite the query one word at a time.  The lookarounds
        ### keep us from matching inside a longer term or re-matching
        ### a term that already carries a wildcard.
        for item in result:
            if len(item) > 3:
                suffix = '*'
            elif len(item) != 1:
                suffix = '?'
            else:
                continue    # leave one-character terms alone
            pattern = ('(?<![A-Za-z0-9*])' + re.escape(item)
                       + '(?![A-Za-z0-9*?])')
            query = re.sub(pattern, item + suffix, query, count=1)
        return query
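
A note on patterns like the boolean-operator one: "|" binds more loosely
than "^" and "$", so in an alternation without one enclosing group the
anchors attach only to the first and last alternatives.  The short check
below illustrates the difference using the standard re module.  The names
loose, strict and matches are just for this example.

```python
import re

# Anchors apply only to the first ("and") and last ("near") alternative.
loose = ('^([Aa][Nn][Dd])|([Oo][Rr])|([Aa][Nn][Dd][Nn][Oo][Tt])'
         '|([Nn][Ee][Aa][Rr])$')
# One enclosing group makes the anchors apply to every alternative.
strict = '^(and|or|andnot|near)$'


def matches(pattern, word, flags=0):
    """Return True if the pattern is found anywhere in the word."""
    return re.search(pattern, word, flags) is not None


for word in ('or', 'AND', 'director', 'android', 'linear'):
    print('%-10s loose=%s strict=%s' % (
        word, matches(loose, word), matches(strict, word, re.IGNORECASE)))
```

With the loose pattern, ordinary search terms such as "director",
"android" and "linear" are treated as operators; with the grouped pattern
only the operators themselves are.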


-----Original Message-----
From: Dieter Maurer [mailto:dieter@handshake.de]
Sent: Thursday, September 06, 2001 3:04 PM
To: paul dunbar
Cc: zope@zope.org
Subject: Re: [Zope] Catalog search problem


paul dunbar writes:
 > I have a problem when searching my catalog. It has an index called
 > "Author" which holds the name of a person who wrote a document. When I
 > search the catalog for, say, "paul dunbar", I will get documents from
 > authors like "paul" or "paul test" as well as paul dunbar. What I want
 > to do is limit the matches to "paul dunbar"....
Missing operators between search terms are "replaced" by
the default operator ("or"; in Zope 2.4, you can define
"and" as default operator).

To get almost what you want, enclose "paul dunbar" in quotes.
This will make a phrase search, quite near to a search
for "paul dunbar"...



Dieter
