[Zope3-Users] Search multiple fields with TextIndex

Christian Lück christian.lueck at ruhr-uni-bochum.de
Thu Feb 19 17:34:11 EST 2009


Massimiliano della Rovere wrote:
> On Tue, Feb 17, 2009 at 23:50, Christian Lück
> <christian.lueck at ruhr-uni-bochum.de> wrote:
>> Massimiliano della Rovere wrote:
>>> Another question is:
>>> is there a way to automatically create TextIndices for certain (all) fields
>>> of an interface?
>> No, I don't think there is. The event subscriber to IEventoNuovoSiteMabon
>> is the way to go.
>> Instead of using a field manager :) I suggest you use the API of
>> zope.interface and zope.schema to iterate over the fields and drop some.
>> See p.64 of Philipp's book (third edition).
> I use the Field manager because I find the omit function more elegant
> than an "if" in an iterator:
> campi = [ campo for campo in zope.schema.getFieldNames(IScheda) if
> campo not in ('__name__', '__parent__', 'titolo') ]
> 

Cons: it is harder for other programmers to read, and it adds a
dependency on the form framework to a non-form component.
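If the appeal of the field manager is mainly the readable `omit` style, a tiny helper can give the same readability without the form dependency. This is a plain-Python sketch: `omit_names` is a hypothetical helper, and the field names `autore` and `testo` are made up; in real code the names would come from `zope.schema.getFieldNames(IScheda)`.

```python
def omit_names(names, *omitted):
    """Return the names in `names`, in their original order, minus `omitted`.

    Hypothetical helper: it mirrors the readability of an omit(...) call
    without depending on the form framework.
    """
    dropped = set(omitted)
    return [name for name in names if name not in dropped]


# In the real component this list would come from
# zope.schema.getFieldNames(IScheda); a literal list stands in here.
field_names = ['__name__', '__parent__', 'titolo', 'autore', 'testo']
campi = omit_names(field_names, '__name__', '__parent__', 'titolo')
print(campi)  # ['autore', 'testo']
```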

>> import zope.schema
>> from zope.app.catalog.text import TextIndex
>>
>> def catalogo_e_indice(event):
>>     gs = event.object.getSiteManager()
>>
>>     ... # create intid utility and catalog as you did
>>
>>     for campo in zope.schema.getFieldNames(IScheda):
>>         if campo not in ('__name__', '__parent__', 'some_field_I_want_to_drop'):
>>             catalogo[campo] = TextIndex(
>>                 interface = IScheda,
>>                 field_name = campo,
>>                 field_callable = False)
> In my opinion there is something strange about this method.
> When I specify interface=IScheda, I do not link the data extractor
> to the actual instance of the Scheda class.
> By contrast, with an adapter like TestoSchedaPerRicerca (see
> below), the instance from which the data is extracted is passed as
> context.
> 

When an object is indexed and the interface keyword (IScheda in your
case) was given for the index, the index tries to access
IScheda(object).some_field. If the object already provides IScheda, then
IScheda(object) is the object itself (please don't take 'is ... itself'
as a technical term). If the object does not provide IScheda, then the
expression IScheda(object) looks up an adapter that provides IScheda
and adapts an interface provided by the object. If the adapter lookup
fails, the object is not indexed at all. There is nothing strange about
this; it's the power of Zope adapters.
See the docs for zope.interface and zope.component for this.
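The lookup order described above can be modelled as a toy in plain Python. This is not the zope.interface API: here an "interface" is just a class, "providing" it means being an instance of it, and `adapt`, `ADAPTERS`, `value_to_index`, `Appunto` and `AppuntoComeScheda` are hypothetical names. It only illustrates the order: the object itself, then a registered adapter, otherwise skip indexing (plus the `field_callable` step).

```python
class IScheda(object):
    """Toy stand-in for an interface (a plain class in this model)."""


class Scheda(IScheda):
    """Provides IScheda directly, so no adapter is needed."""
    def __init__(self, titolo):
        self.titolo = titolo

    def descrizione(self):
        return u"Scheda: " + self.titolo


class Appunto(object):
    """Hypothetical object that does NOT provide IScheda."""
    def __init__(self, testo):
        self.testo = testo


class AppuntoComeScheda(object):
    """Hypothetical adapter: makes an Appunto look like an IScheda."""
    def __init__(self, context):
        self.context = context

    @property
    def titolo(self):
        return self.context.testo


# Toy registry: (required class, provided interface) -> adapter factory.
ADAPTERS = {(Appunto, IScheda): AppuntoComeScheda}


def adapt(obj, iface, default=None):
    """Mimic IScheda(obj, default) in this toy model."""
    if isinstance(obj, iface):
        return obj  # already provides the interface
    for (required, provided), factory in ADAPTERS.items():
        if provided is iface and isinstance(obj, required):
            return factory(obj)  # a registered adapter was found
    return default  # lookup failed


def value_to_index(obj, iface, field_name, field_callable=False):
    """Roughly what an attribute-based index does before indexing."""
    adapted = adapt(obj, iface)
    if adapted is None:
        return None  # no adapter: the object is simply not indexed
    value = getattr(adapted, field_name, None)
    if value is not None and field_callable:
        value = value()  # field_callable=True: call the method
    return value
```

With this model, `value_to_index(Appunto(u"Due"), IScheda, 'titolo')` goes through the adapter and yields `u"Due"`, while an object with no adapter yields `None` and is skipped.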


> When I use this method I receive the following error:
> Traceback (most recent call last):
>   File "/usr/lib/python2.4/site-packages/zope/publisher/publish.py", line 133, in publish
>     result = publication.callObject(request, obj)
>   File "/usr/lib/python2.4/site-packages/zope/app/publication/zopepublication.py", line 161, in callObject
>     return mapply(ob, request.getPositionalArguments(), request)
>   File "/usr/lib/python2.4/site-packages/zope/publisher/publish.py", line 108, in mapply
>     return debug_call(obj, args)
>    - __traceback_info__: <security proxied zope.app.publisher.browser.viewmeta.ModuloAggiungiScheda instance at 0xa89c4cc>
>   File "/usr/lib/python2.4/site-packages/zope/publisher/publish.py", line 114, in debug_call
>     return obj(*args)
>   File "/usr/lib/python2.4/site-packages/zope/formlib/form.py", line 769, in __call__
>     self.update()
>   File "/usr/lib/python2.4/site-packages/zope/formlib/form.py", line 750, in update
>     result = action.success(data)
>   File "/usr/lib/python2.4/site-packages/zope/formlib/form.py", line 594, in success
>     return self.success_handler(self.form, self, data)
>   File "/usr/lib/python2.4/site-packages/zope/formlib/form.py", line 861, in handle_add
>     self.createAndAdd(data)
>   File "/usr/lib/python2.4/site-packages/zope/formlib/form.py", line 868, in createAndAdd
>     return self.add(ob)
>   File "/usr/lib/python2.4/site-packages/zope/formlib/form.py", line 877, in add
>     ob = self.context.add(object)
>   File "/usr/lib/python2.4/site-packages/zope/app/container/browser/adding.py", line 72, in add
>     container[name] = content
>   File "/usr/lib/python2.4/site-packages/zope/app/container/sample.py", line 86, in __setitem__
>     setitem(self, self.__data.__setitem__, key, object)
>   File "/usr/lib/python2.4/site-packages/zope/app/container/contained.py", line 593, in setitem
>     notify(event)
>   File "/usr/lib/python2.4/site-packages/zope/event/__init__.py", line 23, in notify
>     subscriber(event)
>   File "/usr/lib/python2.4/site-packages/zope/component/event.py", line 26, in dispatch
>     for ignored in zope.component.subscribers(event, None):
>   File "/usr/lib/python2.4/site-packages/zope/component/_api.py", line 130, in subscribers
>     return sitemanager.subscribers(objects, interface)
>   File "/usr/lib/python2.4/site-packages/zope/component/registry.py", line 290, in subscribers
>     return self.adapters.subscribers(objects, provided)
>   File "/usr/lib/python2.4/site-packages/zope/interface/adapter.py", line 535, in subscribers
>     subscription(*objects)
>   File "/usr/lib/python2.4/site-packages/zope/component/event.py", line 33, in objectEventNotify
>     adapters = zope.component.subscribers((event.object, event), None)
>   File "/usr/lib/python2.4/site-packages/zope/component/_api.py", line 130, in subscribers
>     return sitemanager.subscribers(objects, interface)
>   File "/usr/lib/python2.4/site-packages/zope/component/registry.py", line 290, in subscribers
>     return self.adapters.subscribers(objects, provided)
>   File "/usr/lib/python2.4/site-packages/zope/interface/adapter.py", line 535, in subscribers
>     subscription(*objects)
>   File "/usr/lib/python2.4/site-packages/zope/app/intid/__init__.py", line 169, in addIntIdSubscriber
>     notify(IntIdAddedEvent(ob, event))
>   File "/usr/lib/python2.4/site-packages/zope/event/__init__.py", line 23, in notify
>     subscriber(event)
>   File "/usr/lib/python2.4/site-packages/zope/component/event.py", line 26, in dispatch
>     for ignored in zope.component.subscribers(event, None):
>   File "/usr/lib/python2.4/site-packages/zope/component/_api.py", line 130, in subscribers
>     return sitemanager.subscribers(objects, interface)
>   File "/usr/lib/python2.4/site-packages/zope/component/registry.py", line 290, in subscribers
>     return self.adapters.subscribers(objects, provided)
>   File "/usr/lib/python2.4/site-packages/zope/interface/adapter.py", line 535, in subscribers
>     subscription(*objects)
>   File "/usr/lib/python2.4/site-packages/zope/app/catalog/catalog.py", line 153, in indexDocSubscriber
>     cat.index_doc(id, ob)
>   File "/usr/lib/python2.4/site-packages/zope/app/catalog/catalog.py", line 62, in index_doc
>     index.index_doc(docid, texts)
>   File "/usr/lib/python2.4/site-packages/zope/app/catalog/attribute.py", line 144, in index_doc
>     return super(AttributeIndex, self).index_doc(docid, value)
>   File "/usr/lib/python2.4/site-packages/zope/index/text/textindex.py", line 45, in index_doc
>     self.index.index_doc(docid, text)
>   File "/usr/lib/python2.4/site-packages/zope/index/text/okapiindex.py", line 225, in index_doc
>     count = BaseIndex.index_doc(self, docid, text)
>   File "/usr/lib/python2.4/site-packages/zope/index/text/baseindex.py", line 95, in index_doc
>     wids = self._lexicon.sourceToWordIds(text)
>   File "/usr/lib/python2.4/site-packages/zope/index/text/lexicon.py", line 66, in sourceToWordIds
>     for t in last:
> TypeError: iteration over non-sequence
> 
> 

I can only guess. Are all the indexed fields of the new object text (or
at least iterable)? If there's a zope.schema.Int field, for example, its
value is an integer, which is not iterable!
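Why an integer breaks a text index can be seen with a rough plain-Python (Python 3) sketch of the first step a text lexicon performs: a string is treated as one chunk of text, and anything else is iterated as a sequence of strings. `to_text_chunks` and `split_words` are hypothetical names and only approximate what zope.index's lexicon and splitter do.

```python
def to_text_chunks(value):
    """Treat a string as one chunk; otherwise expect an iterable of strings."""
    if isinstance(value, str):
        return [value]
    # For an int such as 1950 this raises TypeError (not iterable),
    # the same kind of failure as "iteration over non-sequence" in Python 2.4.
    return list(value)


def split_words(value):
    """Very rough stand-in for splitting and lower-casing text."""
    words = []
    for chunk in to_text_chunks(value):
        words.extend(chunk.lower().split())
    return words
```

`split_words(u"Computing Machinery")` gives `['computing', 'machinery']`, whereas `split_words(1950)` raises `TypeError`.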

I played around at the Python prompt and got exactly the same
traceback. Note that 'year' is an integer, and integers don't go into a
TextIndex!

    >>> import zope.interface
    >>> import zope.schema
    >>> class IPublication(zope.interface.Interface):
    ...     year = zope.schema.Int(title = u"Year")
    ...     title = zope.schema.TextLine(title = u"Title")

    >>> from zope.schema.fieldproperty import FieldProperty
    >>> class Publication(object):
    ...     zope.interface.implements(IPublication)
    ...     year = FieldProperty(IPublication['year'])
    ...     title = FieldProperty(IPublication['title'])

    >>> test = Publication()
    >>> test.year = 1950
    >>> test.title = u"Computing Machinery and Intelligence"
    >>>
    >>> from zope.app.catalog.catalog import Catalog
    >>> from zope.app.catalog.text import TextIndex
    >>> cat = Catalog()
    >>> cat['year'] = TextIndex(
    ...     interface = IPublication,
    ...     field_name = 'year',
    ...     )

    >>> cat['title'] = TextIndex(
    ...     interface = IPublication,
    ...     field_name = 'title',
    ...     )

    >>> cat.index_doc(1, test)
    Traceback (most recent call last):
      File "/home/clueck/.buildout-eggs/zope.testing-3.7.1-py2.4.egg/zope/testing/doctest.py", line 1356, in __run
        compileflags, 1) in test.globs
      File "<doctest notiterable.txt[13]>", line 1, in ?
        cat.index_doc(1, test)
      File "/home/clueck/.buildout-eggs/zope.app.catalog-3.6.0-py2.4.egg/zope/app/catalog/catalog.py", line 73, in index_doc
        index.index_doc(docid, texts)
      File "/home/clueck/.buildout-eggs/zope.app.catalog-3.6.0-py2.4.egg/zope/app/catalog/attribute.py", line 144, in index_doc
        return super(AttributeIndex, self).index_doc(docid, value)
      File "/home/clueck/.buildout-eggs/zope.index-3.5.0-py2.4.egg/zope/index/text/textindex.py", line 45, in index_doc
        self.index.index_doc(docid, text)
      File "/home/clueck/.buildout-eggs/zope.index-3.5.0-py2.4.egg/zope/index/text/okapiindex.py", line 223, in index_doc
        count = BaseIndex.index_doc(self, docid, text)
      File "/home/clueck/.buildout-eggs/zope.index-3.5.0-py2.4.egg/zope/index/text/baseindex.py", line 97, in index_doc
        wids = self._lexicon.sourceToWordIds(text)
      File "/home/clueck/.buildout-eggs/zope.index-3.5.0-py2.4.egg/zope/index/text/lexicon.py", line 66, in sourceToWordIds
        for t in last:
    TypeError: iteration over non-sequence


Suggestion: use a ValueIndex (zc.catalog) or a FieldIndex
(zope.app.catalog.field) for non-text fields such as 'year' instead.
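Combining this with the field loop from earlier, the index kind can be chosen per field type. The plain-Python sketch below works on a hypothetical mapping of field name to type name; in Zope the pairs would more likely come from something like `zope.schema.getFieldsInOrder(IPublication)`, and `'text'`/`'value'` would become a TextIndex or a value/field index. `plan_indexes` and `TEXT_FIELD_TYPES` are illustrative names only.

```python
# Hypothetical: schema field type names that hold text.
TEXT_FIELD_TYPES = {'TextLine', 'Text'}


def plan_indexes(fields, omitted=('__name__', '__parent__')):
    """Map each field name to the kind of index it should get.

    `fields` maps field name -> schema field type name (e.g. 'Int').
    Text fields get a text index; everything else (integers, dates, ...)
    gets a value/field index that stores the value as-is.
    """
    plan = {}
    for name, field_type in fields.items():
        if name in omitted:
            continue
        plan[name] = 'text' if field_type in TEXT_FIELD_TYPES else 'value'
    return plan


schema = {'__name__': 'TextLine', 'year': 'Int', 'title': 'TextLine'}
print(plan_indexes(schema))  # {'year': 'value', 'title': 'text'}
```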

Hope that helps.

> 
>>>> and the data extractor:
>>>> class TestoSchedaPerRicerca(object):
>>>>     """Text extractor for searches"""
>>>>     implements(ISearchableText)
>>>>     adapts(IScheda)
>>>>
>>>>     def __init__(self, context):
>>>>         self.context = context
>>>>
>>>>     def __getattr__(self, attr):
> I use __getattr__ so that TestoSchedaPerRicerca can answer whenever it
> is asked for an attribute whose name matches one of those specified in
> the index definitions:
> 	for i in campi:
> 		testuale = TextIndex(
> 			interface = ISearchableText,
> 			field_name = i,
> 			field_callable = True
> 		)
> 		catalogo[i] = testuale
> 
>> # What is this method for? Does ISearchableText subclass IScheda?
> No.

At the very least, I would say that this is unclean. The interfaces
provided by an object should define its behaviour, but your
TestoSchedaPerRicerca adapter exposes attributes that its interface
ISearchableText does not define. In the for loop that creates the
indexes, on the other hand, you access attributes that are not defined
in the interface ISearchableText. It may work, but your code is hard to
read, and your implementation is far removed from the definitions in
your interfaces. This way you use interfaces just as a lookup feature.
It will be hard to write tests and keep things stable.
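One way to keep the adapter honest, assuming ISearchableText declares a single `getSearchableText()` method (that is my reading of zope.index.text.interfaces; check your version), is to implement only that method and build one combined text index on it rather than one index per attribute. The sketch is plain Python with the Zope declarations left out so it runs stand-alone; the fields `titolo`, `testo`, `anno` and the `CAMPI_TESTUALI` tuple are hypothetical.

```python
class Scheda(object):
    """Hypothetical content object; in Zope it would implement IScheda."""
    def __init__(self, titolo, testo, anno):
        self.titolo = titolo
        self.testo = testo
        self.anno = anno  # an integer: not text, so it is not searched


class TestoSchedaPerRicerca(object):
    """Adapter exposing only what ISearchableText promises.

    In Zope it would also carry implements(ISearchableText) and
    adapts(IScheda); those are omitted so the sketch runs without Zope.
    """

    # Hypothetical: the text fields to search, declared explicitly
    # instead of being resolved dynamically through __getattr__.
    CAMPI_TESTUALI = ('titolo', 'testo')

    def __init__(self, context):
        self.context = context

    def getSearchableText(self):
        # Join the non-empty values of the declared text fields.
        testi = []
        for campo in self.CAMPI_TESTUALI:
            valore = getattr(self.context, campo, None)
            if valore:
                testi.append(valore)
        return u' '.join(testi)
```

In Zope the catalog would then need a single `TextIndex(interface=ISearchableText, field_name='getSearchableText', field_callable=True)`, and the adapter's public surface matches its interface exactly.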

> ISearchableText is an interface created to be used in conjunction with
> index creation.
> It is in "zope.index.text.interfaces".

Regards,
Christian

PS. Please cc the list. Other zope3-users might profit from this
discussion.

