What is modification, and why do we care? (was Re: [Zope3-dev] Missing ObjectContentModifiedEvent)

Fri May 27 10:45:20 EDT 2005

Dieter Maurer wrote:
> Jim Fulton wrote at 2005-5-27 08:29 -0400:
> 
>>...
>>
>>>Then, we probably do something wrong...
>>
>>That's always a possibility.  I think what we are doing is
>>pretty reasonable.  Perhaps you have other suggestions.
> 
> 
> I think we need more control over what modifications trigger
> what reindexing events.
> 
> I am not yet sure about the best (or even a good) approach.
> 
> 
>>>>...
>>>
>>>Even computing the value for a text index (without any change
>>>to the index itself) can be very expensive: it may
>>>include expensive fetching of a large object,
>>>an expensive conversion (text extraction), expensive splitting
>>>and comparison to what is currently indexed.
>>
>>Perhaps. It depends a lot on the application.
>>
>>I suggest that, if this optimization is important, it might
>>be much easier and cleaner to make text extraction and comparison
>>cheap, rather than trying to solve the problem with a more complex
>>event model.
> 
> 
> You cannot make text extraction cheap (as it handles potentially large
> data).

You can't make it cheap in all applications.  For most applications,
text extraction and comparison are very cheap.

I'm guessing that you are referring to indexing large (book size)
documents.  I would argue that this is pretty specialized.

> You could make comparison cheap -- e.g. by storing last modification
> dates and comparing them.
> But, I fear, you would just move the problem to when changing the
> modification date.

I think this is a nice solution for those special cases where text
extraction is expensive.  The nice thing about this solution is that
it involves a contract between the content and the index without
complicating the event framework.
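
A minimal sketch of what such a contract might look like, not actual
Zope 3 code.  The names Document, TextIndex, modified and
searchable_text are hypothetical, and the sketch assumes the content
bumps its modification stamp only when its searchable text changes
(which is exactly the part Dieter worries about):

```python
class Document:
    """Hypothetical content object.

    Assumption: `modified` changes only when the searchable text changes.
    """

    def __init__(self, text, modified):
        self.text = text
        self.modified = modified
        self.extractions = 0  # counts extraction calls, for illustration only

    def searchable_text(self):
        # Stands in for a potentially expensive text extraction.
        self.extractions += 1
        return self.text


class TextIndex:
    """Hypothetical index that remembers the stamp it last indexed."""

    def __init__(self):
        self._stamps = {}  # docid -> modification stamp last indexed
        self._words = {}   # docid -> set of lower-cased words

    def index_doc(self, docid, doc):
        """Reindex doc; return False if the stamp shows nothing changed."""
        if self._stamps.get(docid) == doc.modified:
            return False  # cheap comparison, no extraction performed
        self._words[docid] = set(doc.searchable_text().lower().split())
        self._stamps[docid] = doc.modified
        return True

    def search(self, word):
        word = word.lower()
        return sorted(d for d, words in self._words.items() if word in words)
```

Every modified event can still trigger index_doc; the index itself
decides, from the stamp alone, whether extraction is needed.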

> 
>>...
>>I think it would be very difficult to come up with rules
>>for deciding which events might affect a text value and which would not.
>>For example, I can easily imagine objects whose searchable text
>>depends on their workflow state.
> 
> 
> Indeed, such objects are easily imaginable.
> But usually, it is not the case.

And it is usually not the case that text extraction is expensive.

> The problem is obviously difficult -- not solvable with
> a trivial event model and trivial reindexing dispatching.

Agreed.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org