[Zope3-dev] Re: The Proposed Catalog Sprint Agenda is available

Matt Hamilton matth@netsight.co.uk
Fri, 15 Feb 2002 13:49:54 +0000 (GMT)


On Fri, 15 Feb 2002, Jim Fulton wrote:

> > The proposal says nothing about when indexing occurs,
>
> Yup. This is an important detail that we should work on hashing out
> next week. :)

I am fine with the proposal as well.  I don't know anything about the
Component Architecture yet, though, so it should make more sense to me
after the tutorial.

When the indexing occurs is quite a critical point.  I like the idea of
queuing up the requests and then actioning them when the commit occurs.
Processing batches of entries is much more efficient than lots of single
entries when dealing with indexing procedures.
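
To make the queuing idea concrete, here is a rough sketch in plain Python.
Everything in it is an assumption for illustration: the class name
BatchingIndexer, its methods, and the in-memory dictionary of postings are
made up, and nothing here is taken from the proposal or from the ZODB
transaction machinery.  It only shows requests being recorded cheaply and
then applied as one batch at commit time.

```python
from collections import defaultdict


class BatchingIndexer:
    """Hypothetical sketch: queue index requests, apply them as one batch."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.pending = []                 # queued (doc_id, text) requests
        self.flushes = 0                  # number of batches applied so far

    def index_doc(self, doc_id, text):
        # Only record the request; no index structure is touched yet.
        self.pending.append((doc_id, text))

    def commit(self):
        # Group the whole batch by term first, so that each posting list
        # is updated once per batch rather than once per document.
        batch = defaultdict(set)
        for doc_id, text in self.pending:
            for term in text.lower().split():
                batch[term].add(doc_id)
        for term, doc_ids in batch.items():
            self.postings[term] |= doc_ids
        self.pending = []
        self.flushes += 1

    def abort(self):
        # An aborted transaction just throws the queued requests away.
        self.pending = []
```

In a real system commit() and abort() would presumably be driven by the
transaction itself rather than called by hand; how that hookup should work
is exactly the open question about when indexing occurs.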

Depending on the size of the batches and the location of the indexes (ZODB
or external) it might be quite good to look at the system Lucene uses.
One of the complicated/expensive tasks with inverted indexes stored in disk
files is expanding them (since you need to write into the middle of the
file).  If the indexes are stored in the ZODB then this might not be so
much of an issue, since it would all be abstracted out to the programmer,
however it may still prove very inefficient as the ZODB might have to
continually grow the storage the object is in (I'm not sure how the
storage internals actually work in the ZODB).  Lucene uses a different
approach in that it *never* expands an index file on disk; it simply
creates a new index file for the next batch of updates.  It can read all
of the separate index files in parallel, and has a procedure to merge
multiple index files into one as time goes on, to keep the files down to a
minimum.  Merging indexes is quite easy.
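
A minimal sketch of that append-only scheme, again in plain Python, might
look like the following.  This is not Lucene's API or file format; the
class SegmentedIndex and its methods are hypothetical, and the "files" are
just in-memory dictionaries.  It only illustrates the three pieces
described above: each batch becomes a new segment, a search reads every
segment, and a merge folds them into one.

```python
class SegmentedIndex:
    """Hypothetical sketch: one immutable segment per batch of updates."""

    def __init__(self):
        # Each segment maps term -> sorted tuple of document ids.
        self.segments = []

    def add_batch(self, docs):
        # docs maps doc_id -> text.  A new segment is written for the
        # batch; existing segments are never modified or expanded.
        postings = {}
        for doc_id, text in docs.items():
            for term in set(text.lower().split()):
                postings.setdefault(term, []).append(doc_id)
        segment = {t: tuple(sorted(ids)) for t, ids in postings.items()}
        self.segments.append(segment)

    def search(self, term):
        # A query has to consult every segment and combine the results.
        result = set()
        for segment in self.segments:
            result.update(segment.get(term, ()))
        return sorted(result)

    def merge(self):
        # Fold all segments into a single new one to keep the count down.
        merged = {}
        for segment in self.segments:
            for term, ids in segment.items():
                merged.setdefault(term, set()).update(ids)
        self.segments = [{t: tuple(sorted(ids)) for t, ids in merged.items()}]
```

The trade-off is visible even in the toy version: add_batch() never
touches old data, but search() gets slower as segments pile up, which is
why the periodic merge matters.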

As I said above, this procedure may still be useful if we store the indexes
in the ZODB, as it might reduce the number of small updates to the indexes.

I haven't had a chance to look at specifics of the compression used in
Lucene, but it is a different system to that proposed in MG, and at a
quick glance, not as compact (but also not as complicated!).

-Matt


-- 
Matt Hamilton                                         matth@netsight.co.uk
Netsight Internet Solutions, Ltd.          Business Vision on the Internet
http://www.netsight.co.uk                               +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration