[Zope] ZCatalog speed?

Stuart Woolford stuartw@newmail.net
Tue, 14 Sep 1999 10:44:06 +1200


On Tue, 14 Sep 1999, Michel Pelletier wrote:
> Stuart Woolford wrote:
> > 
> 
> > 
> > on my 500MHz P-II with 196MB of memory it takes:
> > 
> > a - 22 minutes to create 8800 documents (smallish) in 1200 folders within zope,
> > not too fast :( but not exactly a user-interaction-limiting factor :)
> 
> 7 documents per second ain't too bad, I think. It would be
> interesting to see how fast you could dump them to the filesystem.

I can write documents to the FS at around 10 times that speed, but I'm not
complaining; I think it is not too bad.
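
For reference, the arithmetic behind those rates can be checked with a few
lines of Python. This is only a rough sketch: the 8800 documents and 22
minutes come from the figures above, while the filesystem rate is just the
"around 10 times" estimate, not a measurement.

```python
# Figures quoted in this thread: 8800 documents created in 22 minutes.
documents = 8800
minutes = 22

# Documents per second when creating them inside Zope.
zope_rate = documents / (minutes * 60)

# "Around 10 times that speed" on the filesystem (an estimate, not measured).
fs_rate = zope_rate * 10

# How long the same job would take at the estimated filesystem rate.
fs_seconds = documents / fs_rate

print(f"Zope: {zope_rate:.1f} docs/s")
print(f"Filesystem (estimated): {fs_rate:.0f} docs/s, "
      f"about {fs_seconds / 60:.1f} minutes in total")
```

So the "7 documents per second" figure is a slight round-up of roughly 6.7.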

> 
> > b - too long to then try to do a search-based add to a ZCatalog,
> > i.e. Netscape times out after only around 8 minutes, and the search has not
> > finished!
> 
> Let me make sure we have the same terminology.  'Finding' objects into
> the catalog involves using the find tab to search recursively down from
> the catalog.  'Searching' means typing search criteria into an already
> loaded catalog and getting results.  It sounds like you're talking about
> 'finding'.  If it's taking 8 minutes to do a *search*, that's a bug.  If
> it's finding you're talking about, try increasing the sub-transaction
> threshold (on the status screen) by an order of magnitude or two.  This
> will cause Zope to commit sub-transactions less frequently.  1000, the
> default, is probably too low, but since this is the first version of Zope
> with a catalog in it, it hasn't had any real-world use yet.  We'll
> probably jack it up to at least 10,000 for 2.1.

You are right, I'm finding docs into the ZCatalog, not searching it (yet).
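
To get a feel for what raising the threshold does, here is a minimal sketch
in plain Python. It does not use any Zope API; the function name is made up,
and it assumes the threshold counts catalogued objects, which this thread
does not confirm. It only counts how often a commit would fire for a given
threshold.

```python
def count_subtransaction_commits(total_objects, threshold):
    """Count the commits a batch loop would trigger.

    Hypothetical model: one commit fires each time `threshold` objects
    have been catalogued since the previous commit.
    """
    commits = 0
    pending = 0
    for _ in range(total_objects):
        pending += 1                # one more object catalogued
        if pending >= threshold:
            commits += 1            # stand-in for a real sub-transaction commit
            pending = 0
    return commits


# 8800 is the document count mentioned earlier in the thread.
for threshold in (1000, 10000):
    commits = count_subtransaction_commits(8800, threshold)
    print(f"threshold={threshold}: {commits} commits")
```

Under that assumption, the default of 1000 would commit several times during
my add, while 10,000 would not commit until the very end.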
> 
> > BTW, Zope's python process and postgresql take about 50% of the CPU each, and
> > there is basically zero disk thrashing during this process (although zope does
> > get up around 50MB of memory use..)
> 
> Yes, mass indexing is inefficient at the moment.  I recently received
> 'Managing Gigabytes', which was recommended by someone on the list.  It
> has some very cool stuff in it that we might put into the catalog to
> speed up indexing and searching (although as far as I can tell, searches
> with ZCatalog are *damn* fast), and reduce memory and object database
> consumption with slicker algorithms and compression.  It also has some
> cool stuff about wildcard/globbing searches at the expense of some extra
> memory.

I was thinking that a 50% share was not too bad for a non-native-compiled..
pretty much on target I would say.

> 
> Note that the time it takes to mass index will improve as we improve the
> algorithm, but in reality indexing always takes time.  Once your
> 'corpus' of documents is created, it would be much, much faster to
> incrementally index new and changed documents into the catalog than to
> mass index everything over again.
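
The incremental idea can be sketched with a toy inverted index in plain
Python. This is an illustration only, not how ZCatalog is implemented; the
class and method names are made up. Re-indexing one changed document touches
only the words that were added or removed, instead of rebuilding everything.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        self.words = defaultdict(set)   # word -> set of document ids
        self.docs = {}                  # document id -> words indexed for it

    def index(self, doc_id, text):
        """Index a new document, or re-index a changed one incrementally."""
        new_words = set(text.lower().split())
        old_words = self.docs.get(doc_id, set())
        # Remove the document from words it no longer contains.
        for word in old_words - new_words:
            self.words[word].discard(doc_id)
            if not self.words[word]:
                del self.words[word]
        # Add the document only under words that are new for it.
        for word in new_words - old_words:
            self.words[word].add(doc_id)
        self.docs[doc_id] = new_words

    def search(self, word):
        """Return the sorted ids of documents containing `word`."""
        return sorted(self.words.get(word.lower(), ()))


idx = TinyIndex()
idx.index("a", "zope catalog speed")
idx.index("b", "postgresql speed")
idx.index("a", "zope catalog tuning")   # only document "a" is re-indexed
print(idx.search("speed"))
print(idx.search("tuning"))
```

After the last call, document "b" has not been touched at all, which is the
whole point of indexing incrementally.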

One VERY interesting thing I have noticed:

Around 5 minutes into the add, watching top on the Unix system, I see that the
python process splits (it's around 11MB at this stage), then a little after I
get another postmaster (the database) process appearing, and from then on we
have a 4-way split of CPU instead of 2-way. I don't see any reason for Zope to
split off a new process (it has no other connections while doing this). Is
this a bug, perhaps?


> 
> -Michel
--
------------------------------------------------------------
Stuart Woolford, stuartw@newmail.net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------