[Zope-dev] ZCatalog scalability

Michael Bernstein webmaven@lvcm.com
Tue, 23 Jan 2001 07:51:08 -0800


Erik Enge wrote:
> 
> [Chris Withers]
> 
> | ...and is that specifically for BTree folders, or Zope BTree's in general?
> 
> I don't believe that B-Tree folders have those kinds of limitations by
> general design.  I'm more concerned that somewhere along the lines,
> doing operations on a huge BTree Folder (Yes, in Zope) will be slow.

What sort of 'operations' do you mean? Copying and pasting
the whole thing?

> Hm, moreover, if you actually need to stuff that many objects into
> one Folder, you are probably trying to use the wrong tool for the job.
> 
> I do expect that stuffing 27 million objects into one BTree Folder
> will be slow, and I don't want to segment the data.  I do expect that
> I'll have to resort to a relational database, and I have no problem
> with that.  Object databases aren't always the right tool for the job,
> and when they aren't, Zope lets me talk with the «other» ones nicely,
> so no problemo señor ;).

Erik,

I had separated the storage issue into a different thread
(Specialist/Rack Scalability), and received a reply from
Phillip Eby:

> Just to expand a little on the above...  Racks should scale at least as
> well, if not larger than a ZCatalog, given the same storage backing for
> the ZODB.  This is because ZCatalog has to manage a minimum of one
> forward and reverse BTree for *each* index, plus another few BTrees
> for overall storage and housekeeping.  Also, keyword and full text
> indexes store multiple BTree entries per object, so that's a factor as
> well.
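
To make Phillip's point concrete, here is a rough toy model in
Python. It is only a sketch under my own assumptions: ToyIndex and
count_entries are hypothetical names, plain dicts stand in for
BTrees, and none of this is ZCatalog's actual code. It just counts
how many entries one single-valued index and one keyword index end
up holding for the same set of objects.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical stand-in for one catalog index (not ZCatalog code).

    Keeps a forward mapping (value -> record ids) and a reverse
    mapping (record id -> values), mirroring the forward/reverse
    pair described in the quote above.
    """

    def __init__(self):
        self.forward = defaultdict(set)  # indexed value -> set of record ids
        self.reverse = {}                # record id -> tuple of indexed values

    def index_object(self, rid, values):
        values = tuple(values)
        self.reverse[rid] = values
        for value in values:
            self.forward[value].add(rid)

    def entry_count(self):
        # Forward postings plus one reverse entry per object.
        postings = sum(len(rids) for rids in self.forward.values())
        return postings + len(self.reverse)


def count_entries(n_objects, keywords_per_object):
    """Return (single-valued index entries, keyword index entries)."""
    field = ToyIndex()
    keyword = ToyIndex()
    for rid in range(n_objects):
        # One value per object in the single-valued index.
        field.index_object(rid, [f"title-{rid}"])
        # Several distinct keywords per object, drawn from 50 made-up keywords.
        kws = [f"kw-{(rid + k) % 50}" for k in range(keywords_per_object)]
        keyword.index_object(rid, kws)
    return field.entry_count(), keyword.entry_count()


if __name__ == "__main__":
    print(count_entries(1000, 5))
```

In this model the keyword index grows with objects times keywords
per object, while the single-valued index grows with the number of
objects alone, which is the extra factor Phillip mentions.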

So the question I was asking is: "if we ignore the issue of
storage and consider indexing and searching the ZCatalog
alone, and assuming that wildcard searches are disallowed,
how far will a single ZCatalog with a text index (on a
computed attribute that concatenates several properties) and
a keyword index (for creating ZTopic hierarchies) scale?"

While I'm perfectly willing to split up the storage of the
data as necessary, I am far less enamoured of the prospect
of divvying up the indexing and searching to multiple
ZCatalogs. In any case, according to Phillip, if I don't
have to split the ZCatalog, I shouldn't have to split the
storage (in Racks, anyway, but probably BTree Folders too),
either.

Anyway, Erik, I hope that when you report your results,
you're able to separate the indexing, searching, storage,
and retrieval results, so that the appropriate factor can be
identified as the bottleneck. Failing that, at least separate
indexing/searching from storage/retrieval.
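
As a rough illustration of the kind of breakdown I mean, here is a
minimal Python sketch that times each phase separately. Everything
in it is an assumption for illustration: run_benchmark and phase
are hypothetical names, and the dict-based store and word index are
stand-ins for the real object storage and catalog, not Zope APIs.

```python
import time
from contextlib import contextmanager


def run_benchmark(n_objects):
    """Time storage, indexing, searching and retrieval separately.

    Returns (matching documents, {phase name: elapsed seconds}).
    """
    timings = {}

    @contextmanager
    def phase(name):
        # Accumulate wall-clock time spent inside the with-block.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            timings[name] = timings.get(name, 0.0) + elapsed

    store = {}  # stand-in for the object storage (record id -> text)
    index = {}  # stand-in for a text index (word -> set of record ids)

    with phase("storage"):
        for rid in range(n_objects):
            store[rid] = f"document {rid} about topic{rid % 10}"

    with phase("indexing"):
        for rid, text in store.items():
            for word in text.split():
                index.setdefault(word, set()).add(rid)

    with phase("searching"):
        hits = index.get("topic3", set())

    with phase("retrieval"):
        docs = [store[rid] for rid in sorted(hits)]

    return docs, timings


if __name__ == "__main__":
    docs, timings = run_benchmark(100_000)
    for name, seconds in timings.items():
        print(f"{name:10s} {seconds:.4f}s ({len(docs)} hits)")
```

Reporting the four numbers side by side makes it obvious which
phase dominates as the object count grows.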

Thanks,

Michael Bernstein.