[ZWeb] ZCatalog Issues

Jim Fulton jim at zope.com
Tue Jul 13 18:51:18 EDT 2004


Shane Hathaway wrote:
> On Tuesday 13 July 2004 06:39 am, Jim Fulton wrote:
> 
>>Shane Hathaway wrote:
>> > Flushing a cache containing 20,000 objects can take minutes,
>>
>>Huh? This makes no sense.  Flushing objects just frees their
>>state.  This should not take minutes.  If this is reproducible,
>>we ought to do some profiling to figure out what the heck is
>>going on.
> 
> 
> It's just an observation.  I postulate it happens because ZODB frees the 
> objects in layers: it peels away all the unreferenced objects, revealing more 
> objects that are now unreferenced, and iterates along those lines.  If for 
> some reason it peels off only one object per pass, the total operation is 
> O(n^2 / 2).
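Shane's postulate can be made concrete with a toy model. This is a hypothetical sketch of "layered" freeing, not how ZODB actually releases objects. Each pass scans every remaining object and frees only those that no remaining object references. The names `layered_peel_cost` and `chain` are made up for illustration. In the worst case, a chain where each object references the next, only one object is peeled per pass, and the total number of visits is n(n+1)/2, i.e. roughly n^2/2:

```python
def layered_peel_cost(refs):
    """Hypothetical model of freeing objects in layers.

    refs maps each object id to the ids it references. Each pass scans
    every remaining object and frees only those that no remaining object
    references. Returns (passes, total object visits).
    """
    remaining = set(refs)
    passes = visits = 0
    while remaining:
        passes += 1
        referenced = set()
        for obj in remaining:
            visits += 1
            referenced.update(refs[obj])
        freeable = remaining - referenced
        if not freeable:
            # A pure reference cycle can never be peeled this way.
            raise ValueError("reference cycle: nothing can be peeled")
        remaining -= freeable
    return passes, visits


def chain(n):
    """Hypothetical worst case: object i references only object i + 1."""
    return {i: [i + 1] if i + 1 < n else [] for i in range(n)}


# A chain peels one object per pass: n passes, n(n+1)/2 visits (~n^2/2).
print(layered_peel_cost(chain(100)))  # (100, 5050)
```

Under this model, a 20,000-object chain would cost 20,000 * 20,001 / 2 = 200,010,000 visits, which would be consistent with a flush taking minutes.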

I assume that by flushing the cache, you mean calling minimize on each of the caches.
The minimize function will make a pass through the cache deactivating
all of the objects.  As that happens, objects become unreferenced.
When an object becomes unreferenced, it makes a weakref-style callback
into the cache, which causes the cache to remove it.  All of this should
be pretty fast.
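The single-pass behaviour described above can be sketched with plain Python weak references. This is a toy model, not ZODB's actual pickle cache. `ToyPersistent`, `ToyCache` and their methods are hypothetical names. The sketch assumes CPython reference counting, so an object is freed, and its weakref callback runs, as soon as its last reference goes away. Each object is deactivated once and removed once, so the work is linear in the cache size:

```python
import weakref


class ToyPersistent:
    """Hypothetical stand-in for a persistent object: an oid plus state."""

    def __init__(self, oid, state=None):
        self.oid = oid
        self.state = state  # may reference other ToyPersistent objects

    def deactivate(self):
        # Turn the object into a "ghost": dropping its state also drops
        # any references it held to other objects.
        self.state = None


class ToyCache:
    """Hypothetical cache: strong refs to active objects, weak refs to all."""

    def __init__(self):
        self.active = {}   # oid -> object (strong references)
        self.weak = {}     # oid -> weakref whose callback removes the entry
        self.callbacks = 0

    def add(self, obj):
        oid = obj.oid

        def _removed(ref, oid=oid):
            # Runs when the object is freed: remove it from the cache.
            self.callbacks += 1
            del self.weak[oid]

        self.active[oid] = obj
        self.weak[oid] = weakref.ref(obj, _removed)

    def minimize(self):
        # One pass: deactivate every active object and release the
        # cache's strong reference, letting unreferenced objects be freed.
        for oid in list(self.active):
            obj = self.active.pop(oid)
            obj.deactivate()
            del obj  # drop the local reference too


cache = ToyCache()
objs = [ToyPersistent(i) for i in range(5)]
for parent, child in zip(objs, objs[1:]):
    parent.state = child  # each object's state references the next one
for obj in objs:
    cache.add(obj)
del objs, parent, child, obj  # now only the cache holds the objects
cache.minimize()
print(cache.callbacks, len(cache.weak))  # 5 0
```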

You may have bigger fish to fry, but I'd really like to
see a Python profile, and maybe C profile output, for this if it is
reproducible.  There is definitely something wrong here.  There's always
a chance that it could offer some insight into other woes.
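Getting a Python-level profile of a single slow call only needs the standard library's `cProfile` and `pstats` modules. A minimal sketch follows. `profile_call` is a hypothetical helper, and `flush_caches` is a hypothetical stand-in for whatever call actually flushes the caches on the site:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=15, **kwargs):
    """Run func under cProfile; return (result, text report of top calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()


def flush_caches():
    """Hypothetical stand-in workload for the real cache-flushing call."""
    return sum(i * i for i in range(100_000))


result, report = profile_call(flush_caches)
print(report)
```

This only covers Python-level time; time spent inside C code appears lumped under the calling function, which is why a C profiler could still be useful.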

...

>>Some things I'd look for:
>>
>>- sorting
>>
>>   If we are doing lots of sorted searches, that could cause lots of
>>   meta-data to be loaded.  I suspect that sorting on application
>>attributes, such as modification time, is the most common case of catalog
>>abuse.
> 
> 
> Yet for usability, we virtually always want to sort.

We should probably explore this further.  Note that sorting by relevance
rank isn't so bad.  I was referring to sorting on data fields.  Sorting
search results on data fields is hard or impossible to do scalably.  I can
easily imagine us creating a scalability trap here.
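The difference between the two kinds of sorting can be sketched by counting record loads. This is a hypothetical model, not ZCatalog's implementation. It assumes that reading a data field for a hit costs one metadata load (`MetadataStore` stands in for that), while relevance scores come straight from the text index. Even when only the top 20 results are wanted, sorting on a field touches every hit:

```python
import heapq


class MetadataStore:
    """Hypothetical stand-in for per-object catalog metadata.

    Every get() counts as one record load; on a real site a load may
    mean pulling an object's state out of the database.
    """

    def __init__(self, records):
        self._records = records
        self.loads = 0

    def get(self, rid):
        self.loads += 1
        return self._records[rid]


def top_by_field(rids, store, field, limit):
    # Comparing on a data field forces one metadata load per hit,
    # even though only `limit` results are returned.
    return heapq.nlargest(limit, rids, key=lambda rid: store.get(rid)[field])


def top_by_relevance(scores, limit):
    # scores maps rid -> relevance as produced by the text index itself,
    # so ranking needs no metadata loads at all.
    return heapq.nlargest(limit, scores, key=scores.get)


records = {rid: {"modified": rid % 997} for rid in range(10_000)}
store = MetadataStore(records)
newest = top_by_field(list(records), store, "modified", 20)
scores = {rid: 1.0 / (rid + 1) for rid in records}
best = top_by_relevance(scores, 20)
print(store.loads)  # 10000: one load per hit, none for relevance ranking
```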



> 
>>- Too much meta data.
> 
> 
> Agreed.  Unfortunately, it's hard to tell which metadata fields zope.org 
> actually needs.

Yup.
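One empirical way to find out which metadata fields are actually used is to wrap catalog result records ("brains") for a while and log which fields get read. The sketch below is hypothetical. `RecordingRecord` and `unused_fields` are made-up names, records are modeled as plain dicts, and the sample field names are illustrative only:

```python
from collections import Counter


class RecordingRecord:
    """Hypothetical wrapper that logs which metadata fields get read."""

    def __init__(self, record, seen):
        self._record = record   # dict of metadata field -> value
        self._seen = seen       # shared Counter of field reads

    def __getattr__(self, name):
        # __getattr__ only runs for names not found normally; private
        # names are refused so they never count as metadata reads.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            value = self._record[name]
        except KeyError:
            raise AttributeError(name) from None
        self._seen[name] += 1
        return value


def unused_fields(all_fields, seen):
    """Metadata fields never read while the wrapper was in place."""
    return sorted(set(all_fields) - set(seen))


seen = Counter()
record = {"Title": "ZCatalog Issues", "modified": "2004-07-13",
          "Description": "Catalog performance"}
brain = RecordingRecord(record, seen)
print(brain.Title)                  # ZCatalog Issues
print(unused_fields(record, seen))  # ['Description', 'modified']
```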

>>- Maybe too many indexes
>>
>>   I think a common problem in Zope sites is that they have a single catalog
>>   that is used for a wide variety of independent searches.  I think that
>>it would be more efficient in many cases to keep separate catalogs geared
>>toward separate kinds of searches.
> 
> 
> That's an interesting idea.  I wonder if we could apply it here.

It's like meta data.  To apply it, you need to understand how the system
is using the catalog.
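Understanding how the system uses the catalog could start with a query log. The sketch below is hypothetical. It assumes each logged query is a dict mapping index names to search terms, optionally with a `sort_on` key, and the index names in the example are illustrative. Index combinations that always occur together, and never with the others, are candidates for their own catalog:

```python
from collections import Counter


def usage_report(logged_queries):
    """Summarize which indexes a (hypothetical) catalog query log touches."""
    shapes = Counter()      # sets of indexes searched together
    index_use = Counter()   # how often each index is searched
    sort_use = Counter()    # how often each field is used for sorting
    for query in logged_queries:
        terms = {key for key in query if key != "sort_on"}
        shapes[frozenset(terms)] += 1
        index_use.update(terms)
        if "sort_on" in query:
            sort_use[query["sort_on"]] += 1
    return shapes, index_use, sort_use


log = [
    {"SearchableText": "zope", "sort_on": "modified"},
    {"SearchableText": "catalog"},
    {"path": "/Members", "meta_type": "News Item", "sort_on": "modified"},
]
shapes, index_use, sort_use = usage_report(log)
for shape, count in shapes.most_common():
    print(sorted(shape), count)
```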

Jim
-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

