[ZODB-Dev] Re: [Zope3-dev] Re: Community opinion about search+filter

Mon Mar 26 09:24:41 EDT 2007

On Mar 25, 2007, at 5:27 PM, Martijn Faassen wrote:

> Hey Jim,
>
> Jim Fulton wrote:
>> On Mar 25, 2007, at 12:33 PM, Martijn Faassen wrote:
> [snip]
>>> I have the strong suspicion that modern relational databases are
>>> currently better able to scale at queries using LIMIT and ORDER BY
>>> than the Zope 3 catalog.
>> I had a similar suspicion.  I assigned the Python Labs team the task
>> of finding out through literature search the approaches used.  They
>> found that there were none other than the sorts of things I've
>> mentioned.
>
> What about caching strategies? (as I sketched out in my last mail)

Obviously, it depends a lot on access patterns.  I expect that this
is an area where picking the right strategy and succeeding is highly
application-specific.

Take batching.  Caching would potentially make getting multiple
batches go faster, but to benefit, you'd have to increase the
internal batch size.  For example, if the user visible batch size is  
20 and you wanted them to be able to get the second batch without  
searching and sorting, you'd have to make your internal batch size  
40.  That would increase the cost for the first batch by on the order  
of log(2).  I suspect that most people don't look at multiple  
batches, so caching to support multiple batches could be a  
significant loss, even leaving memory impact aside.
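
A minimal sketch of that trade-off, assuming the search results are
plain sortable keys held in a list.  Python's heapq.nsmallest stands in
for the catalog's sort here; it is not the catalog's actual code, and
first_batches and its parameters are hypothetical names used only for
illustration.  Selecting the k smallest of n keys this way costs roughly
n log k comparisons in the worst case, so caching a second batch means
running the selection with k = 40 instead of k = 20.

```python
import heapq


def first_batches(keys, batch_size, cached_batches=1):
    """Return the first `cached_batches` batches of `keys` in sorted order.

    Hypothetical helper: the internal batch size is
    batch_size * cached_batches, so caching more batches makes the
    initial partial sort select more items.
    """
    internal_size = batch_size * cached_batches
    # Partial sort: only the smallest `internal_size` keys are kept.
    top = heapq.nsmallest(internal_size, keys)
    # Split the selected keys into user-visible batches.
    return [top[i:i + batch_size] for i in range(0, len(top), batch_size)]


if __name__ == "__main__":
    keys = list(range(1000, 0, -1))
    only_first = first_batches(keys, 20)                    # internal size 20
    first_two = first_batches(keys, 20, cached_batches=2)   # internal size 40
    print(len(only_first), len(first_two))
```

If most users never ask for the second batch, the extra selection work
and the memory held by the cached batch are paid for nothing.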

OTOH, we've used some highly application specific caching strategies  
in some of our commercial applications to great success. These caches  
were implemented as specialized indexes, and I would argue that  
indexes are really a form of caching.

> This article about MySQL claims that MySQL is the only database  
> that does query result set caching. Surprising for such an obvious  
> thought:

Sounds like BS to me. :)

>
> http://dev.mysql.com/tech-resources/articles/mysql-query-cache.html
>
> Perhaps it doesn't work as well as one would think and that's why  
> other database engines rejected it. :)

I suspect it is a hard general strategy to get right.

Note that SQL methods support query caching and Zope's caching  
framework is often used to cache various kinds of computations,  
including searches.

>>> I cannot back this up as I haven't done measurements. Perhaps you
>>> have done so?
>> We did a literature search.
>
> That's useful, but doesn't tell us very much about how they compare in
> practice.

Actually, it does.  But feel free to do some performance tests.

> Perhaps someone should do measurements and see how the two compare in a
> sort/batch use case. It shouldn't be too hard to set up a relational
> database-based sorted batch along with a ZODB/catalog based sorted batch
> and see how they both hold up.

Yup, although, to be meaningful, you need to look at large data  
sets.  This raises the amount of effort required.
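
A rough sketch of what such a measurement harness could look like.
Everything in it is an assumption made for illustration: an in-memory
SQLite table stands in for "a relational database", and a plain Python
list with heapq.nsmallest stands in for the ZODB/catalog side (it does
not use the Zope 3 catalog at all).  The table name docs, its columns,
and all function names are hypothetical.

```python
import heapq
import random
import sqlite3
import time


def build_data(n, seed=0):
    """Generate n (id, score) rows with reproducible random scores."""
    rng = random.Random(seed)
    return [(i, rng.random()) for i in range(n)]


def sql_batch(conn, size):
    """First sorted batch via ORDER BY ... LIMIT."""
    cur = conn.execute(
        "SELECT id FROM docs ORDER BY score, id LIMIT ?", (size,)
    )
    return [row[0] for row in cur]


def python_batch(rows, size):
    """First sorted batch via an in-memory partial sort."""
    top = heapq.nsmallest(size, rows, key=lambda r: (r[1], r[0]))
    return [r[0] for r in top]


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run(n=100_000, size=20):
    """Build both data sets, fetch the first batch from each, time both."""
    rows = build_data(n)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, score REAL)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", rows)
    sql_ids, sql_seconds = timed(sql_batch, conn, size)
    py_ids, py_seconds = timed(python_batch, rows, size)
    conn.close()
    return sql_ids, py_ids, sql_seconds, py_seconds


if __name__ == "__main__":
    sql_ids, py_ids, sql_seconds, py_seconds = run()
    print("same batch:", sql_ids == py_ids)
    print("sqlite: %.4fs  python: %.4fs" % (sql_seconds, py_seconds))
```

Timings from a toy like this say little by themselves; a meaningful
comparison would need much larger data sets, the real catalog, and an
indexed sort column on the database side.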

>
>>> * Do you estimate the performance of the Zope 3 catalog to be  
>>> equivalent to the performance of a modern relational database
>>> system for queries that need to sort and batch their results?
>> I estimate that the same issues apply to both.
>
> Theoretical algorithm scalability is one thing, and the same issues
> apply to both. Practical scalability might vary widely.

OK, I give up.  This argument just isn't worth my time any more.  I'm  
sorry I objected to the original point.

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org