[Grok-dev] Re: Sorting catalog results

Kevin Smith kevin at mcweekly.com
Wed Jun 27 12:04:39 EDT 2007


Hi Luciano,

We are rebuilding a newspaper site built on Plone that is having scaling 
issues with the full-text index right now. We're too small to manage the 
problem by always throwing hardware at it.

As a scaling strategy for the new site, we are using divide and conquer. 
Since it makes sense for us to sort from newest to oldest, we are using 
one catalog for each year. The archives go back to 1998. We publish 20 
articles per week, each between 1-3k words plus images, on a commodity 
machine, and have only started experiencing scaling issues in the last 
year.
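
A minimal sketch of the one-catalog-per-year idea, assuming each yearly
catalog can be queried independently and returns hits that carry a
publication date. The names search_year and newest_first, and the in-memory
lists standing in for catalogs, are hypothetical and not Zope or Plone API.
The point is only that each sort is bounded by one year's hits, and that
older years are left alone once the page is full.

```python
from datetime import date
from itertools import islice


def search_year(articles, term):
    """Stand-in for querying one yearly catalog (hypothetical, not a Zope API)."""
    return [(published, title) for published, title in articles
            if term in title.lower()]


def newest_first(catalogs_by_year, term):
    """Yield matching (date, title) hits newest to oldest, one year at a time."""
    for year in sorted(catalogs_by_year, reverse=True):
        hits = search_year(catalogs_by_year[year], term)
        # Only this year's hits are sorted; older catalogs are not queried
        # until the consumer asks for more results.
        yield from sorted(hits, key=lambda hit: hit[0], reverse=True)


# Illustrative data only.
catalogs = {
    2006: [(date(2006, 3, 1), "Council budget vote"),
           (date(2006, 9, 12), "Harbor cleanup")],
    2007: [(date(2007, 1, 15), "Council election recap"),
           (date(2007, 5, 2), "Council approves new park")],
}

# First page of two results: both come from 2007, so 2006 is never searched.
first_page = list(islice(newest_first(catalogs, "council"), 2))
```

This only works this simply because the sort order (date) matches the way
the content is divided; other sort keys would need a merge across catalogs.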

So Martijn's getting-things-done-strategy will most likely work unless 
you know you're going to have a very large database.
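
For reference, the short-term strategy quoted below boils down to sorting
the whole result set in Python and slicing out one batch. A minimal sketch,
assuming the search hands back objects with a published attribute; Article
and sorted_batch are hypothetical names used only for illustration.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical result type; real catalog results would be content objects.
Article = namedtuple("Article", "title published")


def sorted_batch(results, start, size, key="published", reverse=True):
    """Sort the full result set, then return one page of it."""
    ordered = sorted(results, key=attrgetter(key), reverse=reverse)
    return ordered[start:start + size]


# Illustrative data only; ISO date strings sort correctly as text.
results = [
    Article("Harbor cleanup", "2006-09-12"),
    Article("Council election recap", "2007-01-15"),
    Article("Council budget vote", "2006-03-01"),
]
page = sorted_batch(results, start=0, size=2)
```

Every request pays for sorting the entire result set, which is why this
holds up only while result sets stay modest.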

If dividing the content does turn out to be necessary and you need many 
different kinds of sorts, you may need to come up with more ingenious 
ways of dividing it.

Other things we are doing to help mitigate scaling issues:
* pgstorage to avoid the excessive memory used by FileStorage and to 
flatten startup time (directorystorage has similar benefits)
* OpenVZ virtual servers to throttle various usages of the site 
(separating text-index searching from browsing and from search engine 
crawls)

HTH,

Kevin Smith

Martijn Faassen wrote:
> Luciano Ramalho wrote:
>> Trying to figure out how to sort results from a catalog search, I just
>> read (most of) a very long thread on Zope3-dev earlier this year and
>> got worried about how it ended.
>>
>> Does it mean that I have to educate my users ("You don't really want
>> sorted results, on account of scalability problems"), or is there some
>> recommended way to sort the results of a catalog search in Zope3 or
>> Grok?
>
> No, please don't educate your users. The debate went back and forth. I 
> think we all agree that the sorting story could be scaled better, but 
> Jim kept pointing out there is no fundamental way to speed it up, and 
> I kept pointing out that besides the fundamentals there may be many 
> things we can do to make this scale better nonetheless. We got stuck 
> in a loop there. :)
>
> There are two strategies here. One is the short-term 
> getting-things-done-strategy. For that, I'd recommend using Python's 
> (or zc.table's, if you're using that for tabular display) sorting 
> functionality. That sorts the whole result set. It may scale well 
> enough for your application.
>
> Now on to the other strategy. Ignas has done some work on the 
> SchoolTool project concerning scalable sorting and batching that may 
> be relevant here and reported to me that he managed to speed things 
> quite a lot. I don't know the details, but here are pointers to the 
> code he gave me a while ago:
>
> http://source.schooltool.org/trac/browser/trunk/schooltool/src/schooltool/skin/table.py 
>
> - FilterWidget and TableFormatter classes
>
> http://source.schooltool.org/trac/browser/trunk/schooltool/src/schooltool/skin/table.py 
>
> - TableContainerView class
>
> http://source.schooltool.org/trac/browser/trunk/schooltool/src/schooltool/skin/templates/table_container.pt 
>
>
> But please talk to Ignas (on irc, or ignas.mikalajunas at gmail.com) for 
> more information.
>
> If this code is interesting, we have a problem, as SchoolTool code is 
> GPL. Generalizing it and putting it in Zope's svn is thus blocked. We 
> could go two routes:
>
> * contact Mark Shuttleworth as the Zope Foundation and ask whether we 
> can get this code as ZPL in the Zope repository. I can start this 
> process if needed - let me know.
>
> * talk to Ignas to get the general idea, study the code, and 
> reimplement the concepts as a Zope 3 package without copying the code.
>
> Regards,
>
> Martijn

