[ZODB-Dev] Re: self.length._p_deactivate() and MVCC

Jim Fulton jim at zope.com
Fri Apr 30 10:05:38 EDT 2004


Casey Duncan wrote:
> On Fri, 30 Apr 2004 09:21:06 -0400
> Jim Fulton <jim at zope.com> wrote:
> 
> 
>>Casey Duncan wrote:
>>
>>...
>>
>>
>>>I wrote the comment based on speculative semantics for MVCC. It
>>>looks like it will still work as intended with MVCC as it is now
>>>implemented, so I will remove the comment.
>>
>>OK, I'll bite.  What is the point of this code?  Is it meant to be
>>an optimization?  Length write conflicts can always be resolved.
> 
> 
> We discussed this about a year ago. It tries to reduce write conflict
> errors when assigning new wids. The length of the Lexicon is used to
> find the first candidate wid. Wids are assigned in ascending order to
> allow the document word lists to be compressed better; I think it
> assumes popular words will tend to get lower wids. The word lists are
> used for unindexing and phrase matching. 
> 
> In order to pick the next wid, it deactivates the length and then reads
> it again. It increments it until it finds an unused wid in the wid=>word
> btree. The idea is that if another concurrent transaction was indexing
> and adding words at the same time and committed before this point, we
> could read its last wid value and carry on from there, rather than
> picking the same starting wid that it used and getting a write conflict
> (in the btree).
> 
> It was effective in eliminating write conflicts in some tests I wrote,
> so I included it. Its practical value is probably lessened by the fact
> that once a large enough corpus of documents is indexed, few words are
> added to the lexicon as new ones are indexed, at least in common usage.
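
The wid selection described above can be sketched with a toy model. This is
not the Lexicon code and it does not use ZODB: Committed, Session, new_wid,
commit and run are hypothetical names, and the "refresh" flag only stands in
for deactivating the length and reading it again. Each session is assumed to
work from a snapshot taken when it starts, and a commit is treated as a
conflict when it writes a wid that another session has already committed
with a different word.

```python
class Committed:
    """Hypothetical stand-in for state already committed to the database."""

    def __init__(self):
        self.length = 0
        self.words = {}  # wid -> word


class Session:
    """Hypothetical stand-in for one transaction working from a snapshot."""

    def __init__(self, committed):
        self.committed = committed
        self.length = committed.length      # cached length at start
        self.words = dict(committed.words)  # cached wid -> word mapping

    def new_wid(self, word, refresh):
        if refresh:
            # Stand-in for deactivating the length and reading it again:
            # pick up whatever length has been committed in the meantime.
            self.length = self.committed.length
        wid = self.length + 1
        # Increment until the wid is unused in this session's view.
        while wid in self.words:
            wid += 1
        self.length = wid
        self.words[wid] = word
        return wid

    def commit(self):
        """Return the conflicting wids; an empty list means success."""
        conflicts = [
            wid for wid, word in self.words.items()
            if self.committed.words.get(wid, word) != word
        ]
        if conflicts:
            return conflicts
        self.committed.words.update(self.words)
        # Simplification: the length conflict is resolved by taking the max.
        self.committed.length = max(self.committed.length, self.length)
        return []


def run(refresh):
    """Two sessions start together; A commits before B picks its wid."""
    db = Committed()
    a = Session(db)
    b = Session(db)
    a.new_wid("alpha", refresh)
    a.commit()
    b.new_wid("beta", refresh)
    return b.commit()
```

In this model run(False) has both sessions start from the same wid, so the
second commit conflicts, while run(True) lets the second session carry on
from the first session's last wid. It only covers the case where the other
transaction has committed before the length is read again.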

OK. I didn't realize when this thread started that "wids" were word ids.

I still wonder how effective this is in practice.  I wouldn't expect
new words to be frequent in a mature corpus.  If there are a lot of
conflicts, this technique won't prevent all of them, but I can see that
it could reduce them.

It's a shame we need to employ such tricks here, but scalability often makes us
do things like this.  A fuller comment describing what's going on is in order.
(These are hard to write when you first implement something like this, because,
at that time, you aren't objective.)

Jim


-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


