[ZODB-Dev] Re: self.length._p_deactivate() and MVCC

Casey Duncan casey at zope.com
Fri Apr 30 09:44:30 EDT 2004


On Fri, 30 Apr 2004 09:21:06 -0400
Jim Fulton <jim at zope.com> wrote:

> Casey Duncan wrote:
> 
> ...
> 
> > I wrote the comment based on speculative semantics for MVCC. It
> > looks like it will still work as intended with MVCC as it is now
> > implemented, so I will remove the comment.
> 
> OK, I'll bite.  What is the point of this code?  Is it meant to be
> an optimization?  Length write conflicts can always be resolved.

We discussed this about a year ago. It tries to reduce write conflict
errors when assigning new wids. The length of the Lexicon is used to
find the first candidate wid. Wids are assigned in ascending order to
allow the document word lists to be compressed better; I think it
assumes popular words will tend to get lower wids. The word lists are
used for unindexing and phrase matching. 

In order to pick the next wid, it deactivates the length and then reads
it again. It increments it until it finds an unused wid in the wid=>word
btree. The idea is that if another concurrent transaction was indexing
and adding words at the same time and committed before this point, we
could read its last wid value and carry on from there, rather than
picking the same starting wid that it used and getting a write conflict
(in the btree).
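
A rough sketch of that control flow, in plain Python rather than the real
Lexicon code. StubLength, next_wid and the "committed" dict are hypothetical
stand-ins invented for illustration, and starting the search at length + 1
is an assumption. It does not model ZODB transactions or conflict
detection, only the "drop the cached length, re-read, then probe upward"
idea:

```python
class StubLength:
    """Hypothetical stand-in for the persistent length counter."""

    def __init__(self, committed):
        self._committed = committed  # dict simulating committed state
        self._cached = None          # state cached by this connection

    def _p_deactivate(self):
        # Drop the cached state so the next read reloads it.
        self._cached = None

    def __call__(self):
        if self._cached is None:
            self._cached = self._committed["length"]
        return self._cached


def next_wid(length, wid_to_word):
    """Pick an unused wid, starting from a freshly read length."""
    length._p_deactivate()       # forget any stale cached value
    wid = length() + 1           # assumed first candidate
    while wid in wid_to_word:    # skip wids that are already taken
        wid += 1
    return wid


committed = {"length": 3}
wid_to_word = {1: "alpha", 2: "beta", 3: "gamma"}
length = StubLength(committed)
length()                         # this connection now caches 3

# Simulate another transaction committing two new words.
wid_to_word[4] = "delta"
wid_to_word[5] = "epsilon"
committed["length"] = 5

new_wid = next_wid(length, wid_to_word)
```

Without the _p_deactivate() call, the search in this sketch would begin at
the stale value 4, which is the starting point the other transaction
already used.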

It was effective in eliminating write conflicts in some tests I wrote,
so I included it. Its practical value is probably lessened by the fact
that once a large enough corpus of documents is indexed, few words are
added to the lexicon as new documents are indexed, at least in common usage.

-Casey


