BTrees strangeness (was [Zope-dev] Zope 2.X BIG Session problems - blocker - our site dies - need help of experience Zope developer, please)

Chris McDonough chrism at plope.com
Wed Mar 3 04:55:21 EST 2004


(boldly crossposting this to zodb-dev, please respond on one list or the
other but not both)

That error *appears* to be caused by reaching a state that is impossible
to reach.  The code in question is:

        for key in list(self._data.keys(None, max_ts)):
            assert(key <= max_ts)
            STRICT and _assert(self._data.has_key(key))
            for v in self._data[key].values():
                to_notify.append(v)
            del self._data[key]

The line that says "for v in self._data[key].values()" is the line that
throws the KeyError.   But it should be impossible for the code to throw
a KeyError for the expression "self._data[key]" because the "keys"
method of the _data IOBTree just told us that the key named by "key" was
one of its keys via the range search; it should be an invariant.  Note
that in the line above that starts "STRICT and _assert...", I do the
paranoid check there as there *have* been cases where BTrees range
searches lied in the past.  STRICT is not true in your case (it's turned
off), so that check never gets run on your system, but if it had, it
might have raised an assertion error.  I haven't been able to provoke
that kind of thing in my own stress tests, unfortunately.
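
For anyone following along, here is a minimal sketch of the same loop
with the invariant checked explicitly, so a key that the range search
reported but that cannot be looked up is recorded instead of raising
KeyError.  It is not the Transience code: RangeDict is a hypothetical
stand-in for the IOBTree (only its keys(min, max) call is imitated), and
collect_expired is an illustrative name.

```python
class RangeDict(dict):
    """Hypothetical stand-in for an IOBTree: keys() takes an inclusive range."""

    def keys(self, min=None, max=None):
        return [k for k in sorted(self)
                if (min is None or k >= min) and (max is None or k <= max)]


def collect_expired(data, max_ts):
    """Remove every bucket whose timestamp key is <= max_ts.

    Returns (to_notify, missing).  'missing' holds keys that the range
    search returned but that were not found on lookup, i.e. the
    "impossible" state described above.
    """
    to_notify, missing = [], []
    for key in list(data.keys(None, max_ts)):
        if key not in data:
            # The range search and the lookup disagree: record, don't raise.
            missing.append(key)
            continue
        to_notify.extend(data[key].values())
        del data[key]
    return to_notify, missing
```

On a healthy mapping 'missing' stays empty; a non-empty list would be the
evidence worth logging.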

I have been proven to be at fault about this sort of thing before, but
I've been a good boy and I believe I've applied all of the lessons I
learned in the past to the newest code, so I unfortunately again have to
reach the conclusion that there is something afoul in the BTrees code,
provoked only under high stress scenarios.  It also appears to be very
difficult to reproduce.

In the end, this means to you that... well.. you've got two choices.  a)
continue using ZODB-based sessions and helping us (me) to track it down,
living with the consequences of the errors in the meantime or b) use a
different session implementation.  I would prefer "a" but I do need to
warn you that this might *never* get solved because the failure mode
appears to be so intermittent that it's extremely expensive (in the
dollars-and-cents sense) to pin down and ultimately fix, and that may
prevent me (and ZC) from doing so.  But with a lot of help from other
interested people (like yourself) we may be able to coax the failure out
of obscurity and squish it without breaking the bank. ;-)

Assuming you're interested, what can you do?  Well, you could find out a
little about the BTrees module in Zope CVS, particularly the "check"
module which has code to check a BTree for corruption, and instrument
the Transience code to run the check code in the places it seems to be
coming up with errors before bombing out.  If it's not corrupt, well..
I'm not sure what that means, but it would appear to be a problem with
the BTrees range search functions.  If it is corrupt, it might exonerate
the range search functions.   Rinse, lather, repeat with other checks in
the code, such as reporting the internal state of the BTree when the
error occurs (which I've forgotten how to do, but a maillist search
should help), providing information about when conflict errors were
raised right before the error, and so on.  It's very difficult to
provide a concrete "type this, type that" set of steps for this sort of
thing due to the latency involved in remote debugging an extremely hard
to reproduce failure, so if you want to help best, since you're the
person who has access to the machine where the failure appears to be
reproducible (and hopefully the motive to want to fix it), you should
familiarize yourself with the Transience code and the BTrees APIs and
use a bit of inductive logic to help me isolate the problem.  If you'd
rather not, I can understand that too. ;-)
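
As a rough sketch of the kind of instrumentation described above, the
helper below logs the offending key and then runs a structure check on
the tree.  The helper name, the logger name and the return convention
are all assumptions made for illustration.  The checker is passed in as
a parameter; the check() function from the BTrees "check" module is the
intended candidate, but confirm that it exists in your Zope version
before relying on it.

```python
import logging

logger = logging.getLogger("transience.debug")  # hypothetical logger name


def diagnose_missing_key(tree, key, max_ts, check=None):
    """Log details when 'key' came from a range search but lookup failed.

    'check' is any callable that raises if the tree's structure is damaged.
    Returns True if the check passed, False if it raised, and None if no
    checker was supplied.
    """
    logger.error("key %r from keys(None, %r) not found by lookup", key, max_ts)
    if check is None:
        return None
    try:
        check(tree)
    except Exception:
        # Structural damage found: this may exonerate the range search.
        logger.exception("BTree structure check failed")
        return False
    # Structure looks sound, so suspicion falls on the range search itself.
    logger.error("BTree structure check passed")
    return True
```

It would be called from an "except KeyError" around the failing lookup,
passing self._data, the key and max_ts, and then re-raising so the
original behaviour is unchanged.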

HTH,

- C



On Wed, 2004-03-03 at 03:18, alex at halogen-dg.com wrote:
> Chris,
> 
> No, just a few minutes ago I got this again:
> 
> Time  	2004/03/03 07:45:04.662 GMT
> User Name (User Id) 	Anonymous User (None)
> Request URL 	http://www.chalkface.com/catalog/html/custom/index_html
> Exception Type 	KeyError
> Exception Value 	1078236460
> 
> Traceback (innermost last):
> 
>     * Module ZPublisher.Publish, line 100, in publish
>     * Module ZPublisher.mapply, line 88, in mapply
>     * Module ZPublisher.Publish, line 40, in call_object
>     * Module OFS.DTMLDocument, line 128, in __call__
>       <DTMLDocument instance at 41c33890>
>       URL: http://www.chalkface.com/custom/index_html/manage_main
>       Physical Path:/www.chalkface.com/ZWarehouse_0.8/custom/index_html
>     * Module DocumentTemplate.DT_String, line 474, in __call__
>     * Module OFS.DTMLDocument, line 121, in __call__
>       <DTMLDocument instance at 41c337a0>
>       URL: http://www.chalkface.com/custom/index.html/manage_main
>       Physical Path:/www.chalkface.com/ZWarehouse_0.8/custom/index.html
>     * Module DocumentTemplate.DT_String, line 474, in __call__
>     * Module DocumentTemplate.DT_Let, line 76, in render
>     * Module OFS.DTMLDocument, line 121, in __call__
>       <DTMLDocument instance at 41c2b080>
>       URL: 
> http://www.chalkface.com/catalog/html/zwarehouse_html_header/manage_main
>       Physical 
> Path:/www.chalkface.com/ZWarehouse_0.8/catalog/html/zwarehouse_html_header
>     * Module DocumentTemplate.DT_String, line 474, in __call__
>     * Module DocumentTemplate.DT_Util, line 201, in eval
>       __traceback_info__: cart_functions
>     * Module <string>, line 1, in <expression>
>     * Module Shared.DC.Scripts.Bindings, line 306, in __call__
>     * Module Shared.DC.Scripts.Bindings, line 343, in _bindAndExec
>     * Module Products.PythonScripts.PythonScript, line 318, in _exec
>     * Module None, line 16, in setSessionByRequest.py
>       <PythonScript at 
> /www.chalkface.com/ZWarehouse_0.8/catalog/cart_functions/setSessionByRequest.py>
>       Line 16
>     * Module ZPublisher.HTTPRequest, line 1218, in __getattr__
>     * Module ZPublisher.HTTPRequest, line 1178, in get
>     * Module Products.Sessions.SessionDataManager, line 93, in 
> getSessionData
>     * Module Products.Sessions.SessionDataManager, line 180, in 
> _getSessionDataObject
>     * Module Products.Transience.Transience, line 491, in new_or_existing
>     * Module Products.Transience.Transience, line 322, in get
>     * Module Products.Transience.Transience, line 198, in _move_item
>     * Module Products.Transience.Transience, line 419, in _gc
> 
> KeyError: 1078236460
> 
> 
> On Wed, 3 Mar 2004, Chris McDonough wrote:
> 
> > Great, I'm going to consider that a resounding endorsement and check it
> > in soon; please do let me know if you see anything odd come up.
> > 
> > If anyone else has been having issues with the old Transience module,
> > and would like to provide feedback on the newer implementation, please
> > get this file:
> > 
> > http://cvs.zope.org/*checkout*/Products/Transience/Transience.py?rev=1.32.12.2.2.2&only_with_tag=chrism-sessiongeddon
> > 
> > ... and temporarily replace Zope's lib/python/Transience/Transience.py
> > with this newer version to help test it out, and report back the results
> > here.
> > 
> > Thanks!
> > 
> > - C
> > 
> > 
> > On Wed, 2004-03-03 at 02:14, alex at halogen-dg.com wrote:
> > > Hi Chris,
> > > 
> > > Until now, we did not got any errors with new Transience.py :) It just 
> > > works, no problems found under high load.
> > > 
> > > Alex
> > > 
> > > On Mon, 1 Mar 2004, Chris McDonough wrote:
> > > 
> > > > > I installed new Transience.py. During my little test it works fine.
> > > > > But real test will be on Monday when students start logging in as complete
> > > > > classes, sometimes there are hundreds of them logging on simultaneously, 
> > > > > so we will see. 
> > > > 
> > > > Any news? ;-)
> > > > 
> > > > 
> > > > 
> > > 
> > > --
> > > Alex V. Koval
> > > http://www.halogen-dg.com/
> > > http://www.zwarehouse.org/
> > > 
> > > 
> > > _______________________________________________
> > > Zope-Dev maillist  -  Zope-Dev at zope.org
> > > http://mail.zope.org/mailman/listinfo/zope-dev
> > > **  No cross posts or HTML encoding!  **
> > > (Related lists - 
> > >  http://mail.zope.org/mailman/listinfo/zope-announce
> > >  http://mail.zope.org/mailman/listinfo/zope )
> > 
> > 
> 
> --
> Alex V. Koval
> http://www.halogen-dg.com/
> http://www.zwarehouse.org/
> 
> 



