[ZODB-Dev] Re: BTrees strangeness (was [Zope-dev] Zope 2.X BIGSession problems - blocker - our site dies - need help of experienced Zope developer, please)

Tim Peters tim at zope.com
Thu Mar 4 21:46:35 EST 2004


[Casey Duncan]
>> If there is bleed-through, I think it might be a result of incorrectly
>> predicting the conflict behavior of a composite persistent object
>> when you traverse it multiple times in different ways.  I'm not sure I
>> can explain exactly how that would be possible in this case, but
>> it's a hunch.  Read conflicts don't completely insulate you from
>> dirty reads because of the ZODB cache.  Perhaps the right combination
>> of dirty cache state and fresh state within the BTrees' internal
>> objects could cause this sort of thing.

If it can, we'll consider it to be a critical bug -- ZODB should never
deliver an inconsistent view, provided clients don't ignore the exceptions
it raises.
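
Concretely, "don't ignore the exceptions" means the usual abort-and-retry
idiom.  A minimal sketch (spelled with the current transaction package;
in the releases under discussion this was the get_transaction() builtin,
and the retry count here is arbitrary):

    import transaction
    from ZODB.POSException import ConflictError

    def run_with_retry(func, attempts=3):
        # ReadConflictError is a subclass of ConflictError, so this
        # covers read and write conflicts alike:  abort, then retry
        # against a fresh, consistent view of the database.
        for _ in range(attempts):
            try:
                result = func()
                transaction.commit()
                return result
            except ConflictError:
                transaction.abort()
        raise RuntimeError("too many conflicts, giving up")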

[Chris McDonough]
> But there is a cache per connection, right?  And any given connection
> tries not to return dirty data from its cache.  That's how a
> read conflict happens: the cached copies of all objects associated
> with a transaction get invalidated in all connection caches when that
> transaction commits.  When another transaction (associated with a
> different connection) which was started before the first transaction
> committed tries to read one of the objects invalidated via the first
> transaction commit, a ReadConflictError is raised.

That's the theory.  Note that, not all that long ago, there were glaring
errors in the way invalidations were handled:  the invalidations stemming
from a transaction weren't processed atomically.  That is, the cache might
process an invalidation for one stale object from transaction T, let you run
a while longer, then invalidate another object from transaction T.  It was
quite possible to get inconsistent views then, and the real marvel is how
long it took for anyone to notice!

But in the most recent ZODB+ZEO releases, invalidations are handled
atomically, and by all the pieces involved:  at any given time, the cache
your code "sees" has processed either none of another transaction T's
invalidations or all of them.
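
In pseudo-code, the promise amounts to something like this -- a conceptual
sketch of the batching idea only, not the actual ZODB/ZEO implementation:

    import threading

    class InvalidationQueue:
        # Invalidations from a committing transaction T arrive as one
        # batch and are applied as one batch, at the local connection's
        # transaction boundary -- so application code sees either none
        # of T's invalidations or all of them, never a partial set.

        def __init__(self, cache):
            self._cache = cache        # oid -> object mapping
            self._lock = threading.Lock()
            self._batches = []         # one set of oids per transaction

        def invalidate(self, oids):
            # Called when some other transaction T commits.
            with self._lock:
                self._batches.append(set(oids))

        def process(self):
            # Called between transactions, never mid-transaction.
            with self._lock:
                batches, self._batches = self._batches, []
            for batch in batches:
                for oid in batch:
                    self._cache.pop(oid, None)   # evict the stale copy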


>
> In any case, it really hasn't proven a smart move in the past to try
> to second-guess the ZODB or BTrees, so if my mental model of how this
> thing is supposed to work is flawed, and I do need to special-case
> simultaneous key deletion, I will do so only after I can scrounge up
> enough information to correct my assumptions.  That may be never, but
> that's ok.  Yeah, the code is broken now, but it's always been broken,
> so there's no hurry really.
>
>> I *really* wanted to try to find a way to get rid of the top-level
>> mapping. This seemed like a possibility.
>
> I did get rid of one of the persistent mappings I was using
> traditionally in the newest implementation.  The mapping I ditched was
> an index from session id to timeslice.  It was used, given a session
> id, to quickly find the timeslice in which the session
> data was stored.  Now, instead of using that index, I just march
> through all the buckets that are "current" (based on the current time
> slice) for the time of access, looking for the session id in all of
> the buckets.
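
To make that concrete, here's a rough sketch of the lookup pattern Chris
is describing.  Everything here is illustrative -- SLICE_SECS, SPAN, and
find_session are made-up names, not the actual Transience code, and
"data" stands for the timeslice -> bucket mapping (self._data) he
mentions below:

    import time

    SLICE_SECS = 20   # hypothetical width of one timeslice, in seconds
    SPAN = 3          # hypothetical number of slices counted as "current"

    def current_slices(now=None):
        # The SPAN most recent timeslice keys, newest first.
        now = int(time.time() if now is None else now)
        head = now - (now % SLICE_SECS)
        return [head - i * SLICE_SECS for i in range(SPAN)]

    def find_session(data, session_id):
        # March through all "current" buckets looking for the id,
        # rather than consulting a session-id -> timeslice index.
        for ts in current_slices():
            bucket = data.get(ts)
            if bucket is not None and session_id in bucket:
                return bucket[session_id]
        return None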
>
> I also did away with a huge amount of voodoo and magic used to plaster
> over symptoms caused by old BTrees bugs and removed several dubious
> optimizations.  As a result, the implementation is slower but much
> simpler.  I think it's even understandable now; it doesn't attempt to
> fight the framework nearly as much as the older implementation did.
>
> There is only one top-level mapping now, which is the mapping from
> timeslice to bucket (self._data).  If we can get rid of this by coming
> up with a simpler implementation which continues to honor all (or, at
> worst, most) of the promises implied by the older implementations,
> that'd be great.
>
>> I was thinking it could be an option in a normal pack to not remove
>> objects that were modified within the pack window even if they aren't
>> reachable. What I hadn't considered though are persistent subobjects.
>> Dealing with those would make it more complex in the general pack.
>
> Right.  It's not really a pack operation, although like a pack it may
> need to operate on all the objects in the database.
>
>>> Other issues:  When would the gc code be invoked?  Is it safe to
>>> invoke the gc code from app code?
>>
>> Pack can be invoked from app code, I think it just forcibly prevents
>> concurrent runs using a lock.
>
> We'd need to emulate that behavior then I suppose.
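
Something along these lines would do it -- a sketch, where do_session_gc
is a stand-in for whatever the real collection routine turns out to be:

    import threading

    _gc_lock = threading.Lock()

    def do_session_gc(storage):
        # Hypothetical stand-in for the real collection routine.
        pass

    def maybe_gc(storage):
        # Mimic pack's guard against concurrent runs:  if a gc is
        # already in progress, skip this one instead of blocking.
        if not _gc_lock.acquire(False):
            return
        try:
            do_session_gc(storage)
        finally:
            _gc_lock.release()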
>
>>> This is always a bitch.  How do we prevent a
>>> *real* pack from hosing our sessions?
>>
>> When would a *real* pack happen? Aren't packs specific to a storage?
>> IOW packing the main storage doesn't pack mounted storages AFAIK.
>
> Well, the issue is that we can't just let the session database grow
> and grow and grow if it's in RAM.  Unreferenced objects need to get
> thrown away, or sooner or later the Zope process will run out of RAM.
> TemporaryStorage is a "packless" storage: it does limited packing
> in-band after every commit.  The in-band packing it does doesn't remove
> unreferenced objects involved in a mutual cycle, however, whereas mark
> and sweep does.  I think this is fine in practice; I have not had any
> complaints about unbounded memory usage while a TemporaryStorage is
> being used and I don't think anybody ever attempts to use the ZMI to
> pack a TemporaryStorage (although it is possible to do so).
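
For anyone following along, here's a toy illustration of why a mutual
cycle defeats reference-count-style reclamation but not mark-and-sweep
(nothing below is TemporaryStorage code):

    refs = {
        'root': ['a'],     # reachable:  root -> a <-> b
        'a':    ['b'],
        'b':    ['a'],
        'c':    ['d'],     # unreachable cycle:  c <-> d keep each
        'd':    ['c'],     # other's refcount above zero forever
    }

    def mark_and_sweep(refs, root='root'):
        # Mark:  everything reachable from the root survives.
        reachable, stack = set(), [root]
        while stack:
            oid = stack.pop()
            if oid not in reachable:
                reachable.add(oid)
                stack.extend(refs.get(oid, []))
        # Sweep:  whatever wasn't marked is garbage.
        return sorted(oid for oid in refs if oid not in reachable)

    print(mark_and_sweep(refs))   # -> ['c', 'd']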
>
> We could create a SessionStorage to do an in-band type-specific gc like
> TemporaryStorage does in-band packing.  But then it really boils down
> to a functionality and documentation issue.  If you store session data
> objects somewhere other than in this SessionStorage, sessioning will
> just stop working altogether as no session would ever expire without
> the gc code being invoked under the hood (until a pack, at which point
> they'd all just go away regardless of when they were last accessed).  I'm
> not saying this is completely unreasonable, but both the
> SessionStorage implementation and the creation of documentation
> required to keep people from shooting themselves in the foot seem a
> bit... lumpy... at least in comparison to potentially finding and
> fixing what might be a small bug in TemporaryStorage or BTrees.
>
> So I guess what that means is that I'm going to continue to try to pin
> down the bug shown by Alex's symptom until I've exhausted my patience.
> If I fail and you're still keen on making a new kind of
> storage-cum-transience implementation, maybe you and I can create it
> then (unless of course you've already done it! ;-)
>
> - C