[ZODB-Dev] Re: BTrees strangeness (was [Zope-dev] Zope 2.X BIG Session problems - blocker - our site dies - need help of experience Zope developer, please)

Chris McDonough chrism at plope.com
Thu Mar 4 19:56:05 EST 2004


On Thu, 2004-03-04 at 17:18, Casey Duncan wrote:
> If there is bleed through I think it might be a result of incorrectly
> predicting the conflict behavior of a composite persistent object when
> you traverse it multiple times in different ways. I'm not sure I can
> explain exactly how that would be possible in this case, but it's a
> hunch. Read conflicts don't completely insulate you from dirty reads
> because of the zodb cache. Perhaps the right combination of dirty cache
> state and fresh state within the BTrees internal objects could cause
> this sort of thing.

But there is a cache per connection, right?  And any given connection
tries not to return dirty data from its cache.  That's how a read
conflict happens: the cached copies of all objects modified by a
transaction get invalidated in all connection caches when that
transaction commits.  When another transaction (associated with a
different connection) that began before the first transaction
committed tries to read one of the objects invalidated by that commit,
a ReadConflictError is raised.
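For the record, a minimal sketch of that mechanism (MappingStorage and
the explicit transaction managers are just for illustration, and note
that with the MVCC work the second read would see a consistent
snapshot instead of raising):

    import transaction
    from ZODB import DB
    from ZODB.MappingStorage import MappingStorage
    from ZODB.POSException import ReadConflictError

    # Two connections, each with its own transaction manager and cache.
    db = DB(MappingStorage())
    tm1 = transaction.TransactionManager()
    tm2 = transaction.TransactionManager()
    c1 = db.open(transaction_manager=tm1)
    c2 = db.open(transaction_manager=tm2)

    root2 = c2.root()        # second transaction begins; root is cached

    root1 = c1.root()
    root1['key'] = 'value'   # first transaction modifies the root...
    tm1.commit()             # ...and invalidates c2's cached copy

    try:
        root2.get('key')     # pre-MVCC: reading the invalidated object
    except ReadConflictError:
        tm2.abort()          # the usual response: abort and retry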

In any case, it really hasn't proven a smart move in the past to try to
second-guess the ZODB or BTrees, so if my mental model of how this thing
is supposed to work is flawed, and I do need to special-case
simultaneous key deletion, I will do so only after I can scrounge up
enough information to correct my assumptions.  That may be never, but
that's OK.  Yeah, the code is broken now, but it's always been broken,
so there's no hurry, really.

> I *really* wanted to try to find a way to get rid of the top-level
> mapping. This seemed like a possibility.

In the newest implementation, I did get rid of one of the persistent
mappings I had traditionally used.  The mapping I ditched was an index
from session id to timeslice, used to quickly find, given a session
id, the timeslice in which its data was stored.  Now, instead of
consulting that index, I just march through all the buckets that are
"current" (based on the timeslice at the time of access), looking for
the session id in each of them.
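Roughly, the lookup now looks like this (a hypothetical sketch, not
the actual Transience code; data, period, and spare stand in for the
timeslice-to-bucket mapping, the slice length in seconds, and the
number of slices considered "current"):

    import time

    def find_session(data, session_id, period, spare):
        # Walk the "current" timeslices, newest first, checking each
        # bucket for the session id instead of consulting an index.
        now = int(time.time())
        start = now - (now % period)     # start of the current slice
        for ts in range(start, start - period * spare, -period):
            bucket = data.get(ts)
            if bucket is not None and session_id in bucket:
                return bucket[session_id]
        return None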

I also did away with a huge amount of voodoo and magic used to plaster
over symptoms caused by old BTrees bugs, and removed several dubious
optimizations.  As a result, the implementation is slower but much
simpler.  I think it's even understandable now; it doesn't attempt to
fight the framework nearly as much as the older implementation did.

There is only one top-level mapping now, which is the mapping from
timeslice to bucket (self._data).  If we can get rid of this by coming
up with a simpler implementation which continues to honor all (or, at
worst, most) of the promises implied by the older implementations,
that'd be great.
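For reference, the shape of that mapping is roughly the following (a
sketch; the integer-keyed IOBTree and the OOBTree bucket type are my
assumptions about the layout, not a quote from the code):

    from BTrees.IOBTree import IOBTree
    from BTrees.OOBTree import OOBTree

    data = IOBTree()      # plays the role of self._data
    ts = 1078430400       # an integer timeslice boundary
    data[ts] = OOBTree()  # one bucket per timeslice
    data[ts]['some-session-id'] = {'cart': []}  # id -> session data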

> I was thinking it could be an option in a normal pack to not remove
> objects that were modified within the pack window even if they aren't
> reachable. What I hadn't considered, though, are persistent subobjects.
> Dealing with those would make it more complex in the general pack.

Right.  It's not really a pack operation, although like a pack it may
need to operate on all the objects in the database.

> > Other issues:  When would the gc code be invoked?  Is it safe to
> > invoke the gc code from app code?  
> 
> Pack can be invoked from app code, I think it just forcibly prevents
> concurrent runs using a lock.

We'd need to emulate that behavior then, I suppose.
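Something like this would probably do (a sketch; the lock and the
sweep() entry point are hypothetical names, not from the real code):

    import threading

    _gc_lock = threading.Lock()

    def gc_sessions(container):
        # Refuse to run concurrently, the way pack does: if another
        # gc pass holds the lock, skip this run rather than block.
        if not _gc_lock.acquire(False):
            return
        try:
            container.sweep()   # hypothetical: discard expired buckets
        finally:
            _gc_lock.release()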

> > This is always a bitch.  How do we prevent a
> > *real* pack from hosing our sessions?
> 
> When would a *real* pack happen? Aren't packs specific to a storage? IOW
> packing the main storage doesn't pack mounted storages AFAIK.

Well, the issue is that we can't just let the session database grow and
grow and grow if it's in RAM.  Unreferenced objects need to get thrown
away, or sooner or later the Zope process will run out of RAM.
TemporaryStorage is a "packless" storage: it does limited packing
in-band after every commit.  The in-band packing it does doesn't remove
unreferenced objects involved in a mutual cycle, however, whereas mark
and sweep does.  I think this is fine in practice; I have not had any
complaints about unbounded memory usage while a TemporaryStorage is
being used, and I don't think anybody ever attempts to use the ZMI to
pack a TemporaryStorage (although it is possible to do so).
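To make the cycle case concrete, here's a sketch (MappingStorage
stands in for TemporaryStorage): after the final commit, the two
mappings are unreachable from the root but still reference each other,
so reference-counting-style in-band packing can't reclaim them, while
a mark and sweep from the root would:

    import transaction
    from ZODB import DB
    from ZODB.MappingStorage import MappingStorage
    from persistent.mapping import PersistentMapping

    db = DB(MappingStorage())
    root = db.open().root()

    a, b = PersistentMapping(), PersistentMapping()
    a['peer'], b['peer'] = b, a   # a mutual reference cycle
    root['pair'] = a
    transaction.commit()

    del root['pair']              # a and b are now garbage...
    transaction.commit()          # ...but still reference each other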
 
We could create a SessionStorage that does in-band, type-specific gc
the way TemporaryStorage does in-band packing.  But then it really
boils down to a functionality and documentation issue.  If you store
session data objects somewhere other than in this SessionStorage,
sessioning would just stop working altogether, as no session would ever
expire without the gc code being invoked under the hood (until a pack,
at which point they'd all go away regardless of when they were last
accessed).  I'm not saying this is completely unreasonable, but both
the SessionStorage implementation and the documentation required to
keep people from shooting themselves in the foot seem a bit... lumpy...
at least in comparison to potentially finding and fixing what might be
a small bug in TemporaryStorage or BTrees.

So I guess what that means is that I'm going to continue to try to pin
down the bug shown by Alex's symptom until I've exhausted my patience.
If I fail and you're still keen on making a new kind of
storage-cum-transience implementation, maybe you and I can create it
then (unless of course you've already done it! ;-)

- C




