[ZODB-Dev] Re: BTrees strangeness (was [Zope-dev] Zope 2.X BIG Session problems - blocker - our site dies - need help of experience Zope developer, please)

Casey Duncan casey at zope.com
Thu Mar 4 17:18:11 EST 2004


On Thu, 04 Mar 2004 16:49:39 -0500
Chris McDonough <chrism at plope.com> wrote:

> On Thu, 2004-03-04 at 09:50, Casey Duncan wrote:
[..]
> If there is a special case for simultaneous deletions of a BTree key
> which results in bleed-through of database state between connections
> (causing at least one thread to do a "dirty read", which would indeed
> explain the failure case), that's fine, and I will be happy to work
> around it, I just want to understand it and get it documented (at
> least in my own head) before slapping an exception case in there.
> 
> But personally, I'm hoping that the symptom is somehow the fault of
> the TemporaryStorage and that I can remain blissfully unaware of this
> special case when writing future code.

If there is bleed through I think it might be a result of incorrectly
predicting the conflict behavior of a composite persistent object when
you traverse it multiple times in different ways. I'm not sure I can
explain exactly how that would be possible in this case, but it's a
hunch. Read conflicts don't completely insulate you from dirty reads
because of the ZODB cache. Perhaps the right combination of dirty cache
state and fresh state within the BTrees internal objects could cause
this sort of thing.
 
> > The other day I was thinking about other approaches to this gc
> > problem and a possible solution based on recent changes to the ZODB
> > occurred to me:
> > 
> > You are basically reimplementing a kind of pack operation here,
> > except it is an application pack not utilizing the underlying ZODB
> > pack mechanism. But what if you could use that instead? An add()
> > method was recently added to the ZODB connection interface which
> > allows you to add unreferenced objects to the database. What if
> > transient objects were just unreferenced persistent objects stored
> > in the database? Their "key" would simply be their oid. So a session
> > key could be the oid of the transient session object. Whenever a
> > session was accessed it would be marked changed in the database and
> > its mtime would be updated.
> 
> That is a *really* cool idea.

I *really* wanted to try to find a way to get rid of the top-level
mapping. This seemed like a possibility.
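
A minimal sketch of that idea, using a toy in-memory store rather than a
real ZODB connection. ToyStore and its add/load/touch/mtime methods are
hypothetical stand-ins invented for illustration; only the notion of an
add() that hands back an oid for an otherwise unreferenced object comes
from the discussion above. The session key is just the oid, so there is
no top-level mapping to conflict on.

```python
import itertools
import time


class ToyStore:
    """Hypothetical stand-in for a storage: oid -> (state, mtime)."""

    def __init__(self):
        self._records = {}
        self._next_oid = itertools.count(1)

    def add(self, state, now=None):
        """Store an object that nothing references; return its oid."""
        oid = next(self._next_oid)
        self._records[oid] = (state, time.time() if now is None else now)
        return oid

    def load(self, oid):
        """Return the stored state; KeyError means it was collected."""
        return self._records[oid][0]

    def touch(self, oid, now=None):
        """Mark the object as changed so its modification time advances."""
        state, _ = self._records[oid]
        self._records[oid] = (state, time.time() if now is None else now)

    def mtime(self, oid):
        return self._records[oid][1]


store = ToyStore()
session_key = store.add({"user": "guest"}, now=100.0)  # key is the oid itself
store.touch(session_key, now=160.0)                    # session was accessed
```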
 
> > Periodically the transient storage would be packed to the desired
> > timeout value, cleaning out any transient objects that had not been
> > accessed recently enough to keep around. This pack could be done
> > "in-band" because AFAIK the storage can prevent concurrent packing.
> > It could also be done out-of-band. 
> 
> Right now, most (all?) ZODB pack implementations use mark and sweep,
> starting at the root object, recursively unpickling each object
> reachable from the root and finding the other objects referenced by
> that object.  All objects that aren't reachable are expunged.
> 
> That goes something like this (bad pseudocode):

;^)
 
> def pack(self):
>    stack = [ROOT_OB_OID]
>    reachable = set([ROOT_OB_OID])   # the root itself must survive
> 
>    # mark
>    while stack:
>       oid = stack.pop()
>       pickle = self._pickles[oid]
>       for ref_oid in FIND_REFERENCES_FROM(pickle):
>          if ref_oid not in reachable:   # skip oids already seen (cycles)
>             reachable.add(ref_oid)
>             stack.append(ref_oid)
> 
>    # sweep (copy the keys so we can delete while iterating)
>    for oid in list(self._pickles.keys()):
>       if oid not in reachable:
>          del self._pickles[oid]
> 
> This isn't what we'd want to do in the sessioning case because it
> doesn't take into account mod time, just plain referenceability from
> other objects.  Instead, we'd want to do something like:

right.
 
> def gc(self, cutoff_time, object_type=TransientObject):
>    # copy the keys so we can delete while iterating
>    for oid in list(self._pickles.keys()):
>       pickle = self._pickles[oid]
>       if not IS_A_PICKLE_OF_THIS_KIND_OF_OBJECT(object_type, pickle):
>          continue
>       modtime = GET_MOD_TIME_OF(pickle)
>       if modtime < cutoff_time:
>          del self._pickles[oid]
> 
> This leaves all of the subobjects of the transient object in the
> storage; they'd need to be removed by a normal pack.

Yup, that's good.
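
To make the leftover-subobject point concrete, here is a runnable toy
version of that gc. The Record layout (kind, mtime, refs) and the plain
dict standing in for the storage are assumptions made for illustration,
not anything a real storage exposes.

```python
from collections import namedtuple

# Hypothetical record layout: object kind, modification time, referenced oids.
Record = namedtuple("Record", "kind mtime refs")


def gc_stale(records, cutoff_time, kind="TransientObject"):
    """Delete records of the given kind last modified before cutoff_time."""
    removed = []
    for oid in list(records):          # copy keys: we delete while iterating
        rec = records[oid]
        if rec.kind == kind and rec.mtime < cutoff_time:
            del records[oid]
            removed.append(oid)
    return removed


records = {
    1: Record("TransientObject", 50, [2]),    # stale session
    2: Record("SessionData", 50, []),         # its subobject
    3: Record("TransientObject", 150, [4]),   # recently used session
    4: Record("SessionData", 150, []),        # its subobject
}
removed = gc_stale(records, cutoff_time=100)
remaining = sorted(records)   # oid 2 is now an orphan left for a later pack
```

Only the stale session itself goes away here; its subobject (oid 2) stays
in the store until something else sweeps it.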
       
> > What I'm unsure about is whether pack would keep recent revisions to
> > unreferenced objects,
> 
> It won't.  A pack will now destroy anything unreferenced.
> 
> > I'm thinking it wouldn't. Perhaps the transient
> > storage could implement pack slightly differently so that it kept
> > even unreferenced objects that were modified recently. In fact this
> > might be useful as an optional feature for all storages, I dunno.
> 
> Right.  I wonder if we could sneak that "gc" code ("remove all objects
> starting at this root that have a lesser bobobase_mod_time than x and
> that are of object type y") into stock storage code.  I suspect it
> wouldn't be very popular.  Maybe a more generalized callback mechanism
> could be created that "plugged in" to the packing API.

I was thinking it could be an option in a normal pack to not remove
objects that were modified within the pack window even if they aren't
reachable. What I hadn't considered though are persistent subobjects.
Dealing with those would make it more complex in the general pack.
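
One possible way to handle the subobject problem, sketched on a toy
store: treat every unreachable object modified inside the pack window as
an extra root for the mark phase, so whatever it references survives
too. The Record layout, the keep_recent flag and the pack() signature
are all hypothetical; no real storage is claimed to work this way.

```python
from collections import namedtuple

Record = namedtuple("Record", "mtime refs")   # hypothetical record layout
ROOT = 0


def pack(records, pack_time, keep_recent=True):
    """Mark and sweep; optionally spare unreachable but recent objects."""

    def mark(start_oids, seen):
        stack = list(start_oids)
        while stack:
            oid = stack.pop()
            if oid in seen or oid not in records:
                continue
            seen.add(oid)
            stack.extend(records[oid].refs)

    reachable = set()
    mark([ROOT], reachable)
    if keep_recent:
        recent = [oid for oid, rec in records.items()
                  if oid not in reachable and rec.mtime >= pack_time]
        # Recent unreferenced objects become extra roots, so their
        # subobjects survive regardless of the subobjects' own mtime.
        mark(recent, reachable)
    for oid in list(records):
        if oid not in reachable:
            del records[oid]


records = {
    0: Record(0, [1]),       # root
    1: Record(10, []),       # ordinary referenced object
    2: Record(50, [3]),      # stale unreferenced session
    3: Record(50, []),       # its subobject
    4: Record(150, [5]),     # recent unreferenced session
    5: Record(20, []),       # old subobject of the recent session
}
pack(records, pack_time=100)
survivors = sorted(records)
```

The extra mark pass is where the added complexity shows up: the old
subobject (oid 5) is kept only because a recent object still points at it.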

> Other issues:  When would the gc code be invoked?  Is it safe to
> invoke the gc code from app code?  

Pack can be invoked from app code; I think it just forcibly prevents
concurrent runs using a lock.
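
A sketch of what such a guard might look like, assuming a simple
non-blocking lock. ToyStorage, PackInProgress and the do_pack hook are
invented for illustration and are not taken from any real storage code.

```python
import threading


class PackInProgress(Exception):
    """Hypothetical error raised when a pack is already running."""


class ToyStorage:
    def __init__(self):
        self._pack_lock = threading.Lock()
        self.packs_run = 0

    def pack(self, do_pack=None):
        # Non-blocking acquire: a second caller fails fast instead of waiting.
        if not self._pack_lock.acquire(blocking=False):
            raise PackInProgress("another pack is already running")
        try:
            if do_pack is not None:
                do_pack()
            self.packs_run += 1
        finally:
            self._pack_lock.release()


storage = ToyStorage()
results = []


def nested_attempt():
    # Simulates a second pack request arriving while the first is running.
    try:
        storage.pack()
        results.append("ran")
    except PackInProgress:
        results.append("refused")


storage.pack(do_pack=nested_attempt)
```

With a guard like this, an "in-band" pack triggered by app code would be
refused rather than queued when another pack is already under way.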

> This is always a bitch.  How do we prevent a
> *real* pack from hosing our sessions?

When would a *real* pack happen? Aren't packs specific to a storage? IOW
packing the main storage doesn't pack mounted storages AFAIK.

If the user actually went to the control panel and packed the temporary
storage, well they just lost their sessions if they pack to 0. Maybe
this is a feature ;^)
 
> > This would eliminate having to keep track of transient objects in a
> > separate persistent data structure, which would also eliminate that
> > conflict hot-spot.
> 
> Yup.
> 
> > Persistent weak refs were also recently added to ZODB. These could
> > be used to refer to transient objects from non-transient ones
> > without affecting their lifetime.
> 
> Don't think we'd need that if we just relied on modtime and object
> type.

Yeah maybe not. I was thinking more about non-session transient objects
that you can actually browse to through the ZMI.

-Casey


