[Zope3-dev] Weird error within __cmp__ method of KeyReferenceToPersistent

Gary Poster gary at zope.com
Fri Aug 4 08:42:36 EDT 2006


On Aug 2, 2006, at 6:03 AM, Dominik Huber wrote:

> Hi Gary
>
> Thanks for your inputs. I was also offline this days...
>
> Gary Poster wrote:
> [...]
>> Well, first of all, I suspect your situation, based on what I've  
>> seen so far, looks something like this:
>>
>> - code creates obj P1
>> - code puts P1 in intids, with connection.
>> - code does something incorrect (either your code or code that you  
>> are relying on), probably involving a savepoint because of your  
>> discussion below, that keeps keyreference to P1 in intids BTree,  
>> but rolls back addition of P1 to connection.  Right now we have  
>> the bad data (a key reference to an object without a connection in  
>> the BTree), but we don't know about it yet.
>> - Now code creates obj P2.
>> - code tries to put P2 in intids.  If the BTree logic happens to  
>> make it compare against the bad keyreference to P1, you get the  
>> __cmp__ error.  This is a red herring for debugging: the problem  
>> has already occurred (see above).
> Yup.
>
> Few more details:
> There is a import handler which creates p1 , p2 to pn and sets  
> those object to a certain folder. After each addition of a certain  
> p an initializer subscriber listen to IObjectAddedEvent is invoked  
> too. The error occurs sometimes during this initialization. In case  
> of an error the creation and initialization of the error-prone p is  
> rolled back by an Importer (compare the following code):
>
> class Importer(object):
>    def __call__(self, importhandelr)
>          [...]
>           for d in import_reader:
>                if not d[first].startswith('#'):
>                    savepoint = transaction.savepoint()
>                    try:
>                        data = dict([(key, value for key, value in  
> d.items()])
>                        importhandler(data) # <- call importhandler  
> which creates the p
>                        transaction.commit()
>
>                    except Exception, e:
>                        savepoint.rollback() # clean up
>                        self.log.error(e)

Huh.  I wonder if a savepoint rollback cleans up sufficiently after a  
commit; I've never used it like that.  Out of curiosity, if you take  
the commit out does the problem go away?  Generally, looking at the  
code, one of the very first things that Transaction.commit does is  
_invalidate_all_savepoints, so the pattern you have here doesn't look  
advisable.

That said, if that were the problem here, it looks like you should  
have gotten an exception when you did the rollback, so it's probably  
just something to change, but not the cause of what we're talking about.

Also, in the vein of things that are probably not the cause of the  
problem we are discussing, but are maybe worth highlighting, the  
whole bare except thing around a commit is an interesting  
discussion.  For a one-time import, maybe this is fine, but for more  
robust code I prefer something a little more sophisticated.  I've  
tried various things over the years; my current recipe is something  
like this:

try:
     ...do stuff and commit...
except ZODB.POSException.TransactionError:
     ...abort and retry some number of times, and eventually raise...
except (SystemExit, KeyboardInterrupt, ZODB.POSException.POSError):
    ...abort and raise...
except:
     ...abort and log, or whatever is appropriate...

That's probably somewhat controversial--maybe TransactionError is too  
broad, for instance--but I offer it for what it's worth.

>> I'm pretty sure your BTree is already bad by the time you get to  
>> the failing __cmp__.
> Yup
>> The only way I know to fix it at this late stage in the story is  
>> drastic: you'll need to create a new BTree, iterate over the old  
>> one and add all the contents except for the broken keyreferences,  
>> and replace the old intid BTree with the new one.
>>
>> I don't think that's appropriate error catching for the BTree  
>> code.  I think if you get the error you describe, you have a  
>> serious problem in "your" code (i.e., code that you've written or  
>> code that you're relying on in your stack, like Zope).  How--and  
>> when--to handle it is application policy, not something that  
>> should be in the base intids, I think.
> Yes, but the whole problem seems not so heavy:
>
> 1. If I restart the server no wrong intid entries are there.

One possible explanation is that you have some in-memory data  
structure that the savepoint can't roll back.  Another is that the  
savepoint code isn't quite right.  A third is that the BTree code  
isn't quite right.  I suspect something like the first more than the  
second or third, but that doesn't rule anything out.  Have you  
analyzed what exception triggers the problem?  Have you analyzed the  
bad in-memory tree to see if it gives you any clues?

> 2. If I patch the __cmp__ method of KeyReferenceToPersistent the  
> proposed way it only runs right after a wrong p creation into the  
> except. After another successful transaction commit I don't have  
> any bad intids entries
>
> IMO that points that somehow the persistent data are ok. That we  
> only might have an 'old' volatile reference to a roled back intid  
> somewhere, that causes the error ... My problem is that I cannot  
> the error situation by a test case where intids are invoked.  
> Probably it's a problem of the order of invoked subscribers listen  
> to the object added event.

Maybe so, but I believe that you are going to *have* to produce a  
reliable test case to have a reliable fix.  This is too weird, and we  
don't even know what is implicated.

>>> The error occurs after a savepoint.rollback. I can reproduce the  
>>> error within the basic error story (-> see testcase). It happens  
>>> if we compare a commited persistent object with a rollbacked  
>>> persistent object.
>>>
>>> The current implementation hurts the demeter's law. A fresh  
>>> instance does not support the db attribute, even though the  
>>> situation showed within the basic error story
>>>
>>>    >>> from persistent import Persistent
>>>    >>> p = Persistent()
>>>    >>> p._p_jar.db().database_name
>>>    Traceback (most recent call last):
>>>    ...
>>>    AttributeError: 'NoneType' object has no attribute 'db'
>>
>> I remember knowing Demeter's law. ;-)  I should look it up, but  
>> I've already spent more time than I should have this morning.  The  
>> above behavior is expected.  The connection.add(obj) code in the  
>> persistent keyreference code gives the obj a _p_jar.
>>
>>> IMO the error should still be handled by KeyReferenceToPersistent  
>>> __cmp__ method.
>>
>> This is too late.  Anything at this stage in the game is going to  
>> be a dangerous cover-up of an already serious problem.
> Demeter says only long.dotted.name.calls() on references of an  
> object are leading to pretty instable code.

Ah, I see.  Not sure if I ever knew the "law" or not ;-) but I see  
the point.

> We have this problem in the current implementation as shown in the  
> test case. If we use rollback's we can produce an object state of a  
> persistent that does not allow to lookup p._p_jar.db 
> ().database_name. That means the current implementation relies on  
> an implicit interface assumption too.

Well, I agree that the current implementation relies on an implicit  
interface assumption, but we part ways after that.  The current  
implementation would be explicit if it documented that it relies on  
other persistent key reference implementations having an `object`  
attribute.  Maybe, in fact, the intent is that key_type_id is only  
associated with a single class--but this isn't a documented  
assertion.  The persistent.txt file does document that persistent key  
references must be to objects that have a connection, and so the  
`_p_jar` requirement in __cmp__ is reasonable, since the _p_jar is  
the connection.  This could maybe be better documented, in an  
interface perhaps; and, as I said in one of my first replies to this  
thread, __cmp__ could maybe raise a more informative error to make  
this edge case easier to identify.

>   >>> transaction.commit()
>   >>> savepoint = transaction.savepoint()
>   >>> p2 = P()
>   >>> p2.__parent__ = top
>   >>> keyref2 = KeyReferenceToPersistent(p2)
>   >>> cmp(keyref1, keyref2)
>   -1
>   >>> savepoint.rollback()
>   >>> cmp(keyref1, keyref2)
>   Traceback (most recent call last):
>   ...
>   AttributeError: 'NoneType' object has no attribute 'db'

Yes, you've showed this before.  This is to be expected.

> IMO it is a pretty self-punishment-strategy to use a __cmp__ method  
> to recognize corrupt intids.

:-) It's not ideal, but I don't have a better suggestion.  Something  
is going wrong in the stack you are using, and you are getting in an  
exceptional edge case.  The root cause needs to be addressed.  With  
an error like this, it is a horrible time to modify the BTree, so the  
__cmp__ can't gloss things over (see brief examples of "lies" below);  
and a __cmp__ method is a very impractical and inappropriate time to  
write to the database to fix things.

> Your aspect of corrupt intids is *very* important, but removing  
> those after an error-prone transaction would be much comfortable  
> with a working __cmp__ function. So I still propose the following  
> fix...
>
> class KeyReferenceToPersistent(object):
>   [...]
>   def __cmp__(self, other):
>       if self.key_type_id == other.key_type_id:
>           try:
>               return cmp(
>                   (self.object._p_jar.db().database_name,   
> self.object._p_oid),
>                   (other.object._p_jar.db().database_name,  
> other.object._p_oid),
>                   )
>           except AttributeError:
>               # a transaction might be rolled back
>               self_db = getattr(self.object._p_jar, 'db', '')
>               if self_db:
>                   self_db = self_db().database_name
>               other_db = getattr(other.object._p_jar, 'db', '')
>               if other_db:
>                   other_db = other_db().database_name
>
>               return cmp(
>                   (self_db,  self.object._p_oid),
>                   (other_db, other.object._p_oid),
>                   )

Again, like your other similar suggestion, this gives you the wrong  
answer--it lies.  In this case, if your code compares something that  
doesn't have a database, it assume the database name is '': the  
default name for the main Zope database.  Here are a few ways that  
can be a lie:

- the default database name is different.  When (if!) the object that  
has no connection is added to the database, it will sort differently  
than it does now: corrupt BTree.

- You have a multi-database.  The object is actually going to be  
added to a database with a different name.  When it is, it will sort  
differently: corrupt BTree.

- You have an insane situation in your BTree, involving in-memory  
data structures or savepoints or whatever else is causing the problem  
here, so that one or more key references in the BTree have objects  
without connections.  This glosses over the problem, causing...the  
unknown.  Possibly a corrupt BTree.  We don't know enough about  
what's going on for me to speculate too much.

If you don't ever use multiple databases, and you never change the  
default database name, you *might* be ok.  We use multiple  
databases.  We might change the default database name.  The change is  
too fragile for us.

(I'm sure this is frustrating. :-( You have my sympathy. :-) )

Gary


More information about the Zope3-dev mailing list