[ZODB-Dev] ZEO client hangs when combined with other asyncore code

Thu Jun 23 14:43:49 EDT 2005

...

[Tim Peters]
>> asyncore gives me a headache.

[Tony Meyer]
> I think this is true for any value of "me" <0.5 wink>.

Not Sam Rushing -- he's asyncore's dad.  He doesn't use threads, and, for
that matter, doesn't use asyncore anymore either (possibly because it isn't
confusing enough <wink>).  But an _appropriate_ single-threaded
asyncore-based wire-protocol implementation can be very easy to follow, and
scale to thousands of clients even on feeble HW.  It has its place.  It just
doesn't seem to be a place I usually go ...

[Tim, explains how ZEO's ThreadedAsync/LoopCallback.py monkey patches
 Python's asyncore.loop]

[Tony]
> Argh.  This explains a lot.  I couldn't understand why print statements
> in asyncore.loop didn't print, unless I renamed loop and called the
> renamed function (which would then have done bad things to ZEO, no
> doubt).  Nasty indeed :)

Yup!  It's hard to say how many debugging hours this has cost various
people, all the more so because LoopCallback also took over asyncore's
poll() function (so debugging prints, breakpoints, etc in _that_ also "got
lost").  Yesterday I noticed that Python's asyncore.loop signature changed
in Python 2.4 (a `count` argument was added), so LoopCallback's replacement
is plain wrong for Python 2.4.  The reason for replacing asyncore.poll()
went away too, so I rewrote it all for ZODB 3.4.1 and 3.5 (neither released
yet).  It still replaces asyncore.loop, but with a wrapper that calls the
_original_ asyncore.loop "in the middle", so if anyone adds
prints/breakpoints/etc to Python's asyncore.loop in the future, they'll
still be effective despite ZEO's interference.

Sorry for the bother, but take comfort in knowing that whining about it
helped get it fixed <wink>.

>> If the flow is like this:
>>
>>   asyncore mainloop invokes POP3 proxy code
>>       POP3 proxy code makes a synchronous ZEO call
>>
>> then I figure the app may well hang then:  the thread running the
>> asyncore mainloop is still running a POP3 proxy callback, waiting for a
>> response that can never happen until the asyncore mainloop gets control
>> back (in order to send & receive ZEO messages).
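
The hang described in that flow can be reproduced in miniature without ZEO
or asyncore at all.  The sketch below is a toy under stated assumptions:
`pending`, `sync_call`, `proxy_callback` and `mainloop` are hypothetical
names, and a timeout stands in for "waits forever" so the example
terminates.

```python
import threading
from collections import deque

pending = deque()  # work the single mainloop has yet to dispatch
results = []


def sync_call(timeout):
    """Queue a request, then block until the mainloop has handled it."""
    done = threading.Event()
    pending.append(done.set)   # only the mainloop will ever run this
    return done.wait(timeout)  # True only if the reply arrived in time


def proxy_callback():
    # Runs *inside* the mainloop, like the POP3 proxy code in the flow above.
    results.append(sync_call(timeout=0.2))


def mainloop():
    while pending:
        handler = pending.popleft()
        handler()  # the loop can't dispatch anything else until this returns


pending.append(proxy_callback)
mainloop()
```

Here `results` ends up holding `False`:  the reply handler sits in the queue
for the whole wait, because the only thread that could run it is the one
doing the waiting.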

> This was definitely the problem.  The easiest solution (partly because
> some of this work is already done <wink>), IMO, is to separate out the
> ZEO and asyncore-based proxy into separate asyncore maps and have two
> asyncore mainloop threads, one for each map.  This follows Tim's comment
> about ZEO expecting the asyncore loop to be in a separate thread, too.
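
The two-loops arrangement can be sketched the same way.  This is an
illustration only, not the actual proxy or ZEO code:  `LoopThread`,
`zeo_loop` and `proxy_loop` are hypothetical names, and a private work queue
per thread stands in for a separate asyncore map.

```python
import queue
import threading


class LoopThread(threading.Thread):
    """A toy mainloop with its own private work queue, run in its own thread."""

    def __init__(self):
        super().__init__(daemon=True)
        self.work = queue.Queue()

    def run(self):
        while True:
            handler = self.work.get()
            if handler is None:  # sentinel: shut the loop down
                break
            handler()

    def sync_call(self, timeout):
        """Ask this loop to do something and block until it has done it."""
        done = threading.Event()
        self.work.put(done.set)
        return done.wait(timeout)

    def stop(self):
        self.work.put(None)
        self.join()


zeo_loop = LoopThread()    # stands in for the ZEO map's mainloop thread
proxy_loop = LoopThread()  # stands in for the proxy map's mainloop thread
zeo_loop.start()
proxy_loop.start()

results = []
finished = threading.Event()


def proxy_callback():
    # Still blocks the proxy loop, but the other loop is free to answer.
    results.append(zeo_loop.sync_call(timeout=5.0))
    finished.set()


proxy_loop.work.put(proxy_callback)
finished.wait(10.0)
zeo_loop.stop()
proxy_loop.stop()
```

The synchronous call now succeeds, since the thread that must deliver the
reply is no longer the thread that is waiting for it.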

Excellent!  That makes some sense.  ZEO _may_ change to do a similar thing,
but I need to find/make time to be sure of the details.  For historical
reasons, ZEO doesn't actually require an asyncore mainloop to be running,
and it replaces asyncore.loop precisely so it can get notified if "anyone
else" starts the asyncore loop.  If someone does, then ZEO kinda
reconfigures itself on the fly to exploit that an asyncore mainloop is
running.  I'm not clear on why ZEO didn't just start one itself (that's one
of the details I'm still unclear about).

Anyway, there's a lot of weird internal complexity there trying to live both
with and without asyncore, and to switch modes dynamically based on
monkey-patching and callbacks, and I suspect it would be pretty easy to
get rid of it all by having each ZEO client spin off a thread to run its
own, dedicated-to-it asyncore mainloop -- effectively doing all the time
what you've been provoked into doing by hand now.

Then it could stop monkey-patching Python's asyncore too.  Maybe that would
just be a step on the way to removing asyncore dependence entirely.

> Anyway, this appears to have fixed the problem.  Many thanks for the
> clues - you might not have understood why it was hanging, but your
> comments were enough to get it fixed anyway :)

Ya, that's par for the course <wink>.  I'm very glad you got unstuck with
relatively little pain!