Python on multi-processor machines (Was Re: [Zope] Re: Windows vs. Linux)

Matthew T. Kromer matt@zope.com
Thu, 29 Aug 2002 15:06:23 -0400


Dennis Allison wrote:

>That is precisely the configuration I run without problem.
>
>I have not (yet) looked at the Python code, but I am reasonably sure my
>intuition is correct.  (Matt or Guido -- correct me if I am wrong...)
>
>First, safety is not an issue modulo thread safety in the uniprocessor
>machine and the correctness of the SMP implementation. Multiple threads
>allocated to different processors function correctly.  The problem is with
>performance since the GIL serializes everything and blocks all processors,
>not just the processor on which the thread is running.  This means that
>the second processor does not contribute to the execution as it could, so
>the effective CPU available is closer to 1.0 than 2.0.

Well, in the worst case, it can actually give you performance UNDER 1X.  The 
latency of switching the GIL between CPUs comes right off your ability to 
do work in a quantum.  If you have a 1 gigahertz machine capable of doing 
12,000 pystones of work, and it takes 50 milliseconds to switch the 
GIL (I don't know how long it takes, this is an example), you would lose 5% 
of your peak performance for *EACH* GIL switch.  Setting 
sys.setcheckinterval(240) will still yield the GIL 50 times a second. 
If the GIL actually migrates only 10% of the time it is released, that 
would be 50 * .1 * 5% = 25% performance loss.  The cost to switch the GIL 
is going to vary, but will probably range between .1 and .9 time quanta 
(scheduler time intervals), and a typical time quantum is 5 to 10 ms.
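
To make the arithmetic concrete, here is the same back-of-the-envelope
calculation as Python; the 50 millisecond switch cost and the 10%
migration rate are just the illustrative guesses from above, not
measurements:

    import sys
    sys.setcheckinterval(240)  # check for thread switches every 240 bytecodes

    yields_per_second = 50     # GIL released roughly 50 times/sec at this setting
    migration_rate    = 0.10   # fraction of releases where the GIL hops CPUs (guess)
    switch_cost       = 0.050  # seconds lost per cross-CPU switch (guess)

    lost = yields_per_second * migration_rate * switch_cost
    print "fraction of peak throughput lost: %.0f%%" % (lost * 100)
    # -> 25%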

The 'saving grace' of the Linux scheduler is that when a thread gives up 
the GIL, it almost immediately gets it back again, rather than having 
another thread acquire it.  This is bad for average response time, but 
good for throughput -- it means the threads waiting on the GIL are woken 
up, but fail to get the GIL and go back to sleep again.

However, I have directly observed a 30% penalty under MP constraints 
when the sys.setcheckinterval value was too low (and there was too much 
GIL thrashing).
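
If you want to see the effect on your own box, a minimal (and very
rough) way to measure it is to time a CPU-bound loop in one thread
versus two at a few different check intervals; the numbers will depend
entirely on your machine and scheduler:

    import sys, time, threading

    def burn(n=1000000):
        # pure-Python CPU-bound loop; never releases the GIL for I/O
        x = 0
        for i in xrange(n):
            x = x + i

    def timed(nthreads, checkinterval):
        sys.setcheckinterval(checkinterval)
        threads = [threading.Thread(target=burn) for i in range(nthreads)]
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.time() - start

    for ci in (10, 100, 1000):
        print "checkinterval=%4d  1 thread: %.2fs  2 threads: %.2fs" % (
            ci, timed(1, ci), timed(2, ci))

Keep in mind the two-thread run does twice the total work, so anything
much worse than double the single-thread time is GIL overhead.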

Very little in Zope is capable of releasing the GIL and doing work 
independently; some of the database adapters can do that, but those 
usually do not account for a large share of the work.  Curious side 
remark: when you have a LARGE number of threads, you usually do not have 
enough database threads!  The number of database threads is a default 
parameter to an initialization method, and is set to 7.  When you DO 
actually have lots of concurrent work occurring without GIL thrashing, 
you need to bump up the number of Zope database threads.  Sites that do 
a lot of XML-RPC or other high-latency I/O (network I/O needed to 
fulfill a request, not just to send back the response) usually need to 
bump up the number of database threads.  Otherwise, requests block 
waiting on a database thread in Zope, which is bad.
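
For concreteness, a hedged sketch: as far as I recall the default in
question is the pool_size argument to the ZODB DB constructor, which is
7.  Standalone ZODB code can raise it like this (exact import spellings
vary between ZODB releases, and under Zope itself you would raise the
server's worker-thread count at startup rather than call this directly):

    # Standalone ZODB, not the Zope startup path; pool_size is the
    # number of pooled database connections and defaults to 7.
    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB

    storage = FileStorage('Data.fs')
    db = DB(storage, pool_size=25)   # allow more concurrent connections
    conn = db.open()                 # one connection per worker thread
    # ... use conn.root(), then conn.close() to return it to the pool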