AW: [Zope] 2 zope faster 1 zope running

Matthew T. Kromer matt@zope.com
Mon, 17 Jun 2002 13:54:35 -0400


Copying Zope @ Zope.org since this is useful information.  My numbers
below are approximations, not hard figures.

It's derived from experimental observation.  A Python bytecode, on
average, executes about 50 machine instructions.  You probably want to
let a whole CPU quantum expire before voluntarily switching threads.
Generally a CPU quantum will be about 5 milliseconds.  A 1 GHz Pentium
will execute about 1,000,000 instructions / millisecond, or about
100,000 Python bytecodes / quantum.  The typical Zope publishing path is
about 1,000,000 bytecodes or more -- so letting that path be interrupted
10 times or more is overkill (for Zope).  Using my numbers you could
argue for a much higher ratio.  (I.e., if you believe me, Zope "wants" a
sys.setcheckinterval(100000) on a 1 GHz machine.)
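
The arithmetic behind that figure can be sketched in a few lines of
Python.  This is only a back-of-envelope check; the constant and
function names are made up for illustration, and the values are the
rough approximations quoted above, not measurements.

```python
# Rough figures from this message; all of them are approximations.
INSTRUCTIONS_PER_BYTECODE = 50       # average machine instructions per bytecode
QUANTUM_MS = 5                       # length of one CPU quantum, in milliseconds
INSTRUCTIONS_PER_MS = 1_000_000      # roughly a 1 GHz CPU
PUBLISH_PATH_BYTECODES = 1_000_000   # typical Zope publishing path


def bytecodes_per_quantum(instructions_per_ms=INSTRUCTIONS_PER_MS,
                          quantum_ms=QUANTUM_MS,
                          instructions_per_bytecode=INSTRUCTIONS_PER_BYTECODE):
    """Bytecodes the interpreter can execute in one CPU quantum."""
    return instructions_per_ms * quantum_ms // instructions_per_bytecode


# A check interval that lets a thread use one whole quantum.
check_interval = bytecodes_per_quantum()

# How often one publishing request would be interrupted at that interval.
interruptions = PUBLISH_PATH_BYTECODES // check_interval

print(check_interval)   # 100000
print(interruptions)    # 10
```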

From experimental observation I have detected a levelling off in
benefit at about pystones/50.  This becomes very noticeable on a
multiprocessor machine.  I believe the levelling-off effect comes from
other normal 'blocking' operations inside Zope which cause one thread to
suspend.  Hence the factor of 500 discrepancy :)
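
To see where a factor of that size comes from, here is a small sketch
of the pystones/50 rule of thumb.  The function name is invented, and
it assumes, purely for illustration, that the 11000-pystone machine
from the quoted message is comparable to the 1 GHz machine used for
the theoretical figure of 100000.

```python
def empirical_check_interval(pystones):
    """Rule of thumb from observation: benefit levels off near pystones / 50."""
    return pystones // 50


# One full quantum of bytecodes on a 1 GHz CPU (rough figure from this message).
THEORETICAL_INTERVAL = 100_000

# 11000 is the pystones figure from the quoted message.
empirical = empirical_check_interval(11_000)
discrepancy = round(THEORETICAL_INTERVAL / empirical)

print(empirical)     # 220
print(discrepancy)   # 455, i.e. roughly a factor of 500
```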

The rationale comes down to thread-switching overhead and throughput
optimization.  Consider the following example:

Two threads wish to count from 1 to 10.  After each thread counts a
single number, they switch.  A system clock is incremented after each
count:

Sys     Thr1    Thr2
1       1
2               1
3       2
4               2
...
19      10
20              10

The average time for each thread to complete is (19 + 20) / 2, or 19.5.
Now consider the example where thread 1 is allowed to run to completion
before thread 2:

Sys     Thr1    Thr2
1       1
2       2
...
10      10
11              1
...
20              10

Here, the average time for each thread to complete is (10 + 20) / 2, or
15.  So, on average, a thread takes 30% longer to finish when the
threads run "concurrently", without factoring in any overhead from the
actual act of task switching, which in my example was zero, but can
never actually be zero.
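
The two schedules can be checked with a small simulation.  This is a
minimal sketch of the counting example only; the function names are
invented, each count costs one clock tick, and switching is assumed
to be free, as in the tables.

```python
def completion_times_round_robin(n_threads, steps):
    """Each thread does one step, then yields; switching itself costs nothing."""
    clock = 0
    finished = {}
    remaining = [steps] * n_threads
    while len(finished) < n_threads:
        for tid in range(n_threads):
            if remaining[tid] == 0:
                continue
            clock += 1                  # one count takes one clock tick
            remaining[tid] -= 1
            if remaining[tid] == 0:
                finished[tid] = clock   # record when this thread completed
    return [finished[tid] for tid in range(n_threads)]


def completion_times_serial(n_threads, steps):
    """Each thread runs to completion before the next one starts."""
    return [steps * (tid + 1) for tid in range(n_threads)]


def average(values):
    return sum(values) / len(values)


interleaved = completion_times_round_robin(2, 10)
serial = completion_times_serial(2, 10)

print(interleaved, average(interleaved))   # [19, 20] 19.5
print(serial, average(serial))             # [10, 20] 15.0
```

The total work is the same in both cases (the clock stops at 20); only
the average time until a thread finishes changes.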

By increasing the value passed to sys.setcheckinterval (the default
Python value is 10!) we allow more work to be done by each thread before
it yields control to another thread.  The astute observer will also note
that the total system throughput for CPU-BOUND processes can never
exceed the speed of serial processing.  Because Zope is primarily CPU
bound, fewer threads tend to be better.

I believe that a corollary to this is the effect people observe when
Zope undergoes "superlinear" degradation -- i.e., too many things get
caught up in Zope (because too many threads are started).  I am sure
this isn't the *only* reason that happens (I don't have a good
observation suite to analyze it).  However, once internal queues for
work build up in Zope, they are very difficult to dissipate -- you have
to have a substantial lessening in the work arrival rate.
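
A rough way to see why a backlog is slow to clear: under a very
simple model where requests arrive and are served at steady rates, the
queue only shrinks at the difference between the two.  The function
name and all the numbers below are hypothetical, and this is not a
model of Zope's actual internals.

```python
def seconds_to_drain(backlog, service_rate, arrival_rate):
    """Time for a queue of `backlog` requests to empty.

    service_rate and arrival_rate are in requests per second.  The queue
    only shrinks when requests are served faster than they arrive.
    """
    if arrival_rate >= service_rate:
        return float("inf")   # the backlog never clears
    return backlog / (service_rate - arrival_rate)


# Hypothetical server: serves 50 requests/sec, 100 requests already queued.
print(seconds_to_drain(100, 50, 49))   # 100.0 -> arrivals barely below capacity
print(seconds_to_drain(100, 50, 25))   # 4.0   -> arrival rate cut in half
print(seconds_to_drain(100, 50, 50))   # inf   -> arrivals match capacity
```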

N.B. If you use my figure of 1,000,000 bytecodes as a predictor of the
Zope publishing path, you'll realize that this is about 10 CPU quanta
(again using a quantum of 5 ms) on a 1 GHz machine, which is a Zope
publishing rate of about 20 pages/sec.  For some applications this is an
optimistic value.  For others, Zope can publish at a faster rate.  This
is not intended to cover ALL applications, just a 'good guess' at one.
I suggest running 'ab' or similar against a representative sample of
YOUR application's pages to convert pages/sec into a guesstimate of the
"cost" of your application.

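Converting a measured rate into a "cost" could look something like the
sketch below.  It reuses the rough figures from this message (a 1 GHz
CPU, 50 instructions per bytecode); the function name and the sample
rates are made up, and it assumes the CPU is the only bottleneck.

```python
INSTRUCTIONS_PER_SECOND = 1_000_000_000   # roughly a 1 GHz CPU
INSTRUCTIONS_PER_BYTECODE = 50            # rough average from this message


def bytecodes_per_page(pages_per_second):
    """Estimate the bytecode cost of one page from a measured publishing rate.

    Assumes the CPU is the only bottleneck and requests run back to back.
    """
    return round(INSTRUCTIONS_PER_SECOND
                 / (pages_per_second * INSTRUCTIONS_PER_BYTECODE))


# Hypothetical rates, e.g. as reported by a benchmark run.
print(bytecodes_per_page(25))    # 800000
print(bytecodes_per_page(100))   # 200000
```
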
On Monday, June 17, 2002, at 10:05 AM, oliver.erlewein@sqs.de wrote:

> Hi
>
> I've set my new interval from "-i 32" to "-i 200" as my Pystones is
> about 11000. I'll check what changes I will see. Where did you get that
> ratio from or why is it so?
