[ZODB-Dev] ZEO and relstorage performance

Jim Fulton jim at zope.com
Tue Oct 13 17:08:07 EDT 2009


I've been working on a project to speed up ZEO.  The speedup mainly
involves getting ZEO to use more threads by giving each client its
own thread, and changing FileStorage to allow multiple simultaneous
readers.  This is especially valuable for us (ZC) for large databases
(~1TB) running on multi-spindle storage systems on which multiple
reads of the same file can take place in parallel.  I'll have more to
say about this work in later posts.

In the course of working on this, I decided to play with Shane's
relstorage benchmark, speedtest.  After playing with it a bit, I have
a few observations.

- Up to a point, it does a good job of isolating just the networking
  aspects of the mysql and ZEO protocols:

  - It uses a small enough data set to fit in ram, so the read portion
    of the tests does no disk IO.

  - It doesn't leverage ZODB or ZEO caches at all. (Although ZEO read
    times are penalized by the time taken to write to the ZEO cache
    locally.)

- The tests run clients and servers on the same machine using Unix
  Domain Sockets for communication (at least for ZEO and MySQL).
  Generally, at least in deployments we do, the clients and servers
  run on different machines.

- When running at high concurrency levels, the clients and server can
  compete for CPU resources, distorting results.  This wouldn't happen
  if the clients ran on separate machines.

- Minor nit: the test's notion of objects per transaction is off. The
  actual number is on the order of 1/30 of the number reported by the
  tests.
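
As a rough check on that 1/30 figure, the sketch below assumes the
real count is the requested objects_per_txn divided by 30 and rounded
up.  That rounding rule is an assumption, not something taken from the
speedtest source, and the function name is made up for illustration.
It does reproduce the "REALLY" counts quoted with the tables in this
post (100 -> 4, 10000 -> 334).

```python
import math


def actual_objects_per_txn(requested, divisor=30):
    """Hypothetical helper: estimate the real number of objects per transaction.

    Assumes (not confirmed from the speedtest source) that the real count
    is the requested count divided by 30, rounded up.
    """
    return math.ceil(requested / divisor)


for requested in (1, 100, 10000):
    print(requested, "->", actual_objects_per_txn(requested))
```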

I decided to explore this a bit.  I modified Shane's speedtest script
on a branch:

- Added command line options to control a number of factors, like
  object sizes and concurrency levels.

- Added options to specify mysql connection parameters.  Among other
  things, this lets me run the test in a "remote" configuration, in
  which the client and server are on different machines.

- Added an option to specify a ZEO TCP address and to manage a ZEO
  server externally.

- Replaced the single read measurement with "cold", "hot" and "steamin"
  measurements. The "cold" number is what Shane's test originally called
  "read".  It reads data from the server without benefit of the ZODB
  or ZEO caches.

  The "hot" number provides timings for a second round of reads
  after minimizing the object cache.

  The "steamin" number is the timing of a 3rd round of reads without
  clearing the ZODB cache. I upped the size of the ZODB cache to make
  sure the objects would fit.
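
The three read measurements can be pictured with a toy model.  This
is only a sketch: ToyClient is a made-up class with two plain dicts
standing in for the ZEO client cache and the ZODB object cache, and it
uses none of the real ZODB or ZEO APIs.  It shows which layer each
round of reads ends up hitting.

```python
import time


class ToyClient:
    """Toy stand-in for a client with two cache layers (not the ZODB/ZEO API)."""

    def __init__(self, server_data):
        self.server = server_data   # plays the role of the storage server
        self.client_cache = {}      # plays the role of the ZEO client cache
        self.object_cache = {}      # plays the role of the ZODB object cache
        self.server_loads = 0       # how many loads had to go to the server

    def load(self, oid):
        if oid in self.object_cache:
            return self.object_cache[oid]
        if oid in self.client_cache:
            value = self.client_cache[oid]
        else:
            self.server_loads += 1
            value = self.server[oid]
            self.client_cache[oid] = value
        self.object_cache[oid] = value
        return value

    def minimize_object_cache(self):
        self.object_cache.clear()


def timed_read(client, oids):
    """Load every oid once and return the elapsed wall-clock time."""
    start = time.perf_counter()
    for oid in oids:
        client.load(oid)
    return time.perf_counter() - start


server = {oid: b"x" * 100 for oid in range(1000)}
client = ToyClient(server)
oids = list(server)

cold = timed_read(client, oids)      # nothing cached: every load hits the server
client.minimize_object_cache()
hot = timed_read(client, oids)       # object cache empty, client cache warm
steamin = timed_read(client, oids)   # object cache warm
print("cold %.6f  hot %.6f  steamin %.6f" % (cold, hot, steamin))
```

In this toy all three layers are in-process dicts, so the timings it
prints say nothing about real network or cache costs; only the order
of the phases carries over.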

Here are some results.  I'm going to provide them in tabular form, as
I actually find this easier than charts for this data and also because
it's less work. :) The results below are basically as output by his
script with my modifications.

First, here are results from running clients and server on the same
machine using Unix domain sockets.  The results are grouped into 3
tables based on objects per transaction.  Note that for the second and
third tables I've added the actual object counts. The machine these
were run on was a 2.2GHz Intel Core 2 Duo (two core) desktop with a
SATA disk and 4GB of RAM, running Ubuntu 9.04.  They used
relstorage trunk as of October 5, when I made my branch, and ZODB
3.9.1.  The results also reflect the default relstorage poll interval
of 0.  More on that later.  The results also reflect mysql
configured to improve write performance as described here:
http://shane.willowrise.com/archives/how-to-fix-the-mysql-write-speed/.

The first column is the concurrency level, which is the number of
simultaneous clients.  The remaining columns are in 2 groups of 4, for
ZEO and for MySQLAdapter (relstorage+mysql).  Each group has a write
time, a cold read time, a hot read time (second set of reads after
clearing the ZODB object cache) and a steamin time based on a 3rd set
of reads without clearing the object cache.


Columns:
"Concurrency",
 ZEO + FileStorage - write,
 ZEO + FileStorage - cold,
 ZEO + FileStorage - hot,
 ZEO + FileStorage - steamin,
 MySQLAdapter - write,
 MySQLAdapter - cold,
 MySQLAdapter - hot,
 MySQLAdapter - steamin
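
For anyone who wants to work with the numbers rather than read them,
here is a minimal sketch that turns one result line into named
fields.  The short column names and the parse_row function are made
up for illustration; the order follows the column list above.

```python
COLUMNS = (
    "concurrency",
    "zeo_write", "zeo_cold", "zeo_hot", "zeo_steamin",
    "mysql_write", "mysql_cold", "mysql_hot", "mysql_steamin",
)


def parse_row(line):
    """Turn one comma-separated result line into a dict keyed by column name."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(fields)))
    row = {"concurrency": int(fields[0])}
    for name, value in zip(COLUMNS[1:], fields[1:]):
        row[name] = float(value)
    return row


# First row of the objects_per_txn=1 table for local clients, poll interval 0.
row = parse_row("1, 0.00992, 0.00108, 0.00015, 0.00007, 0.00405, 0.00129, 0.00076, 0.00043")
print(row["zeo_cold"], row["mysql_cold"])
```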


Local clients, poll interval 0
==============================

** Results with objects_per_txn=1 **
   ZEO+FS --------------------------   MySQL-----------------------------
   write    cold     hot      steamin  write    cold     hot      steamin
1, 0.00992, 0.00108, 0.00015, 0.00007, 0.00405, 0.00129, 0.00076, 0.00043
2, 0.01359, 0.00177, 0.00024, 0.00011, 0.00635, 0.00083, 0.00043, 0.00024
4, 0.02322, 0.00226, 0.00025, 0.00011, 0.00836, 0.00128, 0.00047, 0.00025
8, 0.07687, 0.00183, 0.00020, 0.00009, 0.01236, 0.00121, 0.00055, 0.00036
16, 0.25414, 0.00259, 0.00018, 0.00007, 0.02846, 0.00130, 0.00056, 0.00032

** Results with objects_per_txn=100 (REALLY 4) **
   ZEO+FS --------------------------   MySQL-----------------------------
   write    cold     hot      steamin  write    cold     hot      steamin
1, 0.01352, 0.00574, 0.00062, 0.00017, 0.00841, 0.00273, 0.00159, 0.00043
2, 0.02414, 0.00539, 0.00035, 0.00008, 0.00678, 0.00292, 0.00202, 0.00045
4, 0.03136, 0.00789, 0.00035, 0.00007, 0.01343, 0.00198, 0.00108, 0.00025
8, 0.09697, 0.00694, 0.00036, 0.00008, 0.01910, 0.00253, 0.00111, 0.00025
16, 0.24361, 0.01369, 0.00037, 0.00008, 0.03413, 0.00363, 0.00158, 0.00036

** Results with objects_per_txn=10000 (REALLY 334) **
   ZEO+FS --------------------------   MySQL-----------------------------
   write    cold     hot      steamin  write    cold     hot      steamin
1, 0.13877, 0.40306, 0.02324, 0.00042, 0.11370, 0.09461, 0.05026, 0.00063
2, 0.18004, 0.39529, 0.02051, 0.00045, 0.12573, 0.10313, 0.07746, 0.00072
4, 0.36065, 0.38792, 0.02192, 0.00050, 0.25860, 0.21972, 0.14529, 0.00150
8, 0.68353, 1.57573, 0.02679, 0.00110, 0.51280, 0.44516, 0.45004, 0.00126
16, 1.46470, 3.40687, 0.03225, 0.00057, 1.00606, 1.03924, 1.29605, 0.00102

As you can see, write and cold read times are quite a bit higher for
ZEO, although write times get closer together as transaction size and
concurrency increase.

Also note that the hot times are much lower for ZEO than with MySQLAdapter.
Our ZEO cache hit rates are typically around 90%.  With a cache hit
rate of only 75% I'd expect ZEO+FS to generally outperform MySQLAdapter.
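
One rough way to put numbers on the hit-rate argument is to blend the
hot and cold times by hit rate.  The weighted-average model and the
function name are assumptions made for illustration; the post does
not say how the 75% estimate was reached, and this model ignores the
ZODB object cache entirely.

```python
def blended_read_time(hit_rate, hot, cold):
    """Weighted average of hot and cold read times for a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hot + (1.0 - hit_rate) * cold


# ZEO+FS figures from the objects_per_txn=1, concurrency=1 row above.
zeo_hot, zeo_cold = 0.00015, 0.00108
for rate in (0.75, 0.90):
    print("hit rate %.2f -> %.6f" % (rate, blended_read_time(rate, zeo_hot, zeo_cold)))
```

The blended figure can then be set against the MySQLAdapter cold or
hot time for the same row.  How the comparison comes out depends on
the row and on which MySQLAdapter figure is used.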

The steamin times are also quite a bit lower for ZEO+FS than for
mysql.  This is a bit surprising since data are simply being read from
the ZODB object cache, but the overhead of polling for changes slows
down these accesses.  Ideally, ZEO object cache hit rates are high, so
the steamin times are highly relevant to actual application
performance.

I shared this data with Shane who suggested running with a poll
interval of 2.  Here are the results with a poll interval of 2.

Local clients, poll interval 2
==============================

** Results with objects_per_txn=1 **
1, 0.00920, 0.00163, 0.00024, 0.00011, 0.00419, 0.00102, 0.00050, 0.00015
2, 0.01381, 0.00143, 0.00021, 0.00010, 0.00425, 0.00110, 0.00057, 0.00015
4, 0.03010, 0.00153, 0.00015, 0.00007, 0.00505, 0.00123, 0.00051, 0.00013
8, 0.06913, 0.00145, 0.00017, 0.00008, 0.01171, 0.00127, 0.00038, 0.00008
16, 0.21394, 0.00308, 0.00017, 0.00007, 0.02466, 0.00225, 0.00037, 0.00008

** Results with objects_per_txn=100 (REALLY 4) **
1, 0.01582, 0.00571, 0.00066, 0.00013, 0.00532, 0.00249, 0.00131, 0.00015
2, 0.01774, 0.00612, 0.00062, 0.00013, 0.00704, 0.00244, 0.00098, 0.00009
4, 0.02779, 0.00710, 0.00055, 0.00012, 0.00741, 0.00384, 0.00143, 0.00009
8, 0.08021, 0.01067, 0.00035, 0.00007, 0.01639, 0.00323, 0.00100, 0.00009
16, 0.26911, 0.01602, 0.00038, 0.00007, 0.03164, 0.00462, 0.00101, 0.00009

** Results with objects_per_txn=10000 (REALLY 334) **
1, 0.16153, 0.40147, 0.02417, 0.00042, 0.11959, 0.10012, 0.05048, 0.00045
2, 0.18652, 0.39361, 0.02055, 0.00044, 0.12947, 0.10604, 0.08080, 0.00047
4, 0.33065, 0.84091, 0.02331, 0.00050, 0.25859, 0.21675, 0.13139, 0.00052
8, 0.67337, 1.46541, 0.02905, 0.00069, 0.49674, 0.42905, 0.44064, 0.00063
16, 1.46586, 3.67101, 0.03427, 0.00097, 0.99446, 1.06484, 1.16689, 0.00078

Here the steamin times are very similar for ZEO and MySQLAdapter,
although the ZEO+FS times are a bit lower.  Note however, that using a
poll interval of 2 may cause excessive conflict errors, especially if
there are relatively hot objects that get updated a lot.

In our deployments, the clients are on separate machines and generally
don't compete with each other or with the server for CPU resources.
The tables below show results with clients running on a separate 8-core
2.33GHz Xeon (dual quad core) machine with 24GB of memory, running
CentOS 4.7.  There were plenty of CPU resources for the clients, so they
never came close to using all of the available CPU.

Remote clients, poll interval 2
===============================

** Results with objects_per_txn=1 **
1, 0.03733, 0.00207, 0.00015, 0.00007, 0.01905, 0.00240, 0.00141, 0.00008
2, 0.01772, 0.00233, 0.00015, 0.00007, 0.01962, 0.00240, 0.00147, 0.00008
4, 0.06634, 0.00236, 0.00015, 0.00007, 0.03471, 0.00262, 0.00162, 0.00008
8, 0.08080, 0.00364, 0.00016, 0.00007, 0.06410, 0.00287, 0.00164, 0.00008
16, 0.09270, 0.00440, 0.00016, 0.00007, 0.13171, 0.00316, 0.00174, 0.00009

** Results with objects_per_txn=100 (REALLY 4) **
1, 0.01809, 0.00683, 0.00034, 0.00007, 0.02432, 0.00597, 0.00480, 0.00008
2, 0.02210, 0.00816, 0.00034, 0.00007, 0.02873, 0.00645, 0.00513, 0.00008
4, 0.07079, 0.00991, 0.00036, 0.00007, 0.03521, 0.00655, 0.00520, 0.00009
8, 0.08739, 0.01388, 0.00035, 0.00007, 0.06754, 0.00706, 0.00557, 0.00009
16, 0.09264, 0.01376, 0.00035, 0.00007, 0.13904, 0.00777, 0.00593, 0.00010

** Results with objects_per_txn=10000 (REALLY 334) **
1, 0.17738, 0.57640, 0.01969, 0.00038, 0.61835, 0.47054, 0.39015, 0.00041
2, 0.20881, 0.67896, 0.01973, 0.00038, 0.65081, 0.45832, 0.39691, 0.00043
4, 0.28996, 0.92163, 0.01993, 0.00038, 0.70280, 0.47962, 0.41136, 0.00044
8, 0.41571, 1.25167, 0.02008, 0.00040, 0.81672, 0.50079, 0.50144, 0.00045
16, 0.60316, 1.54352, 0.02033, 0.00039, 1.23906, 0.60130, 0.68200, 0.00049


Some things to note:

- For smaller transaction sizes, ZEO+FS and MySQLAdapter write times
  are pretty close; however, at higher levels of concurrency or for
  large transaction sizes, ZEO+FS outperforms MySQLAdapter on writes.

- For smaller transaction sizes, ZEO+FS and MySQLAdapter cold read
  times are pretty close. Even for larger transaction sizes, the cold
  read times are pretty close, except at the highest concurrency
  level.  I think what's happening for high concurrency and large
  transaction sizes is that ZEO has reached maximum throughput and the
  MySQLAdapter still has some breathing room.

- The hot times are more than an order of magnitude better for
  ZEO+FS.
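
The hot-read gap can be checked directly against one of the tables.
The sketch below copies the hot columns from the remote-client table
for objects_per_txn=100 and prints the MySQLAdapter/ZEO+FS ratio at
each concurrency level.  The variable and function names are made up
for illustration, and the other tables are not covered.

```python
# (ZEO+FS hot, MySQLAdapter hot) per concurrency level, copied from the
# remote-client table for objects_per_txn=100.
HOT_TIMES = {
    1: (0.00034, 0.00480),
    2: (0.00034, 0.00513),
    4: (0.00036, 0.00520),
    8: (0.00035, 0.00557),
    16: (0.00035, 0.00593),
}


def hot_ratio(zeo_hot, mysql_hot):
    """How many times slower the MySQLAdapter hot read is than the ZEO+FS one."""
    return mysql_hot / zeo_hot


ratios = {level: hot_ratio(*times) for level, times in HOT_TIMES.items()}
for level, ratio in sorted(ratios.items()):
    print("concurrency %2d: MySQLAdapter/ZEO+FS hot ratio %.1f" % (level, ratio))
```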

These benchmarks make ZEO+FS look pretty good relative to
MySQLAdapter.  The overall performance, assuming even moderately
effective ZEO or object caches, is significantly better for ZEO.
Keep in mind, however, that these benchmarks don't take
disk access on the server into account for reads, because there isn't
any.  In practice, I'd expect server disk access times to dominate
cold read times.  For example, in a separate benchmark with far more
realistic access patterns against a large database, object load times
are an order of magnitude greater than what you'd see if the data
being read was all in RAM.

Jim

-- 
Jim Fulton

