[ZODB-Dev] ZEO 3.2 (Zope 2.7) ->3.6 (Zope 2.9) upgrading: Much slower startup due to cache file creation

Tue Apr 18 13:25:59 EDT 2006

[Gfeller, Martin]
> I'm further along the upgrade road and have found that starting up my
> app under ZEO is *much slower* than it used to be with Zope 2.7, >10
> minutes vs. <1 minute.
>
> I have relatively large temporary cache files

Because you're comparing a pre-MVCC ZEO (3.2) with a post-MVCC ZEO
(3.6), you should be aware that their ZEO cache designs have little in
common.

> (generous enough to avoid cache flips, even if I don't know the DB size
> beforehand),

That's an example:  the post-MVCC ZEO cache is a single file, and
there are no cache flips; flips are unique to the pre-MVCC two-file
ZEO cache design.  It's possible that a smaller cache file size would
work just as well for you in a post-MVCC ZEO.  If you never see a
cache flip in 3.2, then I believe you specified a cache file size at
least twice as large as you need to hold all your data.
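Here is a toy Python model of a two-file cache that makes the "twice as large" reasoning concrete. It rests on an assumption made for illustration, not checked against the 3.2 source: each of the two files may hold at most half the configured cache size, and a flip happens whenever the current file would overflow that half. `count_flips` is a made-up name.

```python
# Toy model of a pre-MVCC, two-file ZEO cache (hypothetical and simplified).
# ASSUMPTION: each of the two files may grow to half the configured size,
# and the cache "flips" (starts over in the other file) when the current
# file would exceed that half.

def count_flips(record_sizes, cache_size):
    """Return how many flips the toy two-file cache performs."""
    limit = cache_size // 2   # assumed per-file limit
    current = 0               # bytes used in the current file
    flips = 0
    for size in record_sizes:
        if current + size > limit:
            flips += 1        # switch files; the new current file starts empty
            current = 0
        current += size
    return flips


data = [1000] * 50                   # 50,000 bytes of object data in total
print(count_flips(data, 100_000))    # 0: all the data fits in one half
print(count_flips(data, 60_000))     # 1: one half holds only 30,000 bytes
```

Under this model, "no flips ever" means all the data fits in half the configured size, so the configured size is at least twice the data.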

> and that the extra time is spent in the following code (on both Windows 2000
> and Windows XP):
>
> ZEO.cache.FileCache.__init__, line 779ff, after the cache file is created:
>
>      # Make sure the OS really saves enough bytes for the file.
>      self.f.seek(self.maxsize - 1)
>      self.f.write('x')
>      self.f.truncate()
>
> This code seems to have been introduced between the mentioned versions.

That's true, although it misses that everything else in the cache
implementations is also different ;-)

> What is the reason for it?

It reserves enough disk space for the requested size at the start, to
simplify the rest of the code, and to eliminate "oops! out of disk
space!" as a possible ZEO cache error after the app gets going (if you
don't have enough disk space for the cache file size you ask for, you
find out upon trying to create a post-MVCC ZEO cache, not after your
app has been running for an hour or a week).
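Here is a minimal Python 3 sketch of the same seek/write/truncate reservation quoted above, wrapped in a helper that fails at creation time if the space can't be had. The function name `create_preallocated` and the file name are made up for illustration; this is not the ZEO code itself.

```python
import os
import tempfile


def create_preallocated(path, maxsize):
    """Create `path` and extend it to `maxsize` bytes up front, using the
    seek/write/truncate idiom (binary mode, Python 3)."""
    f = open(path, "w+b")
    try:
        # Per the explanation in this message, NT-family Windows physically
        # zero-fills the gap here, so a disk-space shortage raises OSError
        # now. On file systems with sparse files the gap may not be
        # allocated yet, so the reservation is weaker there.
        f.seek(maxsize - 1)
        f.write(b"x")
        f.truncate()   # truncates at the current position, i.e. maxsize
        f.flush()
    except OSError:
        f.close()
        os.remove(path)
        raise
    return f


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "cache.zec")   # hypothetical file name
    f = create_preallocated(path, 1024 * 1024)
    size = os.path.getsize(path)
    f.close()

print(size)  # 1048576
```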

Alas, those lines take measurable time only on Windows NT or later.
It's specifically the self.f.write('x') line, where NT+ physically
writes zeroes into the whole of the file preceding that byte, while other
OSes (including pre-NT Windows) don't.  That's why it's (much) more
expensive on NT+.  It's the same reason Python's standard
test_largefile test takes much longer on NT+ than elsewhere --
Microsoft C's stdio doesn't have a "sparse file" concept, while
efficient sparseness is built-in default file behavior, at the OS level,
on most other platforms.  Pre-NT Windows exposes whatever bytes
happened to be sitting on disk, which was a massive security hole; NT+
plugs that by physically overwriting all the "empty regions" in the
file; most other OSes don't overwrite, but keep track of which regions
are empty and supply "virtual zeroes" when you try to read from an
empty region.
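The small Python experiment below shows the behavior that all these platforms share. It seeks past the end of an empty file, writes one byte, and reads the hole back. The file name and the 1 MiB hole size are arbitrary choices. Every platform returns zeroes for the hole. The difference described above is only *how*: physically written zeroes on NT+, or "virtual zeroes" for an untracked hole on sparse-capable file systems.

```python
import os
import tempfile

GAP = 1 << 20  # 1 MiB hole (arbitrary size for the demo)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "holey.bin")   # hypothetical file name
    with open(path, "wb") as f:
        f.seek(GAP)        # jump past the end of the empty file
        f.write(b"x")      # one real byte at offset GAP
    with open(path, "rb") as f:
        hole = f.read(GAP) # read back the skipped-over region
    st = os.stat(path)

print(len(hole), hole == bytes(GAP))   # 1048576 True

# st_blocks exists only on POSIX systems (512-byte units on most of them);
# on a sparse-capable file system the allocated size is typically far
# smaller than st_size.
if hasattr(st, "st_blocks"):
    print(st.st_size, st.st_blocks * 512)
```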

> I would expect the OS to extend the file as needed, without an initial "reservation"?

The post-MVCC ZEO cache file is treated circularly:  whenever it hits
the end of the reserved space, it wraps around and starts overwriting
old data at the start of the file again.  I suppose it would be
possible to complicate the code to append to an initially empty file
until it hit the specified size or a system IO error (like out of disk
space), and then switch to "circular mode".  Likely to be delicate,
and creates other issues (like what to do if appending _does_ run out
of disk space).
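Here is a hypothetical, much-simplified Python model of "treated circularly". Records go at the current offset, and when the next one would run past the reserved size, writing wraps to the start and overwrites the oldest data. `CircularFile` and `write_record` are made-up names. The real ZEO cache also keeps its own header and bookkeeping (such as which objects live where), none of which is modeled here.

```python
import io


class CircularFile:
    """Toy fixed-size file written circularly (hypothetical, simplified)."""

    def __init__(self, f, maxsize):
        self.f = f              # any seekable binary file-like object
        self.maxsize = maxsize  # size reserved up front
        self.offset = 0         # where the next record will be written

    def write_record(self, data):
        """Write `data` and return the offset it was written at."""
        if len(data) > self.maxsize:
            raise ValueError("record larger than the whole cache file")
        if self.offset + len(data) > self.maxsize:
            self.offset = 0     # wrap around; oldest data gets overwritten
        start = self.offset
        self.f.seek(start)
        self.f.write(data)
        self.offset += len(data)
        return start


buf = io.BytesIO(bytes(10))     # stands in for a preallocated 10-byte file
cache = CircularFile(buf, 10)
offsets = [cache.write_record(b"abcd") for _ in range(3)]
print(offsets)                  # [0, 4, 0]
print(buf.getvalue())           # b'abcdabcd\x00\x00'
```

Because the whole file exists from the start, the wrap is just a reset of `offset`. An "append until full, then go circular" design would also have to handle the file growing, and running out of space, part way through.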

> It *could* lessen fragmentation, but this then depends on the file system state.

It's not aiming at fragmentation, although it's certainly the case on
NTFS that appending a few thousand bytes at a time is far more likely
to yield a badly fragmented file than asking for all the space you're
going to use once at the start (well, unless free space is badly
fragmented to begin with).