[Zope-dev] Re: [Zope3-dev] proposal: serving static content faster

Chris McDonough chrism at plope.com
Wed Mar 24 15:05:55 EST 2004


On Wed, 2004-03-24 at 13:32, Shane Hathaway wrote:
> Chris McDonough wrote:
> > IMO code that needs to read from the database shouldn't return a
> > producer.  Instead, it should probably continue using the RESPONSE.write
> > streaming protocol in the worker thread when it needs to do
> > producer-like things.  Returning a producer to ZPublisher seems to be
> > really useful only when the producer's "more" generator is guaranteed to be
> > exceedingly cheap because as you noted it is meant to operate in the
> > context of the main thread.
> 
> I'll note that iterators probably ought to replace producers.  Just 
> spell "more" as "next" and they look pretty much the same.

I did consider that, but since the idea was to make it as fast as
possible, I figured we'd just return something that medusa could deal
with directly.  But since medusa doesn't know beans about StopIteration
coming out of an iterator, we can't just alias "more" to "next" and
expect it to work, at least without changing medusa.  But maybe that's
the right thing to do anyway (medusa is pretty overdue for some spring
cleaning), or maybe we just wrap the iterator up in something medusa
currently understands.  It doesn't matter to me either way, really.
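
For illustration, the "wrap the iterator" option could look roughly like the
sketch below.  It relies only on the medusa convention that a producer has a
"more" method returning data, with an empty string meaning "finished".  The
class name IteratorProducer is made up for this example, and the sketch uses
bytes as a modern Python would; it is not existing medusa or Zope code.

```python
class IteratorProducer:
    """Adapt a plain iterator to the medusa-style producer protocol.

    Hypothetical helper: medusa calls more() repeatedly and treats an
    empty result as end-of-data, so StopIteration must never escape.
    """

    def __init__(self, iterator):
        self._iterator = iter(iterator)
        self._done = False

    def more(self):
        if self._done:
            return b""
        for chunk in self._iterator:
            # An empty chunk would look like end-of-data to the caller,
            # so skip it and keep pulling.
            if chunk:
                return chunk
        # The iterator is exhausted: remember that and report "finished".
        self._done = True
        return b""
```

The iterator still runs in the main thread here, so it has to be as cheap as
any other producer.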

> > The time spent waiting for the code that accessed the database would
> > block all other asyncore operations, though, right?  We'd need to test
> > it, but I suspect it might be a net loss for the "multiple requests for
> > the same object" case because the overhead of reading from the database
> > cache would be realized serially for each request.
> 
> Look at it this way:
> 
> - Don't ghostify anything manually.  Let ZODB handle that.
>
> - Use a larger ZODB cache for the main thread's connection than you do 
> for the other connections, to increase the chance that objects will be 
> served directly from RAM.
> 
> - As long as other threads aren't reading/writing the large objects, 
> there will be at most one copy of a large object in memory at any given 
> time.
> 
> - Periodically ask the connection to collect garbage.  It uses an LRU 
> strategy, which seems much more optimal than immediate deactivation.

OK.  I'll let you handle that. ;-)

> >  And if the object
> > isn't in cache, it could potentially block for quite a long time.
> > That said, I dunno.  Do you think it might be a win?  I guess my worry
> > is that the operation of a producer should be more or less
> > "guaranteed" to be cheap and it seems hard to make that promise about
> > ZODB access, especially as the data might be coming over the wire from
> > ZEO.
> 
> If the object is not loaded and not in the ZEO cache, the producer could 
> say it's not ready yet and ask ZEO to fetch it in the background. 

Right.  We'd need to come up with a protocol that lets the producer
return "not ready yet".  I suppose this could just be implemented as an
exception.
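
A rough sketch of what an exception-based "not ready yet" protocol might look
like follows.  Everything in it is an assumption made for illustration:
medusa has no NotReadyYet exception today, and LazyProducer, load_chunk and
pump are invented names standing in for the producer, the ZODB/ZEO lookup and
one pass of the channel's send loop.

```python
class NotReadyYet(Exception):
    """Hypothetical signal: the producer has no data available right now."""


class LazyProducer:
    """Producer whose data may still be loading in the background."""

    def __init__(self, load_chunk):
        # load_chunk() is an assumed callable returning bytes, b"" at the
        # end of the data, or None when the data has not been loaded yet.
        self._load_chunk = load_chunk

    def more(self):
        chunk = self._load_chunk()
        if chunk is None:
            raise NotReadyYet()
        return chunk


def pump(producer, out):
    """One pass of an imagined send loop; returns True once finished."""
    try:
        data = producer.more()
    except NotReadyYet:
        # Nothing to send yet; retry on a later pass instead of blocking.
        return False
    if not data:
        return True
    out.append(data)
    return False
```

The point of the exception is that the main loop never blocks on the
database; it just comes back to the producer later.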

> Jeremy has suggested that object pre-fetching could be added to ZODB.

I'll let you handle that too. ;-)

> > FWIW, Jim intimated a while back that he might be interested in
> > providing "blob" support directly within ZODB. I can imagine an
> > implementation of this where maybe you can mark an object as
> > "blobifiable" and when you do so, the ZODB caching code writes a copy of
> > that object into a named file on disk during normal operations
> > <hand-waving goes here ;->  Then we could use a producer to spool the
> > file data out without ever actually reading data out of a database from
> > a ZODB connection; we'd just ask the connection for the filename.
> 
> That's a possibility, although it would complicate the storage, and 
> making it work with ZEO would require a distributed filesystem.

It would actually complicate the ZODB connection caching code but the
storage would have really nothing to do with it.  It also wouldn't
require a distributed filesystem, because all we'd be doing is storing
cached copies of the data on the local disk of each ZEO client.  An
implementation could go something like this:

Objects that want to participate in the blob caching scheme can
implement a "_p_makeBlob" method (or whatever), which returns an
iterator representing the serialized data stream.

When a request for an object is provided to the connection:

- if it is not in the ZODB cache, return a ghost like normal.
- if it is in the cache and it has a "_p_makeBlob" method,
  check if a file on disk exists with its oid.  if a file
  doesn't exist on disk, call _p_makeBlob and create the file
  using the iterator it returns.  set _p_blob_filename on the
  object to the filename of the file created.
- App code can now check for _p_blob_filename to see if
  a cached copy representing the serialized data exists on
  disk.  If it does, it can make use of it how it sees fit.
- when a cached object is invalidated out of the ZODB caches,
  delete the cached file too.

This happens on every ZEO client.  Solving race conditions and locking
is an exercise left to the reader. ;-)
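
The steps above could be sketched roughly as follows.  This is only an
illustration of the idea, not ZODB code: BlobCache, ensure_blob and
invalidate are made-up names, _p_makeBlob and _p_blob_filename are the
proposed (not existing) attributes, and the only thing borrowed from ZODB is
that a persistent object carries its oid in _p_oid.  Locking between threads
and processes is left out, as noted.

```python
import os
import tempfile


class BlobCache:
    """Hypothetical per-client on-disk cache of serialized objects."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, oid):
        # Name the file after the object's oid (bytes), hex-encoded.
        return os.path.join(self.directory, oid.hex() + ".blob")

    def ensure_blob(self, obj):
        """Create the cached file if needed; return its name or None."""
        make_blob = getattr(obj, "_p_makeBlob", None)
        if make_blob is None:
            return None  # object does not take part in the scheme
        path = self._path(obj._p_oid)
        if not os.path.exists(path):
            # Write to a temporary file and rename it into place, so a
            # reader never sees a partially written blob.
            fd, tmp = tempfile.mkstemp(dir=self.directory)
            with os.fdopen(fd, "wb") as f:
                for chunk in make_blob():
                    f.write(chunk)
            os.replace(tmp, path)
        obj._p_blob_filename = path
        return path

    def invalidate(self, oid):
        """Delete the cached file when the object is invalidated."""
        try:
            os.remove(self._path(oid))
        except FileNotFoundError:
            pass
```

A producer could then open the file named by _p_blob_filename and spool it
out without touching the ZODB connection at all.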

- C




