[ZODB-Dev] RFC: Blobs in S3

Jim Fulton jim at zope.com
Thu Jul 7 11:55:31 EDT 2011


On Thu, Jul 7, 2011 at 10:49 AM, Laurence Rowe <l at lrowe.co.uk> wrote:
...
> One thing I found with my (rather naive) experiments building
> s3storage a few years ago is that you need to ensure requests to S3
> are made in parallel to get reasonable performance. This would be a
> lesser problem with blobs, but even then you might have multiple file
> uploads in the same request. The boto library is really useful, but
> doesn't support async requests.

Right, it occurred to me that commit performance with S3 might be an issue.

> I guess the simplest implementation would only upload a blob to S3 in
> tpc_begin as that is where the tid is set (and presumably the tid will
> form part of the blob's S3 url.) With large files that might make
> tpc_begin take a long time to complete as it waits for the blob data
> to be loaded into S3. It might be better to upload large blobs to a
> temporary s3 url first and then only make an S3 copy in tpc_begin,
> you'd need to do some benchmarks to see if this was worthwhile for all
> files or only files over a certain size.

I think I get where you're going, although I'd quibble with the details.
There is certainly some opportunity for doing things in parallel
up until you get to tpc_vote. I wonder if renames in S3 take much
time. I can imagine that they do.
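
To make the parallelism point concrete, here is a minimal sketch of uploading several blobs concurrently and waiting for all of them before the vote phase. It is only an illustration under stated assumptions: `upload_blobs_in_parallel`, `fake_upload`, and `fake_bucket` are hypothetical names, the upload callable stands in for a real S3 PUT, and nothing here is boto or ZODB API.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_blobs_in_parallel(blobs, upload, max_workers=4):
    """Upload every (key, data) pair concurrently and wait for all of them.

    `upload` is a hypothetical callable(key, data) standing in for a real
    S3 PUT; it is not part of boto or ZODB.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start all uploads first so they overlap instead of running serially.
        futures = {
            key: pool.submit(upload, key, data) for key, data in blobs.items()
        }
        # result() re-raises any upload error, so a failure surfaces here,
        # before the caller would go on to vote.
        return {key: future.result() for key, future in futures.items()}


# Stand-in for S3: an in-memory dict instead of a network call (assumption).
fake_bucket = {}


def fake_upload(key, data):
    fake_bucket[key] = data
    return len(data)


sizes = upload_blobs_in_parallel(
    {"blob-1": b"abc", "blob-2": b"hello"}, fake_upload
)
```

Threads are used here only because a blocking client can be driven that way without async support; whether this helps in practice would need the kind of benchmarking Laurence mentions.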

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton

