[ZODB-Dev] RFC: Blobs in S3

Thu Jul 7 09:13:11 EDT 2011

On 6 July 2011 19:44, Jim Fulton <jim at zope.com> wrote:
> We're evaluating AWS for some of our applications and I'm thinking of adding
> some options to support using S3 to store Blobs:
>
> 1. Allow a storage in a ZEO storage server to store Blobs in S3.
>    This would probably be through some sort of abstraction to make
>    this not actually depend on S3.  It would likely leverage the fact that
>    a storage server's interaction with blobs is more limited than application
>    code.
>
> 2. Extend blob objects to provide an optional URL to fetch data
>    from. This would allow applications to provide S3 (or similar service)
>    URLs for blobs, rather than serving blob data themselves.
>
>
>    2.1 If I did this I think I'd also add a blob size property, so you could
>          get a blob's size without opening the blob file or downloading
>          it from a database server.
>
> 3. Handle blob URLs at the application level.
>
>    To make this work for the S3 case, I think we'd have to use a
>    ZEO server connection to be called by application code.  Something like:
>
>       self.blob = ZODB.blob.Blob()
>       f = self.blob.open('w')
>       f.write(some_data)
>       f.close()
>
>
> Option 1 is fairly straightforward, and low risk.
>
> Option 2 is much trickier:
>
> - It's an API change
> - There are bits of implementation that depend on the
>  current blob record format.  I'm not sure if these
>  bits extend beyond the ZODB code base.
> - The handling of blob object state would be a little
>   delicate, since some of the state would be set on the storage
>   server.
> -  The win depends on being able to load a blob
>    file independently of loading blob objects, although
>    the ZEO blob cache implementation already depends
>    on this.

Adding the ability to store blobs in S3 would be an excellent feature
for AWS-based deployments. I'm not convinced that presenting S3 URLs
to end users is terribly useful, as there is no ability to set a
Content-Disposition header and the URL will not end with the correct
file extension, which will cause problems for users downloading files.

I would imagine a more common setup would be to serve the S3 stored
blobs through a proxy server running in EC2, using something similar
to Nginx's X-Accel-Redirect. Lovely Systems has some information on
generating an S3 Authorization header in Nginx here:
http://www.lovelysystems.com/nginx-as-an-amazon-s3-authentication-proxy-2/
- though generating an authenticated S3 URL in Python to set in the
X-Accel-Redirect header would lead to much simpler proxy
configuration.
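
As a rough sketch of the Python side, assuming S3's query-string
request signing (HMAC-SHA1 over a string containing the expiry time
and the resource path) and a hypothetical internal Nginx location
"/s3-internal" that proxies to S3, something like the following would
produce the header. The function names, the location name and the
credentials are all made up for illustration; current AWS regions
expect the newer Signature Version 4 scheme, normally via an SDK.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def signed_s3_path(bucket, key, access_key, secret_key,
                   expires_in=300, now=None):
    """Return '/bucket/key?...' signed for a GET (query-string auth)."""
    issued = time.time() if now is None else now
    expires = int(issued + expires_in)
    resource = "/%s/%s" % (bucket, quote(key))
    # Fields: verb, Content-MD5, Content-Type, Expires, resource.
    string_to_sign = "GET\n\n\n%d\n%s" % (expires, resource)
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest), safe="")
    return "%s?AWSAccessKeyId=%s&Expires=%d&Signature=%s" % (
        resource, quote(access_key, safe=""), expires, signature)


def accel_redirect_headers(bucket, key, access_key, secret_key, now=None):
    """Build the response header that hands the download over to Nginx."""
    # "/s3-internal" is a hypothetical internal location proxying to S3.
    path = signed_s3_path(bucket, key, access_key, secret_key, now=now)
    return {"X-Accel-Redirect": "/s3-internal" + path}
```

The application would return these headers with an empty body, and
could set Content-Disposition on the same response so the user still
gets a sensible filename.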

In either case though, I don't see why doing so would necessitate
changing the blob record format - presumably a blob's URL can be
simply mapped from the S3 blobstorage configuration and a blob's oid
and tid?
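
To illustrate the kind of mapping I mean, here is a minimal sketch.
The key layout (hex oid as a "directory", hex tid as the file name)
and the "blobs" prefix are assumptions for the example, loosely
modelled on how blob files are named on disk; this is not an existing
ZODB API.

```python
import binascii


def blob_key(oid, tid, prefix="blobs"):
    """Map an 8-byte oid and tid to a (hypothetical) S3 object key."""
    if len(oid) != 8 or len(tid) != 8:
        raise ValueError("oid and tid must each be 8 bytes")
    oid_hex = binascii.hexlify(oid).decode("ascii")
    tid_hex = binascii.hexlify(tid).decode("ascii")
    return "%s/0x%s/0x%s.blob" % (prefix, oid_hex, tid_hex)


def blob_url(base_url, oid, tid, prefix="blobs"):
    """Join the configured bucket base URL with the derived key."""
    return base_url.rstrip("/") + "/" + blob_key(oid, tid, prefix)
```

With a base URL taken from the storage configuration, nothing beyond
the oid and tid would need to be stored with the blob record.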

Laurence