[ZODB-Dev] Wrong blob file being returned (similar to https://mail.zope.org/pipermail/zodb-dev/2011-February/014067.html )

Wed Jul 13 07:57:49 EDT 2011

Hello Jim,

On 07/12/2011 07:28 PM, Jim Fulton wrote:
> On Tue, Jul 12, 2011 at 6:33 AM, steve<steve at lonetwin.net>  wrote:
>>  Hi,
>> [...snip...]
>>  a. our site is image heavy (36293 blob files) and the servers are behind a load
>>  balancer so in a single request to the web-app (a repoze.bfg site) we might even
>>  load collectively 20+ blobs from any of the 4 servers.
>>
>>  b. zeo connection string on the clients
>>  zodb_uri =
>>  zeo://xxx.xxxx.xxx.xxx:8886/?blob_dir=%(here)s/../var/blobs&shared_blob_dir=false&connection_pool_size=50&cache_size=1024MB&drop_cache_rather_verify=true
>>
>>  c. $ cat var/blobs/.layout
>>  zeocache
>>
>>  Any comments/suggestion on how to isolate and fix this problem would be appreciated.
>
> We have a number of large apps with multiple terabytes of blobs and a
> vaguely similar configuration. We haven't seen this sort of problem.
> One difference is that we set the blob cache size.  I don't suppose
> you're running out of disk space?
>

No, we aren't running out of disk space, and I hadn't really thought about
setting a limit on the blob cache size (I assumed there'd be a built-in
default). In fact, I didn't know this was configurable, since it isn't mentioned
in the docs for repoze.zodbconn (which is what we use to connect to the db)[1].
Fortunately, it appears that repoze.zodbconn just passes along the parameters it
doesn't understand to the underlying ClientStorage connector. I shall try
setting a limit (which instinctively seems like a good thing to do anyway, to
keep the original state and the cache state from going out of sync).
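
As a rough illustration of the above, the sketch below appends a blob cache
limit to a ZEO URI. `blob_cache_size` is the ClientStorage keyword for the
limit; that repoze.zodbconn forwards it unchanged, and that it accepts a plain
byte count, are assumptions to verify. The helper name `with_query_param`, the
host, the blob directory, and the 10 GiB figure are hypothetical.

```python
def with_query_param(uri, name, value):
    """Append name=value to a URI's query string.

    Plain string concatenation is used on purpose so that existing
    placeholders such as %(here)s are not percent-encoded.
    """
    if "?" not in uri:
        sep = "?"
    elif uri.endswith(("?", "&")):
        sep = ""
    else:
        sep = "&"
    return f"{uri}{sep}{name}={value}"


# Hypothetical base URI; substitute the real host and blob directory.
base_uri = "zeo://localhost:8886/?blob_dir=/var/blobs&shared_blob_dir=false"

# Assumed limit of 10 GiB, expressed in bytes.
limit_bytes = 10 * 1024 ** 3

zodb_uri = with_query_param(base_uri, "blob_cache_size", limit_bytes)
print(zodb_uri)
```

The resulting string would go wherever `zodb_uri` is configured for the client.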

> The only suggestion I have is to keep an eye on it and try to
> reproduce the problem.

Yes, I shall attempt to do that on a dev instance of the app.

> I would think that if a request returns an
> incorrect Blob, it would continue to. If someone reports a bad blob,
> get the URL and see if you can reproduce by making the same request to
> each of the app servers, bypassing the load balancer.  If one server
> is being bad, you can remove it from the LB pool to debug it.
>
That's the problem. I'd assumed the same behavior. Unfortunately, it appears
that incorrect blobs aren't always returned for subsequent requests. I traced
one such request to a single server, removed it from the LB, and checked the
same URL, but the request didn't give me an incorrect image. I wonder, would
this also depend on the connection pool or the thread serving the request?
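
Since the bad responses seem intermittent, one way to hunt for them might be to
request the same URL from every app server directly, several times over, and
compare content hashes. The following is only a sketch: the server addresses in
`APP_SERVERS`, the function names, and the example path are all hypothetical,
and it assumes the majority answer is the correct blob.

```python
import hashlib
import urllib.request
from collections import Counter

# Hypothetical backend addresses (bypassing the load balancer).
APP_SERVERS = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
    "http://10.0.0.4:8080",
]


def fetch_digest(base_url, path, timeout=10):
    """Fetch base_url + path and return the SHA-256 hex digest of the body."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return hashlib.sha256(resp.read()).hexdigest()


def odd_ones_out(digests):
    """Given {server: digest}, return servers that disagree with the majority."""
    if not digests:
        return []
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(s for s, d in digests.items() if d != majority)


def check(path, servers=APP_SERVERS, rounds=20, fetch=fetch_digest):
    """Request path from every server `rounds` times.

    Returns a list of (round_number, [disagreeing servers]) for each round
    in which at least one server returned different content.
    """
    suspects = []
    for i in range(rounds):
        digests = {s: fetch(s, path) for s in servers}
        bad = odd_ones_out(digests)
        if bad:
            suspects.append((i, bad))
    return suspects
```

Calling something like `check("/path/to/reported/image")` against the real
servers would list the rounds in which a server disagreed with the rest; an
empty list means every server agreed in every round.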

Anyway, thanks for your help. I shall test and report back if I find a reliable
way to reproduce this.

cheers,
- steve

[1] http://docs.repoze.org/zodbconn/narr.html#zeo-uri-scheme
-- 
random spiel: http://lonetwin.net/
what i'm stumbling into: http://lonetwin.stumbleupon.com/