[ZODB-Dev] Using zodb and blobs

Tres Seaver tseaver at palladion.com
Tue Apr 13 19:58:11 EDT 2010



Nitro wrote:
> Hello Tres,
> 
> thanks for your detailed answers!
> 
> On 12.04.2010 at 22:42, Tres Seaver <tseaver at palladion.com> wrote:
> 
>>> Additionally I made some quick performance tests. I committed 1kb-sized
>>> objects and I can do about 40 transactions/s if one object is changed per
>>> transaction. For 100kb objects it's also around 40 transactions/s. Only
>>> for object sizes bigger than that does the raw I/O throughput seem to
>>> start to matter.
>> 40 tps sounds low:  are you pushing blob content over the wire somehow?
> 
> No, that test was with a plain file storage. Just a plain Persistent  
> object with a differently sized string and an integer attribute. I did  
> something like
> 
> 1) create object with attribute x (integer) and y (variably sized string)
> 2) for i in range(100): obj.x = i; transaction.commit()
> 3) Measure time taken for step 2
> 
>>> Still don't know the answers to these:
>>>
>>> - Does it make sense to use ZODB in this scenario? My data is not suited
>>> well for an RDBMS.
>> YMMV.  I still default to using ZODB for anything at all, unless the
>> problem smells very strongly relational.
> 
> Ok, the problem at hand certainly doesn't smell relational. It is more  
> about storing lots of different data than querying it extensively. It's a  
> mixture of digital asset management (the blobs are useful for this part)  
> and "projects" which reference the assets. The projects are shared between  
> the clients and will consist of a big tree with Persistent objects hooked  
> up to it.

I have seen the ZEO storage committing transactions at least an order of
magnitude faster than that (e.g., when processing incoming newswire
feeds).  I would guess that there could have been some other latencies
involved in your setup (e.g., that 0-100ms lag you mention below).

>>> - Are there more complications to blobs other than a slightly different
>>> backup procedure?
>> You need to think about how the blob data is shared between ZEO clients
>> (your appserver) and the ZEO storage server:  opinions vary here, but I
>> would prefer to have the blobs living in a writable shared filesystem,
>> in order to avoid the necessity of fetching their data over ZEO on the
>> individual clients which were not the one "pushing" the blob into the
>> database.
> 
> The zeo server and clients will be in different physical locations, so I'd  
> probably have to employ some shared filesystem which can deal with that.  
> Speaking of locations of server and clients, is it a problem - as in, ZEO  
> will perform very badly under these circumstances as it was not designed  
> for this - if they are not in the same location (typical latency 0-100ms)?

That depends on the mix of reads and writes in your application.  I have
personally witnessed a case where the clients stayed up and kept serving
pages over a whole weekend in a clusterfsck where both the ZEO server
and the monitoring infrastructure went belly up.  This was for a large
corporate intranet, in case that helps:  the problem surfaced
mid-morning on Monday when the employee in charge of updating the lunch
menu for the week couldn't save the changes.
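One way to realize the shared-blob layout Tres recommends above is ZEO's `shared-blob-dir` option, which tells a client that its blob directory is the same filesystem the storage server writes to, so blob data never has to be fetched over the ZEO protocol. A sketch of the client-side configuration; the server address and mount point are hypothetical:

```
%import ZEO

<zeoclient>
  server zeoserver.example.com:8100
  blob-dir /mnt/shared/blobs
  shared-blob-dir true
</zeoclient>
```

With high-latency WAN links between sites, a network filesystem mounted across those links may of course reintroduce the very latency this option is meant to avoid, so it fits best when clients and server share a fast local network.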

>>> - Are there any performance penalties by using very large invalidation
>>> queues (i.e. 300,000 objects) to reduce client cache verification time?
>> At a minimum, RAM occupied by that queue might be better used elsewhere.
>>  I just don't use persistent caches, and tend to reboot appservers in
>> rotation after the ZEO storage has been down for any significant period
>> (almost never happens).
> 
> In my case the clients might be down for a couple of days (typically 1 or  
> 2 days) and they should not spend 30 minutes in cache verification time  
> each time they reconnect. So if these 300k objects take up 1k each, then  
> they occupy 300 MB of RAM, which I am fine with.

If the client is disconnected for any period of time, it is far more
likely that just dumping the cache and starting over fresh will be a
win.  The 'invalidation_queue' is primarily to support clients which
remain up while the storage server is down or unreachable.
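For reference, the queue under discussion is sized on the storage server side. A sketch of the relevant server configuration; the address, path, and queue size are illustrative:

```
<zeo>
  address 8100
  invalidation-queue-size 300000
</zeo>

<filestorage 1>
  path /var/zeo/Data.fs
</filestorage>
```

If a reconnecting client has missed more transactions than the queue holds, it falls back to full verification (or a cache flush) anyway, which is why a fresh cache is usually the simpler choice after a multi-day outage.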

>>>  From what I've read it only seems to consume memory.
>> Note that the ZEO storage server makes copies of that queue to avoid
>> race conditions.
> 
> Ok, I can see how copying and storing 300k objects is slow and can take up  
> excessive amounts of memory.



Tres.
--
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com

