[ZODB-Dev] Using zodb and blobs

Nitro <nitro at dr-code.org>
Tue Apr 13 18:39:28 EDT 2010


Hello Tres,

thanks for your detailed answers!

On 12.04.2010 at 22:42, Tres Seaver <tseaver at palladion.com> wrote:

>> Additionally I made some quick performance tests. I committed 1 kB sized
>> objects and I can do about 40 transactions/s if one object is changed per
>> transaction. For 100 kB objects it's also around 40 transactions/s. Only
>> for object sizes bigger than that does the raw I/O throughput seem to
>> start to matter.
>
> 40 tps sounds low:  are you pushing blob content over the wire somehow?

No, that test was with a plain FileStorage: just a Persistent object with
a variably sized string and an integer attribute. I did something like

1) create an object with attributes x (integer) and y (variably sized string)
2) for i in range(100): obj.x = i; transaction.commit()
3) measure the time taken for step 2
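
For reference, the test looked roughly like this as code; a minimal
sketch assuming a local FileStorage, where the Thing class, the 1 kB
payload and the bench.fs filename are made up for illustration:

import time
import transaction
from persistent import Persistent
from ZODB import DB
from ZODB.FileStorage import FileStorage

class Thing(Persistent):
    def __init__(self, payload):
        self.x = 0        # integer attribute
        self.y = payload  # variably sized string

db = DB(FileStorage('bench.fs'))
conn = db.open()
root = conn.root()
root['obj'] = Thing('a' * 1024)  # vary the payload size per run
transaction.commit()

obj = root['obj']
start = time.time()
for i in range(100):  # 100 commits, one changed object each
    obj.x = i
    transaction.commit()
elapsed = time.time() - start
print('%.1f transactions/s' % (100 / elapsed))

db.close()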

>> Still don't know the answers to these:
>>
>> - Does it make sense to use ZODB in this scenario? My data is not suited
>> well for an RDBMS.
>
> YMMV.  I still default to using ZODB for anything at all, unless the
> problem smells very strongly relational.

Ok, the problem at hand certainly doesn't smell relational. It is more
about storing lots of different data than about querying it extensively.
It's a mixture of digital asset management (the blobs are useful for that
part) and "projects" which reference the assets. The projects are shared
between the clients and will consist of a big tree of Persistent objects.
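
To make that concrete, here is a rough sketch of the model I have in
mind, assuming BTrees for the containers and blobs for the asset
payloads (all class and attribute names are made up):

from persistent import Persistent
from BTrees.OOBTree import OOBTree
from ZODB.blob import Blob

class Asset(Persistent):
    # One managed asset; the binary payload lives in a blob so it is
    # streamed from a file instead of being loaded as one big string.
    def __init__(self, data):
        self.blob = Blob()
        f = self.blob.open('w')
        f.write(data)
        f.close()

class Project(Persistent):
    # A shared project: itself a tree of Persistent objects, plus
    # references to the assets it uses.
    def __init__(self):
        self.children = OOBTree()  # subtree of further nodes
        self.assets = OOBTree()    # name -> Asset references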

>> - Are there more complications to blobs other than a slightly different
>> backup procedure?
>
> You need to think about how the blob data is shared between ZEO clients
> (your appserver) and the ZEO storage server:  opinions vary here, but I
> would prefer to have the blobs living in a writable shared filesystem,
> in order to avoid the necessity of fetching their data over ZEO on the
> individual clients which were not the one "pushing" the blob into the
> database.

The ZEO server and clients will be in different physical locations, so
I'd probably have to employ a shared filesystem that can cope with that.
Speaking of locations: is it a problem if server and clients are not at
the same site (typical latency 0-100 ms)? That is, will ZEO perform very
badly under these circumstances because it was not designed for them?
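
Regardless of that, if I understand the shared blob directory idea
correctly, the client-side setup would look roughly like this (assuming
the blob directory sits on a mount shared by the storage server and all
clients; host, port and paths are made up):

from ZEO.ClientStorage import ClientStorage
from ZODB import DB

storage = ClientStorage(
    ('zeo.example.com', 8100),
    blob_dir='/mnt/shared/blobs',  # same directory the server writes to
    shared_blob_dir=True,          # read blobs from disk, not over ZEO
)
db = DB(storage)

With shared_blob_dir=True the client opens blob files directly from that
directory instead of downloading them through the ZEO protocol.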

>> - Are there any performance penalties by using very large invalidation
>> queues (i.e. 300,000 objects) to reduce client cache verification time?
>
> At a minimum, RAM occupied by that queue might be better used elsewhere.
>  I just don't use persistent caches, and tend to reboot appservers in
> rotation after the ZEO storage has been down for any significant period
> (almost never happens).

In my case the clients might be down for a couple of days (typically 1
or 2) and they should not spend 30 minutes on cache verification each
time they reconnect. So if those 300k queue entries take up 1 kB each,
they occupy 300 MB of RAM, which I am fine with.
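
For concreteness, the knob I mean is the server-side
invalidation-queue-size option; a zeo.conf sketch (address and path are
made up):

<zeo>
  address 8100
  invalidation-queue-size 300000
</zeo>

<filestorage 1>
  path /var/zeo/Data.fs
</filestorage>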

>> From what I've read it only seems to consume memory.
>
> Note that the ZEO storage server makes copies of that queue to avoid
> race conditions.

Ok, I can see how copying and storing a queue of 300k entries is slow
and can take up excessive amounts of memory.

Thanks,
-Matthias

