[ZODB-Dev] Some interesting (to some:) numbers

Andreas Jung lists at zopyx.com
Tue May 11 14:19:07 EDT 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Adam GROSZER wrote:
> Hello Jim,
> 
> 
> 
> Tuesday, May 11, 2010, 4:46:46 PM, you wrote:
> 
> JF> On Tue, May 11, 2010 at 7:47 AM, Adam GROSZER <agroszer at gmail.com> wrote:
>>> Hello Jim,
>>>
>>> Tuesday, May 11, 2010, 1:37:19 PM, you wrote:
>>>
>>> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER <agroszer at gmail.com> wrote:
>>>>> Hello Jim,
>>>>>
>>>>> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>>>>>
>>>>> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER <agroszer at gmail.com> wrote:
>>>>>>> Hello Jim,
>>>>>>>
>>>>>>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>>>>>>
>>>>>>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink <roel at fourdigits.nl> wrote:
>>>>>>>>> That's really interesting! Did you notice any issues performance wise, or
>>>>>>>>> didn't you check that yet?
>>>>>>> JF> I didn't check performance. I just iterated over a file storage file,
>>>>>>> JF> checking compressed and uncompressed pickle sizes.
>>>>>>>
>>>>>>> I'd say some checksum is then also needed to detect bit failures that
>>>>>>> mess up the compressed data.
>>>>> JF> Why?
>>>>>
>>>>> I think the gzip algorithm compresses to a bit stream, so if even one
>>>>> bit is wrong, the rest of the uncompressed data might be a total mess.
>>>>> If that one bit is relatively early in the stream, it's fatal.
>>>>> Salvaging the data is no joy either.
>>>>> I know that at this level we should expect the OS and any underlying
>>>>> infrastructure to provide error-free data or fail.
>>>>> Though I've seen some magic situations where a file was copied over
>>>>> a network without error, but at the end the CRC check failed on it :-O
>>> JF> How would a checksum help?  All it would do is tell you you're hosed.
>>> JF> It wouldn't make you any less hosed.
>>>
>>> Yes, but I would know why it's hosed.
> 
> JF> How so?  How would you know why it is hosed?
> 
> Because of data corruption in the compressed stream.
> 
> JF> Note BTW that the zlib format already includes a checksum.
> 
> JF>   http://www.faqs.org/rfcs/rfc1950.html
> 
> I missed that. Case closed then ;-) Sorry for the noise.
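
To illustrate the point about the zlib checksum: RFC 1950 ends each zlib
stream with an Adler-32 checksum of the uncompressed data, and Python's
zlib.decompress() checks it. The following is only a minimal sketch; the
payload and the helper names (flip_bit, safe_decompress) are made up for
the example and are not part of ZODB.

```python
import zlib

# Made-up payload standing in for a pickled object record.
payload = b"pickled object state " * 50
compressed = zlib.compress(payload)


def flip_bit(data, byte_index, bit=0):
    """Return a copy of data with a single bit flipped."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def safe_decompress(data):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None


# Damage the last byte, which belongs to the Adler-32 trailer.
trailer_damaged = flip_bit(compressed, len(compressed) - 1)

# Damage a byte in the middle of the deflate data.
middle_damaged = flip_bit(compressed, len(compressed) // 2)

print(safe_decompress(compressed) == payload)
print(safe_decompress(trailer_damaged))
print(safe_decompress(middle_damaged) == payload)
```

As Jim says, the checksum only reports the damage; it does not help to
recover the data.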

A zipped file is no different from other (binary) data stored within
the ZODB. Data corruption can always occur, zipped or not.

Side note: we implemented compression support as part of our CMS on the
application layer, where we store large binary files as linked PData
chains. However, we do not compress in every case, only for certain
content types (it does not make sense to compress zip or jar files).
We also store an MD5 hash for each object (and have never had a
corruption issue so far).
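
A rough sketch of that idea, not our actual CMS code: the function names,
the content-type set and the use of zlib are assumptions made for this
example only.

```python
import hashlib
import zlib

# Assumed list of content types that are already compressed; the mail
# itself only names zip and jar files.
ALREADY_COMPRESSED = {"application/zip", "application/java-archive"}


def prepare_for_storage(data, content_type):
    """Return (stored_bytes, is_compressed, md5_hex) for one binary object."""
    digest = hashlib.md5(data).hexdigest()  # hash of the original bytes
    if content_type in ALREADY_COMPRESSED:
        return data, False, digest
    return zlib.compress(data), True, digest


def load_from_storage(stored, is_compressed, digest):
    """Undo prepare_for_storage and verify the MD5 hash."""
    data = zlib.decompress(stored) if is_compressed else stored
    if hashlib.md5(data).hexdigest() != digest:
        raise ValueError("MD5 mismatch: stored data is corrupted")
    return data


text = b"some highly repetitive text " * 100
stored, is_compressed, digest = prepare_for_storage(text, "text/plain")
restored = load_from_storage(stored, is_compressed, digest)
```

Storing the hash of the uncompressed bytes means the same check works for
both compressed and uncompressed objects.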

Andreas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvpn5sACgkQCJIWIbr9KYy3nACfV1lo6FLX7xeiDRVRlsj64tSX
Xy4An31w7pY9K0wmIIUtIxzpFGRmv7GW
=HbjJ
-----END PGP SIGNATURE-----