[ZODB-Dev] Some interesting (to some:) numbers

Adam GROSZER agroszer at gmail.com
Tue May 11 08:47:25 EDT 2010


Hello,

Tuesday, May 11, 2010, 1:59:17 PM, you wrote:

N> Am 11.05.2010, 13:47 Uhr, schrieb Adam GROSZER <agroszer at gmail.com>:

>> Hello Jim,
>>
>> Tuesday, May 11, 2010, 1:37:19 PM, you wrote:
>>
>> JF> On Tue, May 11, 2010 at 7:13 AM, Adam GROSZER <agroszer at gmail.com>  
>> wrote:
>>>> Hello Jim,
>>>>
>>>> Tuesday, May 11, 2010, 12:33:04 PM, you wrote:
>>>>
>>>> JF> On Tue, May 11, 2010 at 3:16 AM, Adam GROSZER <agroszer at gmail.com>  
>>>> wrote:
>>>>>> Hello Jim,
>>>>>>
>>>>>> Monday, May 10, 2010, 1:27:00 PM, you wrote:
>>>>>>
>>>>>> JF> On Sun, May 9, 2010 at 4:59 PM, Roel Bruggink  
>>>>>> <roel at fourdigits.nl> wrote:
>>>>>>>> That's really interesting! Did you notice any issues performance  
>>>>>>>> wise, or
>>>>>>>> didn't you check that yet?
>>>>>>
>>>>>> JF> I didn't check performance. I just iterated over a file storage  
>>>>>> file,
>>>>>> JF> checking compressed and uncompressed pickle sizes.
>>>>>>
>>>>>> I'd say some checksum is then also needed to detect bit failures that
>>>>>> mess up the compressed data.
>>>>
>>>> JF> Why?
>>>>
>>>> I think the gzip algorithm compresses to a bit stream, so if even one
>>>> bit has an error, the rest of the uncompressed data might be a total
>>>> mess. If that one bit is relatively early in the stream, it's fatal.
>>>> Salvaging the data is no joy either.
>>>> I know that at this level we should expect the OS and any underlying
>>>> infrastructure to provide error-free data or fail.
>>>> Though I've seen some magic situations where a file copied through a
>>>> network without error, yet a CRC check on it failed at the end :-O
>>
>> JF> How would a checksum help?  All it would do is tell you you're hosed.
>> JF> It wouldn't make you any less hosed.
>>
>> Yes, but I would at least know why it's hosed.
>> It's not like expecting 2+2=4 and getting 5 somewhere deep in a custom
>> app that does some calculation.

N> You could have bitflips anywhere in the database, not just in the payload
N> parts. You'd have to checksum and test everything all the time. IMO it's
N> not worth the complexity and the performance penalty, given today's
N> redundant storages like RAID, ZRS or zeoraid.

N> Btw, the current pickle payload format is not secured against bitflips
N> either, I think.

The difference between uncompressed and compressed data is this: with a
bitflip in an uncompressed stream you get, say, a B instead of an A, or
a 3 instead of a 1. That hits hard in numbers/IDs, but strings stay
human readable, because the rest of the data is still intact. In a
compressed stream, everything in the pickle/payload after the flipped
bit would probably be garbage.
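
To illustrate the point (a rough standalone sketch with made-up sample
data, not ZODB code): flip one bit in a plain byte string and only one
character changes, flip one bit in the zlib-compressed form and
decompression either blows up or yields scrambled output from that
point on.

import zlib

data = b"user_id=1234 name=Alice balance=100 " * 10

# One bitflip in the raw bytes: exactly one character changes, the
# rest of the payload stays readable.
raw = bytearray(data)
raw[5] ^= 0x01
print(bytes(raw)[:40])

# One bitflip in the compressed bytes: everything after the flip is
# typically lost; zlib either raises an error or returns garbage.
packed = bytearray(zlib.compress(data))
packed[10] ^= 0x01
try:
    print(zlib.decompress(bytes(packed))[:40])
except zlib.error as e:
    print("decompression failed:", e)
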
That garbage would probably make the unpickler fail... or wait a
second... the unpickler is a **SECURITY HOLE** in Python, isn't it?
That means: feed it some random data, and stay tuned for the
unexpected. The point is that a single bitflip can cause a LOT of
damage.
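
For the record, this is the classic illustration of why pickle.loads()
must never see untrusted bytes (a deliberately harmless, standalone
sketch, not ZODB-specific, and a crafted stream rather than a random
bitflip, but it shows that the unpickler will happily call whatever the
data tells it to):

import os
import pickle

class Evil(object):
    # __reduce__ tells pickle how to rebuild the object; a crafted
    # payload can point it at any importable callable, e.g. os.system.
    def __reduce__(self):
        return (os.system, ("echo arbitrary command ran during unpickling",))

payload = pickle.dumps(Evil())
pickle.loads(payload)  # runs the echo command as a side effect of loading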

You're right that currently there's no protection against such
bitflips, but I'd rather present the user with a clear error than with
corrupted data.
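
Something along these lines is what I have in mind (a minimal sketch
with hypothetical helpers, nothing like ZODB's actual record layout):
store a CRC32 next to the compressed pickle and verify it before
decompressing/unpickling, so corruption surfaces as a clear error
instead of garbage objects.

import struct
import zlib

def pack_record(pickled_bytes):
    # Hypothetical: compress a pickle and prepend a CRC32 of the result.
    compressed = zlib.compress(pickled_bytes)
    return struct.pack(">I", zlib.crc32(compressed) & 0xFFFFFFFF) + compressed

def unpack_record(record):
    # Verify the checksum first; fail loudly instead of unpickling garbage.
    stored = struct.unpack(">I", record[:4])[0]
    compressed = record[4:]
    if zlib.crc32(compressed) & 0xFFFFFFFF != stored:
        raise ValueError("checksum mismatch: record is corrupted")
    return zlib.decompress(compressed)

record = pack_record(b"pretend this is a pickle")
assert unpack_record(record) == b"pretend this is a pickle"

corrupted = bytearray(record)
corrupted[10] ^= 0x01                 # simulate a single bitflip on disk
try:
    unpack_record(bytes(corrupted))
except ValueError as e:
    print("caught:", e)               # a clear error, not crappy data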

-- 
Best regards,
 Adam GROSZER                            mailto:agroszer at gmail.com
--
Quote of the day:
If necessity is the mother of invention, discontent is the father of progress. 
- David Rockefeller 


