[ZODB-Dev] BTree bug on a 64 bit platform.

Tim Peters tim at zope.com
Mon Nov 22 20:30:35 EST 2004


[John Barratt]
> I got to wondering the otherday how IOBTrees cope with being moved
> between 32 and 64 bit platforms in the ZODB when the int's are different
> sizes.  I wanted to create a BTree on a 64bit platform using 64bit int
> keys...

Note that the word "int" is highly ambiguous here.  There's a C type named
"int", and a Python builtin named "int", and there are integers suitable for
use as keys in IO and II flavors of BTree-based data types.  No two of those
are actually the same concept.  Python "int"s are actually C "long"s, and
Python's "long"s are unbounded integers (which has no counterpart in C).

> This led me to some tests which show some inconsistency between the usage
> of BTree's on the two platforms.  This *could* be a system/compiler
> specific bug, but I only have one type of system to test on (AMD 64, Suse
> 9.0 64bit, Python 2.3.4).
>
> Take the code :
>
> from BTrees.IOBTree import IOBTree
> t = IOBTree()
> k = pow(2, 35) ## Create a > '32bit' int.
> v = 'Value for key %i' % (k,)
> t[k] = v
> print "Input Key=%i\nType='%s'\nValue='%s'."% (k, type(k), v)
> print
> for k in t.keys():
>      print "Output Key=%s\nType='%s'\nValue='%s'." % (k, type(k), t[k])
>
> On a 32bit platform it rightly generates an exception due to the long
> being given as a key to the IOBTree:
>
> Input Key=34359738368
> Type='<type 'long'>'
> Value='Value for key 34359738368'.
> Traceback (most recent call last):
>    ...
> TypeError: expected integer key

That exception occurs because the Python C API's PyInt_Check() function
returns false when fed a Python long.  The BTree code then passes on that
exception.

> On the 64bit system

This is ambiguous too.  It's got solely to do with what the C compiler you
used says "int" and "long" are.  C compilers vary widely in this respect.

> it silently converts the int key to a '32 bit' key :
>
> Input Key=34359738368
> Type='<type 'int'>'
> Value='Value for key 34359738368'.
>
> Output Key=0
> Type='<type 'int'>'
> Value='Value for key 34359738368'.
>
> I haven't looked in the core ZODB code to see if this is a potential
> problem (and it's probably a bit beyond me ATM), but it may be that since
> the keys are generated within the C code it will never be a problem
> there?

It's sloppiness in the BTree implementation, but of a tedious nature that's
never risen to the top of my todo list, and possibly never will <wink>.

The BTree implementation:

- Uses C "int" for IO and II BTree-based data structure key types.

- Appears to *believe* that "int" means "32 bits" on all platforms.

  + This belief is technically unjustified; the only thing C promises
    about "int" size is that it holds at least 16 bits.

  + But this belief is unlikely to be violated in practice; there are
    C compilers for embedded devices where "int" is 16 bits, and
    there are C compilers for old Cray machines where "int" is 64
    bits, but under all other C compilers for 32-bit and 64-bit
    boxes I'm aware of "int" is in fact 32 bits.

- Implicitly assumes that sizeof(int) == sizeof(long).  There's no
  consistency about that across C compilers for 64-bit boxes, but
  you're obviously using a C with sizeof(long) > sizeof(int).

At heart, the BTree code is missing an overflow check, and loses bits due to
an implicit cast from C long to C int.  I don't know which compiler you were
using, but I sure *hope* it generated warnings about the lines where this
occurred.  The BTree C code is intended not to generate compiler warnings,
and a good C compiler on a box where sizeof(long) > sizeof(int) should be
able to help catch mistakes like this.

BTW, the BTree code emphatically does not intend that you be able to "create
a BTree on a 64bit platform using 64bit int keys", if by BTree you meant an
IO or II BTree.  It appears to intend 32-bit integers regardless of
platform.  You should use an OO or OI tree instead if you want integer keys
exceeding 32 bits.

> The only place in the python code I could immediately see the possibility
> of a problem is for incorrect indexing with the DateTime index for dates
> greater than around the year 4050.  These would be indexed as being
> around 'the start of time'.  Catalog rid's are generated from within a
> range that can always be represented in < 32bits so are OK.
>
> Otherwise the main problem area would be in '3rd party' code (like
> mine!).  I also would imagine the same sort of problem would exist with
> other types of structures such as IIBTree etc.

The lack of proper overflow checking is ubiquitous in the integer BTree
code, but possibly confined to expansions of the macros in intkeymacros.h
and intvaluemacros.h.

> Am I missing something here, or is this a bug that could one day really
> bite due to data loss from overwriting key values in BTrees?

I consider it to be a bug in the BTree implementation.

> Can this situation be dealt with 'nicely' other than changing the code to
> raise an exception if on a 64bit platform you try to use a > '32bit' int
> as a key?

Sorry, I don't understand this.  If the bug ever gets fixed, it will get
fixed precisely by raising an exception if you attempt to use an integer
exceeding 32 bits as a key in an II or IO structure, or as a value in an II
or OI structure.

If you don't want restrictions, use OO structures instead.  The "I" versions
are efficiency gimmicks, and buy their efficiencies at the expense of
generality (and, in cases like this one, at the expense of bugs too).

> This is the only way I can see to maintain portability of Data.fs files
> between 32 & 64 bit platforms.

They're already portable -- the bug doesn't affect that.  When you use your
2**35 key, e.g., and store it, then you'll get back 0 regardless of which
kind of box you're on when you load that Data.fs file.  You could create an
unportable Data.fs II, IO, or OI BTree only by running on a box with a C
compiler where 'int' wasn't 32 bits.

> Apologies if this has been covered before, but I could find no
> immediate reference.

Nope, I don't think you missed anything.



More information about the ZODB-Dev mailing list