[ZODB-Dev] ZODB memory problems (was: processing a Very Large file)

Tim Peters tim at zope.com
Sun May 22 13:49:30 EDT 2005


[Jeremy Hylton]
> ...
> The ObjectInterning instance is another source of problem, because it's
> a dictionary that has an entry for every object you touch.

Some vital context was missing in this post.  Originally, on c.l.py, DJTB
wasn't using ZODB at all.  In effect, he had about 5000 lists each
containing about 5000 "not small" integers, so Python created about 5000**2
= 25 million integer objects to hold them all, consuming hundreds of megabytes
of RAM.  However, due to the semantics of the application, there were only
about 5000 _distinct_ integers.  What became the `ObjectInterning` class
here started as a suggestion to keep a dict of the distinct integers,
effectively "intern"ing them.  That cut the memory use by a factor of
thousands.
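A minimal sketch of that dict-based interning trick. The function name `intern_rows` and the sample data are hypothetical. The idea is only that `dict.setdefault` returns the first object stored for each value, so equal integers end up sharing a single object.

```python
def intern_rows(rows):
    """Return copies of `rows` in which equal values share one object.

    `cache` maps each distinct value to the first object seen with that
    value; dict.setdefault returns that stored object on every later hit.
    """
    cache = {}
    return [[cache.setdefault(x, x) for x in row] for row in rows]


# Hypothetical data: 3 rows of 10 "not small" ints, only 5 distinct values.
# In CPython, int(str(...)) typically builds a fresh object for large
# values, so before interning the rows hold many separate int objects.
rows = [[int(str(10**6 + j % 5)) for j in range(10)] for _ in range(3)]
interned = intern_rows(rows)
print(len({id(x) for row in interned for x in row}))  # 5
```

With about 5000 distinct values spread over 25 million slots, the same code would keep only about 5000 integer objects alive, which is where the savings came from.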

This has all gotten generalized and micro-optimized to the point that I
can't follow the code anymore.  Regardless, the same basic trick won't work
with ZODB (or with any other way of storing the data to disk and reading it
back in later):  if we write the same "not small" integer object out
1000000 times, then read them all back in, Python will again create 1000000
distinct integer objects to hold them.  Object identity doesn't survive for
"second class" persistent objects, and interning needs to be applied again
_every_ time one is created.
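To illustrate why interning must be reapplied after every load, here is a sketch that uses a plain `pickle` round trip as a stand-in for storing data in ZODB and reading it back. This is an assumption for illustration: ZODB also pickles object state, but the code below does not touch ZODB itself, and the helper names are hypothetical.

```python
import pickle


def intern_values(values, cache):
    """Replace each value with the canonical object stored in `cache`."""
    return [cache.setdefault(v, v) for v in values]


def round_trip(obj):
    """Stand-in for 'write to disk, read back later': pickle, then unpickle."""
    return pickle.loads(pickle.dumps(obj))


cache = {}
big = int("1000000")                    # one "not small" integer object
data = intern_values([big] * 4, cache)  # four references to that object

loaded = round_trip(data)
# Equal values survive the trip; shared identity is not guaranteed to.
# In CPython each unpickled int here is typically a separate object.
assert loaded == data

# Re-intern after the load so the copies collapse back to one object.
loaded = intern_values(loaded, cache)
assert all(x is big for x in loaded)
```

The cache has to outlive any single load for this to help; a fresh cache per load would still shrink each loaded batch, but batches loaded at different times would not share objects.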

[DJTB]
> ... The only thing I can't change is that ExtendedTuple inherits
> from tuple

Let me suggest that you may be jumping in at the deep ends of too many pools
at once here.

> class ExtendedTuple(tuple):
>
>    def __init__(self, els):
>        tuple.__init__(self,els)

That line doesn't accomplish anything:  tuples are immutable, and by the
time __init__ is called the tuple contents are already set forever.  You
should probably be overriding tuple.__new__ instead.
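A minimal sketch of the `__new__` approach, in modern Python 3 syntax. Everything beyond subclassing `tuple` is hypothetical: coercing elements to `int` only stands in for whatever per-instance processing the original `__init__` was meant to do, and the class's other methods are omitted.

```python
class ExtendedTuple(tuple):
    """Tuple subclass whose contents are set in __new__, not __init__."""

    def __new__(cls, els):
        # A tuple's contents are fixed when tuple.__new__ builds it, so any
        # processing of the elements has to happen before this call.
        # Hypothetical example: coerce every element to int.
        return super().__new__(cls, (int(e) for e in els))


t = ExtendedTuple(["1", 2, 3.0])
print(t)                     # (1, 2, 3)
print(isinstance(t, tuple))  # True
```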

> ...
>    def __hash__(self):
>        return hash(tuple(self))

This method isn't needed.  If you leave it out, the base class
tuple.__hash__ will get called directly, and will compute the same result.
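A quick check of that claim, in Python 3. One caveat applies in Python 3: a class that defines `__eq__` without also defining `__hash__` gets `__hash__` set to `None`. So if the elided part of `ExtendedTuple` overrides `__eq__`, the inherited hash has to be restored explicitly. The class names below are hypothetical.

```python
class ExtendedTuple(tuple):
    pass  # no __hash__ override: tuple.__hash__ is inherited


class TupleWithEq(tuple):
    def __eq__(self, other):
        return tuple.__eq__(self, other)

    # Defining __eq__ alone would make Python 3 set __hash__ to None;
    # assigning it explicitly keeps the base-class hash.
    __hash__ = tuple.__hash__


t = ExtendedTuple((1, 2, 3))
print(hash(t) == hash((1, 2, 3)))               # True
print(hash(TupleWithEq((1, 2, 3))) == hash(t))  # True
```

Because the hash and equality match plain tuples, an `ExtendedTuple` and an equal plain tuple are interchangeable as dict keys.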
