[ZODB-Dev] sticky objects

Tim Peters tim at zope.com
Sun Jan 30 18:27:39 EST 2005


[Simon Burton]
...
> ZODB 3.3.0final Python 2.3.4 (whoops, was python2.3.3)
> Mandrake Linux, kernel 2.6.3
> glibc-2.3.3
>
> I'm using top to measure memory usage. It's pretty good at indicating
> when my machine is freaking out because a process is taking all the
> memory.

I think Jeremy is right that top shows the VM high water mark.  I can't
tell you whether that's the number you care about, but it's something to
be aware of.

> ...
> Yes, I see. I tried a few things for x. A list of ints (taking a copy for
> each PItem) has the same behavior. In the real app x is a dict.

You shouldn't expect a large PersistentList containing tens of thousands
of dicts to behave well:  the entire list is stored as a single pickle,
so any change to it rewrites the whole thing.  Again as Jeremy said, you
could look to BTree-based data structures for scalable behavior.
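
For illustration only -- none of this is your code, and the names are
invented -- here's the kind of restructuring I mean.  An IOBTree spreads
its contents across many small persistent buckets, so ZODB can load and
evict pieces independently instead of pickling one giant list (a sketch
against ZODB 3.3's API, using the old get_transaction() spelling):

    import ZODB
    from ZODB import DB
    from ZODB.FileStorage import FileStorage
    from BTrees.IOBTree import IOBTree

    db = DB(FileStorage("Data.fs"))
    conn = db.open()
    root = conn.root()

    # One IOBTree instead of one huge PersistentList:  each bucket
    # pickles separately, so touching one item doesn't rewrite the
    # whole container.
    root["items"] = items = IOBTree()
    for i in xrange(100000):
        items[i] = {"payload": i}        # stand-in for your real dict
        if i % 1000 == 999:
            get_transaction().commit()   # give the cache a chance to
                                         # shed unmodified objects

    get_transaction().commit()
    db.close()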

> ...
>> In my run, it's obvious that neither ZODB nor Python held on to the
>> memory. If your platform C free() doesn't "give the memory back to the
>> operating system" here, I don't know that there's anything you can do
>> about that short of using a different C or libc.

> Well, that's good. It sounds like I'll be instrumenting malloc/free to
> find the problem here.

I wouldn't -- I'd first change the test program to act more like your
actual application in the relevant respects.  If you dig into malloc/free
with what you showed, you'll be tracking down behavior specific to
allocating tens of thousands of moderately large strings.  There's no
good a priori reason to expect that to be relevant to the low-level
details affecting what happens if you use tens of thousands of dicts
instead.  For example, Python doesn't use the platform malloc() directly
to allocate space for dict objects; it uses Python's "small object
allocator" (pymalloc) instead, and the latter never gives memory back to
the platform C free() before Python shutdown.  OTOH, dicts are more
complicated than strings, and pymalloc is only part of the full
allocation story for dicts.  Details get complicated.
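
If you want to watch that directly, here's a quick sketch with no ZODB
in it at all -- Linux-specific (it reads VmSize out of /proc/self/status,
which your 2.6 kernel supplies), and purely illustrative:

    def vm_size():
        # Linux-specific:  return the VmSize line from the kernel's
        # per-process status file.
        for line in open("/proc/self/status"):
            if line.startswith("VmSize:"):
                return line.strip()

    print "at start: ", vm_size()
    d = [{} for i in xrange(200000)]   # ~200K small dicts, via pymalloc
    print "allocated:", vm_size()
    del d
    # VmSize usually stays near its peak here:  pymalloc (in this era
    # of Python) hangs on to its arenas until the process exits.
    print "after del:", vm_size()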

It would probably be more fruitful to think about how to structure your data
and algorithms so that they never have to materialize hundreds of megabytes
of data at once.  IOW, wrestle with avoiding large high water marks to begin
with, instead of trying to trick low-level platform systems into dealing
with extreme high water marks more gracefully.
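
Concretely -- again just a sketch, reusing the invented names from the
BTree example above -- if the data lives in an IOBTree, you can make a
pass over all of it in bounded memory, because BTree iteration is lazy
and the connection's cache can be told to shed objects you're done with:

    # Read-only pass over root["items"].  items() on a BTree returns a
    # lazy sequence, so the whole tree is never materialized at once.
    total = 0
    n = 0
    for key, value in root["items"].items():
        total += value["payload"]
        n += 1
        if n % 1000 == 0:
            conn.cacheMinimize()   # ghostify unmodified objects; they
                                   # reload transparently if touched again

    print "sum of payloads:", total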


