[Zope3-Users] (solved?) Large mappings or sequences in ZODB eat all the memory

Christophe Combelles ccomb at free.fr
Wed Nov 14 13:40:11 EST 2007


Christophe Combelles wrote:
> Hello,
> 
> What should I do to have a data structure which is memory scalable?
> 
> Consider the following large btree:
> 
> $ ./debugzope
> 
>     >>> from BTrees.OOBTree import OOBTree
>     >>> root['btree']=OOBTree()
>     >>> for i in xrange(700000):
>     ...   root['btree'][i] = tuple(range(i,i+30))
>     ...
>     >>> import transaction
>     >>> transaction.commit()
> 
> Quit and restart  ./debugzope
> 
> Now I just want to know if some value is in the btree:
> 
>     >>> 'value' in root['btree'].values()


OK, the story could be called: "ZODB is great, but be careful what you do with 
persistence". There are three solutions to this problem: an ugly one, a workaround, 
and the correct one. I found the ugly one; thanks to Dennis and Chris for 
pointing out the workaround and the correct one.

The whole btree gets loaded into memory, even when I do a simple loop such as:

     >>> for i in root['btree']:
     ...     pass

(It's the same with items(), iteritems(), values() and itervalues().)


1) First, the *ugly* one: abort the transaction every N iterations:

     >>> import transaction
     >>> a = 0
     >>> for i in root['btree']:
     ...     a += 1
     ...     if not a % 5000:
     ...         transaction.abort()
     ...

That works, but it's definitely not the right thing to do: I suspect that by 
aborting the transaction in the middle of the read, someone else might be able 
to modify the btree before I've finished reading it. (ZODB experts, please confirm.)


2) Now a good *workaround* (which I will eventually use, because it's too late 
for me to change my app's data structure, and it happens to be the fastest 
solution).
It's almost the same, except that instead of aborting the transaction, we 
periodically minimize the cache of the ZODB connection:

     >>> a = 0
     >>> for i in root['btree']:
     ...     a += 1
     ...     if not a % 5000:
     ...         root['btree']._p_jar.cacheMinimize()  # ghostify every object in the connection cache
     ...

This way, the maximum memory used corresponds to 5000 tuples.


3) The *correct* solution is to store real persistent objects in the btree 
(i.e. objects that derive from persistent.Persistent).
That works, and it eats virtually no memory, but it's slower than tuples.
Non-persistent tuples are persisted because they are part of a persistent 
object, but they are considered an integral part of the btree (its buckets), 
not individual, separate persistent objects.
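
For reference, here is a minimal sketch of what I mean, in the same debugzope 
session as above (the Row class name is just an example, not something from my 
real app; any persistent.Persistent subclass holding the data works the same way):

     from BTrees.OOBTree import OOBTree
     from persistent import Persistent
     import transaction

     class Row(Persistent):
         """Wrap the tuple in a real persistent object: each value then
         has its own database record and can be ghosted on its own,
         instead of being pickled inside a btree bucket."""
         def __init__(self, values):
             self.values = values

     # (Re)build the btree, this time with persistent values
     root['btree'] = OOBTree()
     for i in xrange(700000):
         root['btree'][i] = Row(tuple(range(i, i + 30)))
     transaction.commit()

The trade-off is one database record (and one extra object load) per value, 
which is why it ends up slower than storing plain tuples.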

That's my understanding, but it doesn't really explain why looping over 
non-persistent values in a btree should necessarily load everything into memory.

And what about IIBTrees? (integers are not persistent by themselves)
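
To make that concrete, a minimal sketch (an IIBTree keeps its keys and values 
as plain C integers packed into the buckets, so there is nothing there to turn 
into a separate Persistent object):

     from BTrees.IIBTree import IIBTree

     tree = IIBTree()
     for i in xrange(1000):
         tree[i] = i * 2   # plain int values, stored inside the buckets themselves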


Christophe


> 
> or compute the length
> 
>     >>> len(root['btree'])
> (I'm already using some separate lazy bookkeeping for the length, but 
> even if len() is time consuming for a btree, it should be possible from 
> a memory point of view)
> 
> This loads the whole btree in memory (~500MB), and that memory never 
> gets released! If the btree grows, how will I be able to use it? (>2GB)
> 
> I've tried to scan the btree in slices, using 
> root['btree'].itervalues(min, max), and to do some 
> transaction.abort()/commit()/savepoint()/anything() between the slices. 
> But every slice I scan allocates yet more memory, and once the whole btree 
> has been scanned slice by slice, it's as if the whole btree were in memory.
> 
> I've also tried with lists; the result is the same, except the memory 
> gets eaten even more quickly.
> 
> What I understand is that the ZODB wakes up everything, and the memory 
> allocator of Python (2.4) never releases the memory. Is there a solution, 
> or something I missed in the API of the ZODB, BTrees, or Python itself?
> 
> thanks,
> Christophe
> 


