On Fri, Jan 18, 2013 at 9:02 AM, Marius Gedminas <span dir="ltr"><<a href="mailto:marius@gedmin.as" target="_blank">marius@gedmin.as</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Thu, Jan 17, 2013 at 12:31:52PM -0500, Claudiu Saftoiu wrote:<br>
> I wrote the following code to preload the indices:<br>
><br>
> def preload_index_btree(index_name, index_type, btree):<br>
> print "((Preloading '%s' %s index btree...))" % (index_name,<br>
> index_type)<br>
> start = last_print = time.time()<br>
> for i, item in enumerate(btree.items()):<br>
> item<br>
<br>
</div>That's a no-op: you might as well just write 'pass' here.<br></blockquote><div><br></div><div>True, I wanted to do something with 'item' but didn't know what.</div><div><br></div><div><br></div>
<div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im"><br>> print "((Preloaded '%s' %s index btree (%d items in %.2fs)))" % (<br>
> index_name, index_type, i, time.time() - start,<br>> )<br><br></div>If you ever get an empty btree, you'll get an UnboundLocalError: 'i' here.<br><br>Drop the enumerate() trick and just use len(btree), it's efficient.<br>
</blockquote><div><br>Thanks for catching that. `len` still takes a while on a large btree though if it isn't in memory:</div><div><br></div><div><div><font size="1" face="courier new, monospace"> In [7]: start = time.time(); len(bt); end = time.time()</font></div>
<div><font size="1" face="courier new, monospace"> Out[7]: 350169</font></div><div><font size="1" face="courier new, monospace"> In [8]: end - start</font></div><div><font size="1" face="courier new, monospace"> Out[8]: 32.397267818450928</font></div>
</div><div><br></div><div>`len` seems to require loading the entire tree: after it runs, subsequent operations (such as iterating over the whole tree) complete almost instantly. In the preload function, though, `len` would run right after the full iteration, when the tree is already in memory, so it will be fast at that point.</div>
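<div>For reference, a minimal sketch of counting the items during the iteration itself, which sidesteps both the empty-tree UnboundLocalError and a separate `len` call on a cold tree. The function names are hypothetical, and it is written for Python 3 with plain dicts and lists standing in for a btree:</div>

```python
def count_while_iterating(items):
    """Touch every item and return how many were seen.

    Works for an empty iterable because `count` is bound before the loop,
    and start=1 makes the final value a true item count, not the last index.
    """
    count = 0
    for count, _item in enumerate(items, 1):
        pass  # iterating is the point; nothing to do with the item itself
    return count


def broken_count(items):
    """The enumerate() pattern without a default: `i` is never bound
    if `items` is empty."""
    for i, _item in enumerate(items):
        pass
    return i  # raises UnboundLocalError when the loop body never ran


if __name__ == "__main__":
    print(count_while_iterating({"a": 1, "b": 2}.items()))
    print(count_while_iterating({}.items()))
    try:
        broken_count([])
    except UnboundLocalError as exc:
        print("broken_count failed:", exc)
```

<div>Note that without start=1 the last value of the index is one less than the number of items, so the count printed would also be off by one.</div>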
</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
If you want to load the btree item into cache, you need to do<br>
<br>
item._p_activate()<br></blockquote><div><br></div><div>That's not going to work, since `item` is a tuple. I don't want to load the item itself into the cache; I just want the btree to be in the cache. I figured iterating through the entire tree would force it to be loaded. Is that not the case? If not, what should I call `_p_activate()` on? I assume calling it on the tree itself won't cause all of its internals to be loaded, but I'm not familiar with the internals of the BTree. Would this be a better solution?</div>
<div><br></div><div><div><font size="1" face="courier new, monospace"> def preload_index_btree(index_name, index_type, btree):</font></div><div><font size="1" face="courier new, monospace"> print "((Preloading '%s' %s index btree...))" % (index_name, index_type)</font></div>
<div><font size="1" face="courier new, monospace"> start = time.time()</font></div><div><font size="1" face="courier new, monospace"> num_buckets = 0</font></div><div><font size="1" face="courier new, monospace"> bucket = btree._firstbucket</font></div>
<div><font size="1" face="courier new, monospace"> while bucket is not None:</font></div><div><span style="font-family:'courier new',monospace;font-size:x-small"> bucket._p_activate()</span></div><div><font size="1" face="courier new, monospace"> num_buckets += 1</font></div>
<div><font size="1" face="courier new, monospace"> bucket = bucket._next</font></div><div><font size="1" face="courier new, monospace"> print "((Preloaded '%s' %s index btree (%d items in %d buckets, %.2fs)))" % (</font></div>
<div><font size="1" face="courier new, monospace"> index_name, index_type, len(btree), num_buckets, time.time() - start,</font></div><div><font size="1" face="courier new, monospace"> )</font></div></div>
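<div>To make the idea concrete without a real database, here is a toy model of the bucket walk in Python 3. `FakeBucket`, `link` and `preload_buckets` are hypothetical stand-ins, not the real BTrees classes; only the `_next` and `_p_activate()` names are borrowed from the code above, and the `loaded` flag merely imitates the ghost/loaded distinction:</div>

```python
class FakeBucket:
    """Hypothetical stand-in for a BTree bucket; NOT the real BTrees class."""

    def __init__(self, keys):
        self._keys = list(keys)
        self._next = None      # next bucket in the chain, or None at the end
        self.loaded = False    # models ghost (False) vs. loaded (True) state

    def _p_activate(self):
        # In ZODB this would fetch the object's state from storage.
        self.loaded = True

    def __len__(self):
        return len(self._keys)


def link(buckets):
    """Chain buckets together and return the first one (or None)."""
    for current, following in zip(buckets, buckets[1:]):
        current._next = following
    return buckets[0] if buckets else None


def preload_buckets(first_bucket):
    """Activate every bucket in the chain; return (num_buckets, num_items)."""
    num_buckets = num_items = 0
    bucket = first_bucket
    # Compare against None: in this model an empty bucket has len() == 0
    # and is therefore falsy, so a truthiness test would stop at it.
    while bucket is not None:
        bucket._p_activate()
        num_buckets += 1
        num_items += len(bucket)
        bucket = bucket._next
    return num_buckets, num_items


if __name__ == "__main__":
    chain = [FakeBucket([1, 2]), FakeBucket([]), FakeBucket([3])]
    print(preload_buckets(link(chain)))
    print(all(b.loaded for b in chain))
```

<div>Summing the bucket sizes during the walk also gives the item count as a by-product, under the assumption that each bucket is already loaded by the time its length is taken.</div>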
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
> def preload_catalog(catalog):<br>
> """Given a catalog, touch every persistent object we can find to<br>
> force<br>
> them to go into the cache."""<br>
> start = time.time()<br>
> num_indices = len(catalog.items())<br>
> for i, (index_name, index) in enumerate(catalog.items()):<br>
> print "((Preloading index %2d/%2d '%s'...))" % (i+1,<br>
> num_indices, index_name,)<br>
> preload_index_btree(index_name, 'fwd', index._fwd_index)<br>
> preload_index_btree(index_name, 'rev', index._rev_index)<br>
> print "((Preloaded catalog! Took %.2fs))" % (time.time() - start)<br>
><br>
> And I run it on server start as follows (modified for the relevant parts; I<br>
> tried to make the example simple but it ended up needing a lot of parts).<br>
> This runs in a thread:<br>
><br>
> from util import zodb as Z<br>
> from util import zodb_query as ZQ<br>
> for i in xrange(3):<br>
> connwrap = Z.ConnWrapper('index')<br>
> print "((Preload #%d...))" % (i+1)<br>
> with connwrap as index_root:<br>
> ZQ.preload_catalog(index_root.index.catalog)<br>
> connwrap.close()<br>
<br>
</div>Every thread has its own in-memory ZODB object cache, but if you have<br>
configured a persistent ZEO client cache, it should help.<br></blockquote><div><br></div><div>Gotcha. Thanks for the help!</div><div>- Claudiu</div></div>