On Fri, Jan 18, 2013 at 9:02 AM, Marius Gedminas <span dir="ltr">&lt;<a href="mailto:marius@gedmin.as" target="_blank">marius@gedmin.as</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On Thu, Jan 17, 2013 at 12:31:52PM -0500, Claudiu Saftoiu wrote:<br>

&gt; I wrote the following code to preload the indices:<br>

&gt;<br>

&gt;     def preload_index_btree(index_name, index_type, btree):<br>

&gt;         print &quot;((Preloading &#39;%s&#39; %s index btree...))&quot; % (index_name, index_type)<br>

&gt;         start = last_print = time.time()<br>

&gt;         for i, item in enumerate(btree.items()):<br>

&gt;             item<br>

<br>

</div>That&#39;s a no-op: you might as well just write &#39;pass&#39; here.<br></blockquote><div><br></div><div>True, I wanted to do something with &#39;item&#39; but didn&#39;t know what.</div><div><br></div><div><br></div>

<div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im"><br>&gt;         print &quot;((Preloaded &#39;%s&#39; %s index btree (%d items in %.2fs)))&quot; % (<br>

&gt;             index_name, index_type, i, time.time() - start,<br>&gt;         )<br><br></div>If you ever get an empty btree, you&#39;ll get an UnboundLocalError: &#39;i&#39; here.<br><br>Drop the enumerate() trick and just use len(btree), it&#39;s efficient.<br>

</blockquote><div><br>Thanks for catching that. `len` still takes a while on a large btree if it isn&#39;t already in memory, though:</div><div><br></div><div><div><font size="1" face="courier new, monospace">    In [7]: start = time.time(); len(bt); end = time.time()</font></div>

<div><font size="1" face="courier new, monospace">    Out[7]: 350169</font></div><div><font size="1" face="courier new, monospace">    In [8]: end - start</font></div><div><font size="1" face="courier new, monospace">    Out[8]: 32.397267818450928</font></div>

</div><div><br></div><div>It actually seems to require loading the entire tree, because after running `len`, subsequent operations (like iterating through the entire tree) happen instantly. In the preload function that cost doesn&#39;t matter, though: by the time `len` is called I will have just iterated through the entire tree, so it will definitely be fast at that point.</div>
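<div><br></div><div>For reference, a minimal sketch of the empty-btree pitfall with the enumerate() counter, using plain lists in place of a real btree. The function names are made up for illustration and are not part of the code in this thread:</div>
<pre>
```python
def count_with_enumerate(items):
    # Mirrors the quoted loop: 'i' is only bound if the loop body runs.
    for i, item in enumerate(items):
        pass
    return i  # raises UnboundLocalError when 'items' is empty


def count_safely(items):
    # Bind the counter first, and start enumerate at 1 so it is a true count.
    count = 0
    for count, item in enumerate(items, 1):
        pass
    return count


try:
    count_with_enumerate([])
except UnboundLocalError as exc:
    print("empty input failed: %s" % exc)

# The last index is one less than the number of items.
print(count_with_enumerate(["a", "b", "c"]))
print(count_safely([]))
print(count_safely(["a", "b", "c"]))
```
</pre>
<div>Note that even on a non-empty input the enumerate() index is one less than the item count, so the original message would under-report by one.</div>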

</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

If you want to load the btree item into cache, you need to do<br>

<br>

              item._p_activate()<br></blockquote><div><br></div><div>That&#39;s not going to work, since `item` is a tuple. I don&#39;t want to load the item itself into the cache; I just want the btree to be in the cache. I figured iterating through the entire tree would force it to be loaded, but is that not the case? If not, then what should I call `_p_activate()` on? I assume calling it on the tree itself won&#39;t cause all its internals to be loaded, but I&#39;m not familiar with the internals of the BTree. Would this be a better solution?</div>

<div><br></div><div><div><font size="1" face="courier new, monospace">    def preload_index_btree(index_name, index_type, btree):</font></div><div><font size="1" face="courier new, monospace">        print &quot;((Preloading &#39;%s&#39; %s index btree...))&quot; % (index_name, index_type)</font></div>

<div><font size="1" face="courier new, monospace">        start = time.time()</font></div><div><font size="1" face="courier new, monospace">        num_buckets = 0</font></div><div><font size="1" face="courier new, monospace">        bucket = btree._firstbucket</font></div>

<div><font size="1" face="courier new, monospace">        while bucket:</font></div><div><span style="font-family:&#39;courier new&#39;,monospace;font-size:x-small">            bucket._p_activate()</span></div><div><font size="1" face="courier new, monospace">            num_buckets += 1</font></div>

<div><font size="1" face="courier new, monospace">            bucket = bucket._next</font></div><div><font size="1" face="courier new, monospace">        print &quot;((Preloaded &#39;%s&#39; %s index btree (%d items in %d buckets, %.2fs)))&quot; % (</font></div>

<div><font size="1" face="courier new, monospace">            index_name, index_type, len(btree), num_buckets, time.time() - start,</font></div><div><font size="1" face="courier new, monospace">        )</font></div></div>
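<div><br></div><div>To make the idea testable without a database, here is a rough sketch of the same bucket walk over a hypothetical stand-in class. `FakeBucket` and `preload_buckets` are invented for illustration and are not the real BTrees classes; only the `_next` and `_p_activate()` names are taken from the code above. It compares against None rather than relying on truthiness, because any object that defines `__len__` is falsy when empty; whether a real BTree ever keeps an empty bucket in its chain is an assumption I have not checked. It also counts items during the walk, so no separate `len` call is needed afterwards:</div>
<pre>
```python
class FakeBucket(object):
    """Hypothetical stand-in for a BTree bucket; not the real BTrees class."""

    def __init__(self, keys, next_bucket=None):
        self.keys = list(keys)
        self._next = next_bucket
        self.loaded = False  # plays the role of ghost vs. loaded state

    def __len__(self):
        return len(self.keys)

    def _p_activate(self):
        # A real persistent object would fetch its state from storage here.
        self.loaded = True


def preload_buckets(first_bucket):
    """Walk the chain once, activating each bucket; return (buckets, items)."""
    num_buckets = num_items = 0
    bucket = first_bucket
    # Compare against None: an empty bucket defines __len__ and so is falsy,
    # which would end a bare 'while bucket:' loop early.
    while bucket is not None:
        bucket._p_activate()
        num_buckets += 1
        num_items += len(bucket)
        bucket = bucket._next
    return num_buckets, num_items


# Three chained buckets, the middle one deliberately empty.
chain = FakeBucket([1, 2], FakeBucket([], FakeBucket([3, 4, 5])))
print("%d buckets, %d items" % preload_buckets(chain))
```
</pre>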

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

&gt;     def preload_catalog(catalog):<br>

&gt;         &quot;&quot;&quot;Given a catalog, touch every persistent object we can find to force<br>

&gt;         them to go into the cache.&quot;&quot;&quot;<br>

&gt;         start = time.time()<br>

&gt;         num_indices = len(catalog.items())<br>

&gt;         for i, (index_name, index) in enumerate(catalog.items()):<br>

&gt;             print &quot;((Preloading index %2d/%2d &#39;%s&#39;...))&quot; % (i+1, num_indices, index_name,)<br>

&gt;             preload_index_btree(index_name, &#39;fwd&#39;, index._fwd_index)<br>

&gt;             preload_index_btree(index_name, &#39;rev&#39;, index._rev_index)<br>

&gt;         print &quot;((Preloaded catalog! Took %.2fs))&quot; % (time.time() - start)<br>

&gt;<br>

&gt; And I run it on server start as follows (modified for the relevant parts; I<br>

&gt; tried to make the example simple but it ended up needing a lot of parts).<br>

&gt; This runs in a thread:<br>

&gt;<br>

&gt;     from util import zodb as Z<br>

&gt;     from util import zodb_query as ZQ<br>

&gt;     for i in xrange(3):<br>

&gt;         connwrap = Z.ConnWrapper(&#39;index&#39;)<br>

&gt;         print &quot;((Preload #%d...))&quot; % (i+1)<br>

&gt;         with connwrap as index_root:<br>

&gt;             ZQ.preload_catalog(index_root.index.catalog)<br>

&gt;         connwrap.close()<br>

<br>

</div>Every thread has its own in-memory ZODB object cache, but if you have<br>

configured a persistent ZEO client cache, it should help.<br></blockquote><div><br></div><div>Gotcha. Thanks for the help!</div><div>- Claudiu</div></div>