Hi All,<div><br></div><div>I converted my code to use LOBTrees holding LLTreeSets, and it sticks to the memory bounds and performs admirably throughout the whole process.  Unfortunately, opening the database afterwards seems to be extremely slow.  Here&#39;s what I&#39;m doing:</div>


<div><br></div><div>from ZODB.FileStorage import FileStorage</div><div><div>from ZODB.DB import DB</div><div><br></div><div>storage = FileStorage(&#39;attempt3_wordid_to_docset&#39;,pack_keep_old=False)</div><div><br></div>


<div>I think the file in question is about 7 GB in size.  The process uses 100 percent of a core, and I&#39;ve never seen it get past the FileStorage object creation.  Is there something I&#39;m doing wrong when I initially fill this storage that makes it so hard to index, or is there something wrong with the way I&#39;m creating the new FileStorage?</div>


<div><br></div><div>Thanks for everything, you guys have really been great.</div><div><br></div><div>-Ryan</div><div><br></div><div><br></div><br><div class="gmail_quote">On Wed, May 12, 2010 at 3:48 AM, Jim Fulton <span dir="ltr">&lt;<a href="mailto:jim@zope.com">jim@zope.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">On Tue, May 11, 2010 at 7:37 PM, Ryan Noon &lt;<a href="mailto:rmnoon@gmail.com">rmnoon@gmail.com</a>&gt; wrote:<br>


&gt; Hi Jim,<br>

&gt; I&#39;m really sorry for the miscommunication, I thought I made that clear in my<br>

&gt; last email:<br>

&gt; &quot;I&#39;m wrapping ZODB in a &#39;ZMap&#39; class that just forwards all the dictionary<br>

&gt; methods to the ZODB root and allows easy interchangeability with my old<br>

&gt; sqlite OODB abstraction.&quot;<br>

<br>

</div>Perhaps I should have picked up on this, but it wasn&#39;t clear that you<br>

were referring to wordid_to_docset. I couldn&#39;t see that in the code and I<br>

didn&#39;t get an answer to my question.<br>

<div class="im"><br>

&gt; wordid_to_docset is a &quot;ZMap&quot;, which just wraps the ZODB<br>

&gt; boilerplate/connection and forwards dictionary methods to the root.<br>

<br>

</div>This is the last piece of the puzzle.  The root object is a persistent<br>

mapping object that is a single database object and is thus not a<br>

scalable data structure.  As Lawrence pointed out, this, together with<br>

the fact that you&#39;re using non-persistent arrays as mapping values<br>

means that all your data is in a single object.<br>

<div class="im"><br>

&gt; but I&#39;m still sorta worried because in my experimentation with ZODB<br>

&gt; so far I&#39;ve never been able to observe it sticking to any cache limits, no<br>

&gt; matter how often I tell it to garbage collect (even when storing very small<br>

&gt; values that should give it adequate granularity...see my experiment at the<br>

&gt; end of my last email).<br>

<br>

</div>The unit of granularity is the persistent object.  It is persistent<br>

objects that are managed by the cache, not individual Python objects<br>

like strings.  If your entire database is in a single persistent<br>

object, then your entire database will be in memory.<br>

<br>

If you want a scalable mapping and your keys are stably ordered (as<br>

are strings and numbers) then you should use a BTree.  BTrees spread<br>

their data over multiple data records, so you can have massive<br>

mappings without storing massive amounts of data in memory.<br>
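<br>

As a rough illustration of why spreading a mapping over several records helps, here is a toy sketch. Plain pickle byte strings stand in for database records; this is not how ZODB or BTrees are actually implemented, and the bucket size of 100 and all names are made up for the example.<br>

<pre>
```python
import pickle

# Toy model only: each pickle byte string stands in for one database record.
words = {i: list(range(50)) for i in range(10000)}

# Layout 1: the whole mapping lives in a single record.
single_record = pickle.dumps(words)

# Layout 2: the mapping is split into buckets of 100 consecutive keys,
# and each bucket is stored as its own record.
BUCKET = 100
buckets = {}
for key, value in words.items():
    buckets.setdefault(key // BUCKET, {})[key] = value
bucket_records = {b: pickle.dumps(d) for b, d in buckets.items()}


def lookup_single(key):
    """Load the one big record; return the value and the bytes loaded."""
    return pickle.loads(single_record)[key], len(single_record)


def lookup_bucketed(key):
    """Load only the bucket that owns the key."""
    record = bucket_records[key // BUCKET]
    return pickle.loads(record)[key], len(record)


value_a, bytes_a = lookup_single(4242)
value_b, bytes_b = lookup_bucketed(4242)
print("single record: loaded", bytes_a, "bytes")
print("bucketed:      loaded", bytes_b, "bytes")
```
</pre>

Both lookups return the same value, but the bucketed layout only has to load the one record that holds the key.<br>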

<br>

If you want a set and the items are stably ordered, then use a TreeSet<br>

(or a Set if the set is known to be small).<br>

<br>

There are built-in BTrees and sets that support compact storage of<br>

signed 32-bit or 64-bit ints.<br>

<br>

Jim<br>

<br>

--<br>

<font color="#888888">Jim Fulton<br>

</font></blockquote></div><br><br clear="all"><br>-- <br>Ryan Noon<br>Stanford Computer Science<br>BS &#39;09, MS &#39;10<br>

</div>