First off, thanks everybody. I'm implementing and testing the suggestions now. When I said ZODB was more complicated than my solution, I meant that the system was abstracting a lot more from me than my old code did (because I wrote it and knew exactly how to make the cache enforce its limits!).<br>
<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">> The first thing to understand is that options like cache-size and<br>
> cache-size bytes are suggestions, not limits. :) In particular, they<br>
> are only enforced:<br>
><br>
> - at transaction boundaries,<br></div></blockquote><div><br></div><div>If it's already being enforced at transaction boundaries, how come memory usage doesn't go back down to the quota after the commit (which only happens every 25k documents)?</div>
<div><br></div><div>As for returning memory to the OS, I don't really care whether the reported usage goes down, but it really seems like it's overallocating if the OS kills the process on an 8 GB machine when the cache quota is 512 MB.</div><div>
</div><div>Tres:</div><div><br></div><div>On your first point:</div><div><br></div><div>Yeah, I wrote that late last night and I just realized the default argument gets evaluated on every setdefault call, whether or not the key is already there. I was trying to be cute with Python dict methods that I hadn't used before. Stupid me.</div>
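<div><br></div><div>For anyone following along, here's a minimal sketch of the setdefault gotcha as I understand it. The names (make_default, calls, index) are made up for illustration and are not from my actual indexing code; the point is only that the second argument is built before setdefault ever looks at the dict.</div>
<pre>
```python
calls = []


def make_default():
    # Hypothetical stand-in for an expensive default (e.g. a new container).
    calls.append(1)
    return []


words = ["a", "b", "a"]

# Eager version: make_default() runs on every iteration,
# even when the key is already present.
index = {}
for word in words:
    index.setdefault(word, make_default()).append(1)
eager_calls = len(calls)

# Explicit version: the default is only built when the key is missing.
calls.clear()
index2 = {}
for word in words:
    if word not in index2:
        index2[word] = make_default()
    index2[word].append(1)
lazy_calls = len(calls)
```
</pre>
<div>Both loops end up with the same dict, but the first one constructs a throwaway default for every repeated key.</div>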
<div><br></div><div>With regards to your second point:</div><div><br></div><div>I read the loop optimization wiki page over at <a href="http://python.org">python.org</a> too many times and I get itchy whenever there's method lookup inside of a loop. I need to remember I'm dealing with a database here and IO is gonna be the bottleneck anyway.</div>
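<div><br></div><div>To be concrete about the habit I mean, this is roughly the pattern (function names are hypothetical, not from my code). Hoisting the bound method saves one attribute lookup per iteration, which is almost certainly noise next to database IO.</div>
<pre>
```python
def collect_plain(doc_ids):
    out = []
    for doc_id in doc_ids:
        # out.append is looked up on every iteration.
        out.append(doc_id)
    return out


def collect_hoisted(doc_ids):
    out = []
    append = out.append  # look up the bound method once, outside the loop
    for doc_id in doc_ids:
        append(doc_id)
    return out
```
</pre>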
<div><br></div><div>With regards to your third point:</div><div><br></div><div>I actually ran into the same change notification problem when I was rolling my own OODB and I assumed ZODB had done something tricky because my changes were showing up upon reopening the db even when I'd done the append and not told ZODB about the change. I'll fix it to make that more explicit...I think the "magical" effects I'd seen were related to my problem with too much damn caching. The "array" type is from the stdlib array module that I'm just appending my IDs to as longs. I figured it'd be more compact and would serialize faster.</div>
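<div><br></div><div>In case the "array" bit is unclear, here's a small stdlib-only sketch of what I'm doing with the IDs. The ID values and variable names are invented for illustration, and I haven't measured whether this really serializes faster; that part is still my assumption. Note that append mutates the array in place, which is exactly the kind of change the database can't see on its own.</div>
<pre>
```python
import pickle
from array import array

# "l" is a signed C long; itemsize depends on the platform.
doc_ids = array("l")
for doc_id in (3, 17, 42):  # made-up IDs
    doc_ids.append(doc_id)  # in-place mutation of the same object

# Raw payload size of the stored values, excluding object overhead.
payload_bytes = doc_ids.itemsize * len(doc_ids)

# Round-trip through pickle to check the array serializes cleanly.
blob = pickle.dumps(doc_ids, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
```
</pre>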
<div><br></div></div><div><br></div>Btw, the final commit is outside the loop (not shown). =)<br><br><div>Cheers,<br>Ryan<br clear="all"><br>-- <br>Ryan Noon<br>Stanford Computer Science<br>BS '09, MS '10<br>
</div>