[ZODB-Dev] Data.fs size grows non-stop

Marius Gedminas marius at gedmin.as
Wed Dec 9 11:09:36 EST 2009


On Wed, Dec 09, 2009 at 02:42:58PM +0100, Pedro Ferreira wrote:
> Hello,
> > Just zodbbrowser with no prefix:
> >
> >   http://pypi.python.org/pypi/zodbbrowser
> >   https://launchpad.net/zodbbrowser
> >
> > It's a web-app: it can connect to your ZEO server so you can inspect the
> > DB while it's being used.
> >   
> We tried this, but we currently get an error related to the general
> security policy for zope.app. Maybe we need to install Zope?

No, it should be self-contained.

I'm usually testing with the Zope 3.4 KGS.  Easiest way to do that is to
check out the sources and use buildout:

  bzr get lp:zodbbrowser
  cd zodbbrowser
  python bootstrap.py
  bin/buildout
  bin/zodbbrowser --zeo localhost:1234 

I'll test it with the latest zope.* packages from PyPI and see if I can
reproduce the error.  Feel free to report a bug (and attach the
traceback you get) here: https://bugs.launchpad.net/zodbbrowser

> This would be a very handy tool.
> > I'd suggest dumping the last few transactions with one of the ZODB
> > scripts (fsdump.py perhaps) and seeing what objects get modified.
> >   
> That's what we've been doing, and we got some clues. We've modified 
> Jim's script in order to find out which OIDs are being rewritten, and 
> how much space they are taking, and this is a fragment of it:
> 
> OID class_name total_size percent_size n_pickles min_size avg_size max_size
> '\x00\x00\x00\x00%T\x89{' BTrees.OOBTree.OOBucket 17402831841 30% 8683 1977885 2004241 2026518
> '\x00\x00\x00\x00%T\x89|' BTrees.OOBTree.OOBucket 14204430890 24% 8683 1616904 1635889 1651956
> '\x00\x00\x00\x00\x04dUH' MaKaC.common.indexes.StatusIndex 11955954522 20% 28513 418230 419315 420294
> '\x00\x00\x00\x00%\xa0%\x7f' BTrees.OOBTree.OOBucket 3532998238 6% 11238 307112 314379 320647
> '\x00\x00\x00\x00%\xa0%\x80' BTrees.OOBTree.OOBucket 2193843302 3% 11238 190816 195216 199007
> '\x00\x00\x00\x00\x04\x8e\xb6\x04' BTrees.OOBTree.OOBucket 1728216003 3% 1953 880615 884903 887285
> [...]
> 
> As you can see, we have an OOBucket occupying more than 2MB (!) per 
> write.

Ouch.
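
As a quick sanity check on the quoted figures, total_size should be
roughly n_pickles times avg_size.  A small sketch using the first three
rows (the short labels are mine, not real OIDs; the numbers are copied
from the table):

```python
# Rows copied from the quoted report: (label, total_size, n_pickles, avg_size).
# The labels are made up for readability; the numbers come from the table.
rows = [
    ("OOBucket #1", 17402831841, 8683, 2004241),
    ("OOBucket #2", 14204430890, 8683, 1635889),
    ("StatusIndex", 11955954522, 28513, 419315),
]

def relative_error(total_size, n_pickles, avg_size):
    """Relative gap between the reported total and n_pickles * avg_size."""
    estimate = n_pickles * avg_size
    return abs(total_size - estimate) / total_size

errors = {}
for label, total_size, n_pickles, avg_size in rows:
    errors[label] = relative_error(total_size, n_pickles, avg_size)
    print("%-12s total=%.1f GB  n*avg=%.1f GB" % (
        label, total_size / 1e9, n_pickles * avg_size / 1e9))
```

The small remaining gap would be explained by avg_size being rounded to
whole bytes.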

> That's almost 17GB only considering the last 1M transactions of 
> the DB (we get ~3M transactions per week). We believe this bucket 
> belongs to some OOBTree-based index that we are using, whose values are 
> Python lists (maybe that was a bad choice to start with?). In any case, 
> how do OOBuckets work? Is it a simple key space segmentation strategy, 
> or are the values taken into account as well?
> Our theory is that an OOBTree simply divides the N keys in K buckets, 
> and doesn't care about the contents. So, since we are adding very large 
> lists as values, the tree remains unbalanced, and since new contents 
> will be added to this last bucket, each rewrite will imply the addition 
> of ~2MB to the file storage.
> Will the replacement of these lists with a persistent structure such as 
> a PersistentList solve the issue?

It should definitely help.  Now, if you modify one of the lists in a bucket,
it needs to append the whole 2 megs of data to the Data.fs.  If you'd
used a PersistentList, it would only need to append as much data as your
persistent list contains.  Assuming there are ~30 lists in a bucket,
you'd get 30x space savings.
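
To illustrate the effect without a real database, here is a rough
simulation using only the standard pickle module.  It is not ZODB code:
the dict stands in for a bucket, the key names and list sizes are made
up, and "one record per list" only approximates what a PersistentList
would give you.

```python
import pickle

# Stand-in for one bucket whose values are plain Python lists.
# Assumption: 30 keys, 1000 short strings per list (invented sizes).
bucket = {
    "key%02d" % i: ["item-%d-%d" % (i, j) for j in range(1000)]
    for i in range(30)
}

def inline_record_size(bucket):
    """Bytes written when lists are stored inline: the whole bucket is one record."""
    return len(pickle.dumps(bucket, protocol=2))

def separate_record_size(bucket, key):
    """Bytes written when each list is its own record: only that list is rewritten."""
    return len(pickle.dumps(bucket[key], protocol=2))

# Modify a single list, then compare what each layout would have to append.
bucket["key07"].append("new item")
whole = inline_record_size(bucket)
one = separate_record_size(bucket, "key07")
ratio = whole / one

print("inline lists:      %d bytes per write" % whole)
print("one list a record: %d bytes per write" % one)
print("ratio:             %.1fx" % ratio)
```

With 30 lists of similar size, the ratio comes out close to the number of
lists in the bucket, which is where the 30x estimate comes from.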

Are those lists long?  Are they modified often?  What do they contain?
I'm sure you can get better space efficiency by redesigning the data
structure.

Marius Gedminas
-- 
Dijkstra probably hates me
		-- Linus Torvalds, in kernel/sched.c