[ZODB-Dev] Data.fs size grows non-stop

Jim Fulton jim at zope.com
Wed Dec 9 10:13:29 EST 2009


On Wed, Dec 9, 2009 at 8:42 AM, Pedro Ferreira
<jose.pedro.ferreira at cern.ch> wrote:
...
> We've modified Jim's script

Cool. Storage iterators are so simple and allow a wide variety of analyses.
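A minimal sketch of that kind of per-OID analysis. It assumes ZODB's `FileStorage.iterator()`, whose transactions iterate over data records carrying `oid` and `data`, and `ZODB.utils.get_pickle_metadata()`. The helper names `summarize` and `filestorage_records` are hypothetical, and this is not the script discussed in this thread. `percent_size` is taken here as each OID's share of all bytes in the scanned records. The output quoted below covered only recent transactions, and this sketch does not restrict the range.

```python
from collections import defaultdict


def summarize(records):
    """Return one stats row per OID, largest total_size first.

    `records` is an iterable of (oid, class_name, data) tuples, one per
    stored revision; `data` is the pickle bytes, or None when a record
    carries no data (those are skipped).
    """
    sizes = defaultdict(list)  # oid -> sizes of every stored revision
    classes = {}
    for oid, class_name, data in records:
        if data is None:
            continue
        sizes[oid].append(len(data))
        classes[oid] = class_name
    grand_total = sum(sum(s) for s in sizes.values()) or 1
    rows = []
    for oid, s in sizes.items():
        total = sum(s)
        rows.append({
            "oid": oid,
            "class_name": classes[oid],
            "total_size": total,
            "percent_size": 100 * total // grand_total,
            "n_pickles": len(s),
            "min_size": min(s),
            "avg_size": total // len(s),
            "max_size": max(s),
        })
    rows.sort(key=lambda r: r["total_size"], reverse=True)
    return rows


def filestorage_records(path):
    """Yield (oid, class_name, data) for every revision in a FileStorage.

    Assumes ZODB is installed; imports are local so summarize() can be
    used without it.
    """
    from ZODB.FileStorage import FileStorage
    from ZODB.utils import get_pickle_metadata

    storage = FileStorage(path, read_only=True)
    try:
        for txn in storage.iterator():
            for rec in txn:
                if rec.data is None:
                    yield rec.oid, None, None
                    continue
                module, klass = get_pickle_metadata(rec.data)
                yield rec.oid, f"{module}.{klass}", rec.data
    finally:
        storage.close()
```

For example, `summarize(filestorage_records("Data.fs"))[:10]` would list the ten OIDs responsible for the most written bytes.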

> in order to find out which OIDs are being rewritten, and
> how much space they are taking, and this is a fragment of it:
>
> OID                                 class_name                         total_size  percent_size  n_pickles  min_size  avg_size  max_size
> '\x00\x00\x00\x00%T\x89{'           BTrees.OOBTree.OOBucket           17402831841           30%       8683   1977885   2004241   2026518
> '\x00\x00\x00\x00%T\x89|'           BTrees.OOBTree.OOBucket           14204430890           24%       8683   1616904   1635889   1651956
> '\x00\x00\x00\x00\x04dUH'           MaKaC.common.indexes.StatusIndex  11955954522           20%      28513    418230    419315    420294
> '\x00\x00\x00\x00%\xa0%\x7f'        BTrees.OOBTree.OOBucket            3532998238            6%      11238    307112    314379    320647
> '\x00\x00\x00\x00%\xa0%\x80'        BTrees.OOBTree.OOBucket            2193843302            3%      11238    190816    195216    199007
> '\x00\x00\x00\x00\x04\x8e\xb6\x04'  BTrees.OOBTree.OOBucket            1728216003            3%       1953    880615    884903    887285
> [...]
>
> As you can see, we have an OOBucket occupying more than 2MB (!) per
> write. That's almost 17GB only considering the last 1M transactions of
> the DB (we get ~3M transactions per week). We believe this bucket
> belongs to some OOBTree-based index that we are using, whose values are
> Python lists (maybe that was a bad choice to start with?). In any case,
> how do OOBuckets work?

Buckets themselves are essentially just sorted lists of key-value pairs.
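To make that concrete, here is a toy model: a hypothetical `ToyBucket` class, not the real C implementation in BTrees. It keeps keys in a sorted list with values in a parallel list and finds positions by binary search. One consequence worth noting: the values live inside the bucket, so a non-persistent value such as a plain Python list is pickled as part of the bucket's own database record.

```python
from bisect import bisect_left


class ToyBucket:
    """Hypothetical, simplified bucket: sorted keys plus parallel values."""

    def __init__(self):
        self.keys = []
        self.values = []

    def __setitem__(self, key, value):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # existing key: replace its value
        else:
            self.keys.insert(i, key)  # new key: insert in sorted position
            self.values.insert(i, value)

    def __getitem__(self, key):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        raise KeyError(key)

    def items(self):
        return list(zip(self.keys, self.values))
```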

> Is it a simple key space segmentation strategy,

In the case of BTrees, yes.  I assume your OOBuckets are used within
OOBTrees. (?)

> or are the values taken into account as well?

No.

> Our theory is that an OOBTree simply divides the N keys in K buckets,
> and doesn't care about the contents.

Right.
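A toy, single-level sketch of key-space segmentation, assuming a hypothetical `MAX_BUCKET_SIZE` of 4 (real BTrees use larger, type-specific limits and keep interior nodes above the buckets). A bucket splits only when it holds too many keys; the size of the values never enters the decision.

```python
from bisect import bisect_left, bisect_right

MAX_BUCKET_SIZE = 4  # hypothetical limit, chosen small for illustration


def insert(buckets, key, value):
    """Insert into a list of buckets, each a sorted list of (key, value).

    A bucket is split in half once it holds more than MAX_BUCKET_SIZE
    keys. Only the key count is checked; value sizes are ignored.
    """
    if not buckets:
        buckets.append([(key, value)])
        return
    # Pick the bucket whose key range should contain `key`.
    firsts = [b[0][0] for b in buckets]
    idx = max(bisect_right(firsts, key) - 1, 0)
    bucket = buckets[idx]
    pos = bisect_left([k for k, _ in bucket], key)
    if pos < len(bucket) and bucket[pos][0] == key:
        bucket[pos] = (key, value)  # existing key: replace value
    else:
        bucket.insert(pos, (key, value))
    if len(bucket) > MAX_BUCKET_SIZE:
        half = len(bucket) // 2
        buckets[idx:idx + 1] = [bucket[:half], bucket[half:]]
```

Filling one structure with huge values and another with tiny ones yields identical bucket boundaries, which is the "doesn't care about the contents" behaviour in miniature.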

> So, since we are adding very large
> lists as values, the tree remains unbalanced,

No, the trees tend to stay fairly well balanced wrt keys.

> and since new contents
> will be added to this last bucket,

Why would the contents only be added to one bucket?


> each rewrite will imply the addition
> of ~2MB to the file storage.

That's definitely a problem.
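FileStorage is append-only: every committed change to an object writes a complete new copy of its pickle, and old revisions stay in Data.fs until the storage is packed. A short worked check against the largest OOBucket row in the table above (`appended_bytes` is a hypothetical helper name):

```python
def appended_bytes(n_pickles, avg_size):
    """Approximate bytes appended when one object is rewritten
    n_pickles times with pickles averaging avg_size bytes."""
    return n_pickles * avg_size


# Figures for the largest OOBucket in the table above.
N_PICKLES = 8683
AVG_SIZE = 2004241
REPORTED_TOTAL = 17402831841

estimate = appended_bytes(N_PICKLES, AVG_SIZE)
# The small gap exists because avg_size was rounded down to whole bytes.
relative_error = abs(estimate - REPORTED_TOTAL) / REPORTED_TOTAL
print(f"{estimate:,} bytes, about {estimate / 2**30:.1f} GiB")
# prints "17,402,824,603 bytes, about 16.2 GiB"
```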

> Will the replacement of these lists with a persistent structure such as
> a PersistentList solve the issue?

It might help, but if the lists are very large, you'll still have a problem
because a persistent list is still stored in one database record.
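A rough model of why, using plain `pickle` as a stand-in for ZODB's serializer (which adds its own overhead). A list stored as one record is re-pickled in full on every append. For contrast, a hypothetical layout that splits the values into separately stored chunks of `CHUNK_SIZE` items (roughly how a BTrees set spreads its items across bucket records) rewrites only the last chunk. The helper names and `CHUNK_SIZE` are illustrative assumptions.

```python
import pickle

CHUNK_SIZE = 100  # hypothetical number of items per separately stored chunk


def make_items(n):
    # Distinct items, so pickle cannot share them through its memo.
    return [("entry-%d" % i, i) for i in range(n)]


def write_cost_single_record(items):
    """Bytes rewritten after an append when the whole list is one record,
    as with a plain list inside a bucket or a PersistentList."""
    return len(pickle.dumps(items))


def write_cost_chunked(items, chunk_size=CHUNK_SIZE):
    """Bytes rewritten after an append when only the last chunk changes."""
    start = (len(items) - 1) // chunk_size * chunk_size
    return len(pickle.dumps(items[start:]))
```

The single-record cost grows with the list, while the chunked cost stays roughly constant however long the list gets.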

Jim

-- 
Jim Fulton

