[Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)

Chris McDonough chrism@digicool.com
Tue, 26 Jun 2001 14:34:29 -0400


Yikes.  I wonder if this overhead comes from Vocabulary updates... thanks
very much for doing this test.

Clearly we need to pin it down.  This is very disappointing.  :-(  Any
further info you dig up is appreciated.

You didn't have any metadata stuff set up, did you?  Even if you did, I
imagine it couldn't possibly account for 200K worth of extra stuff.

- C

----- Original Message -----
From: "abel deuring" <adeuring@gmx.net>
To: "Giovanni Maruzzelli" <maruzz@open4.it>
Cc: "Chris McDonough" <chrism@digicool.com>; <zope-dev@zope.org>;
<erik@thingamy.net>; <barry@digicool.com>;
<tdickenson@geminidataloggers.com>; <tsarna@endicor.com>
Sent: Tuesday, June 26, 2001 2:40 PM
Subject: Re: [Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)


> Hi all,
>
> Giovanni Maruzzelli wrote:
> >
> > We think that Abel is absolutely right:
> >
> > if in the same almost empty folder we add and catalog an object with one
> > word (and we have now optimized and reduced the number of indexes to 11),
> > it makes a transaction of 73K, while if the object contains 300 words with
> > the same other indexes or properties, the transaction is 224K, and if all
> > is the same but the object contains 535 words, the transaction is 331K.
> >
> > And we are now using a catalog with only some 3000 documents indexed, with
> > an average length of around 1K per document.
>
> Well, Chris certainly knows more about the internals of ZCatalog than I
> do, so we should not ignore his comments to my mail :)
>
> Chris McDonough wrote:
>
> > > If you now add a new document containing 5 of these frequent words, 5
> > > larger BTrees will be updated. [Chris, let me know if I'm now talking
> > > nonsense...] I assume that the entire updated BTrees = 120000 bytes
> > > will be appended to the ZODB (ignoring the less frequent words) -- even
> > > if the document contains only 1 kB text.
> >
> > Nah... I don't think so.  At least I hope not!  Each bucket in a BTree
> > is a separate persistent object.  So only the sum of the data in the
> > updated buckets will be appended to the ZODB.  So if you add an item to
> > a BTree, you don't add 24000+ bytes for each update.  You just add the
> > amount of space taken up by the bucket... unfortunately I don't know
> > exactly how much this is, but I'd imagine it's pretty close to the
> > datasize with only a little overhead.
>
> OK, this made me curious, so I ran a test similar to the one by Giovanni.
> I started with a ZCatalog containing 21616 records; the catalog contains
> only one text index, no keyword index, no field index. I copied one of
> the indexed documents; the text is 2645 bytes long; wc tells me that it
> has 313 words. Next, I packed the data base in order to have a "clean
> starting point". After packing, Data.fs has a size of 233661963 bytes.
>
> Then I cataloged the new object using my "lazy catalog". Since I have
> only one new document, this is basically the same as using
> CatalogAwareness. After indexing, the data base has grown to 233851090
> bytes -- an increase of 189127 bytes. Then I packed the data base again,
> resulting in a size of 233666237 bytes.
>
> So the "net increase" is indeed 233666237-233661963 = 4274 bytes, as you
> expected, but obviously a few more data base records need to be updated.
>
> Abel
>
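
For anyone who wants to re-check the arithmetic in the two tests quoted
above, here is a minimal Python sketch. The sizes are copied from Abel's and
Giovanni's messages; the function and variable names are made up for
illustration and are not part of Zope or ZCatalog. It only restates the
reported numbers; it says nothing about which objects were actually rewritten.

```python
# Data.fs sizes in bytes, as reported in Abel's test (quoted above).
SIZE_PACKED_BEFORE = 233661963   # after the first pack
SIZE_AFTER_INDEXING = 233851090  # after cataloging one 2645-byte document
SIZE_PACKED_AFTER = 233666237    # after the second pack

# (words in object, transaction size in K) from Giovanni's test (quoted above).
GIOVANNI_SAMPLES = [(1, 73), (300, 224), (535, 331)]


def growth_report(packed_before, after_indexing, packed_after):
    """Split the growth into bytes appended by the transaction and bytes
    that survive a pack (hypothetical helper, names are assumptions)."""
    appended = after_indexing - packed_before
    net = packed_after - packed_before
    return {
        "appended": appended,          # written to Data.fs by the transaction
        "net": net,                    # still present after packing
        "reclaimed": appended - net,   # superseded records removed by pack
        "ratio": appended / net,       # appended bytes per surviving byte
    }


def marginal_k_per_word(samples):
    """Extra transaction size (in K) per additional word between
    consecutive samples (hypothetical helper)."""
    return [
        (k2 - k1) / (w2 - w1)
        for (w1, k1), (w2, k2) in zip(samples, samples[1:])
    ]


if __name__ == "__main__":
    report = growth_report(
        SIZE_PACKED_BEFORE, SIZE_AFTER_INDEXING, SIZE_PACKED_AFTER
    )
    print(report)
    print(marginal_k_per_word(GIOVANNI_SAMPLES))
```

The "appended" and "net" values reproduce the 189127 and 4274 bytes given in
Abel's message; the per-word figures are just a rough way to compare
Giovanni's three transactions.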