[ZODB-Dev] Problems with Transactions and FileStorage size

Casey Duncan casey@zope.com
Fri, 19 Jul 2002 08:51:56 -0400


You could use a packless storage, but a better solution would be to not use
a simple list for this case. Anytime the list changes the teeniest bit, the
whole thing must be updated, leading to large transactions and db bloat.
Packless storage won't fix the former, which is a major performance issue
(along with write conflicts if it's multi-threaded).
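
To put rough numbers on this, here is a back-of-the-envelope sketch. The file
sizes are taken from the `ls` output in the quoted message; the pickle size is
only a stand-in for what the storage writes per commit. It assumes binary
pickle protocol 1 is close to the record format, and real records also carry
object and transaction headers, so treat the result as an estimate.

```python
import pickle

# File sizes reported in the quoted `ls -al` output (bytes).
size_before = 3158
size_after = 2856903
commits = 1000

# Average growth of the storage file per commit.
growth_per_commit = (size_after - size_before) / commits

# Approximate size of one serialized copy of the 1000-element list.
# Assumption: pickle protocol 1 is close to what the storage writes;
# real records add object and transaction headers on top of this.
list_pickle_size = len(pickle.dumps(list(range(1000)), protocol=1))

print(f"growth per commit: {growth_per_commit:.0f} bytes")
print(f"pickled list:      {list_pickle_size} bytes")
```

The two numbers land in the same ballpark, which is consistent with every
commit appending a fresh copy of the whole list.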

A similar problem happens with Zope ObjectManagers (aka Foldoids) because
they store a list of object ids and types. When you have a big folder,
changing objects in it becomes much more expensive and can lead to huge
transactions.

A BTree is a much more efficient data structure that is built to avoid this
issue. It is also built to handle write conflicts. However, a BTree is more
like a dictionary than a list. One could imagine a BTree (an IOBTree to be
precise) where the keys are simply sequential integer indexes and the values
are the elements, or maybe your application could just be refactored around
using the BTree interface.
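
The reason this helps can be sketched without ZODB at all. A BTree keeps its
items in many small buckets, and each bucket is stored as its own record, so a
change rewrites one bucket instead of the whole collection. The snippet below
only imitates that with plain lists and `pickle`; the bucket size of 50 and the
helper name `split_into_buckets` are made up for illustration and are not what
BTrees actually uses.

```python
import pickle


def split_into_buckets(items, bucket_size):
    """Split a flat list into consecutive chunks of at most bucket_size items."""
    return [items[i:i + bucket_size] for i in range(0, len(items), bucket_size)]


items = list(range(1000))
buckets = split_into_buckets(items, bucket_size=50)  # hypothetical bucket size

# Changing element 0 of a flat list means re-serializing all 1000 items.
whole_list_cost = len(pickle.dumps(items, protocol=1))

# With buckets, only the bucket holding element 0 has to be written again.
buckets[0][0] += 1
one_bucket_cost = len(pickle.dumps(buckets[0], protocol=1))

print(f"flat list rewrite:  {whole_list_cost} bytes")
print(f"one bucket rewrite: {one_bucket_cost} bytes")
```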

One could imagine a straightforward class using BTrees that has an interface
identical to a list. It is likely such a thing already exists, although I'm
not aware of one. Managing the keys on deletion would be the trickiest bit to
deal with efficiently, but I'm sure it could be done. You could store the
length using BTrees.Length.Length.
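
A minimal sketch of such a class might look like the following. It is not an
existing library class: the name `MappingBackedList` and the `mapping_factory`
parameter are invented here, and it defaults to a plain dict so that it runs
anywhere. With ZODB, the assumption is that one would pass IOBTree as the
factory and keep the count in a BTrees.Length.Length instead of an int.

```python
class MappingBackedList:
    """List-like wrapper over a mapping keyed by consecutive integers.

    Hypothetical sketch: mapping_factory defaults to dict so this runs
    anywhere; with ZODB one would pass an IOBTree-like mapping instead.
    """

    def __init__(self, mapping_factory=dict):
        self._data = mapping_factory()
        self._length = 0  # with BTrees, a Length object could hold this

    def __len__(self):
        return self._length

    def _check(self, index):
        # Support negative indexes the way a list does.
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("index out of range")
        return index

    def __getitem__(self, index):
        return self._data[self._check(index)]

    def __setitem__(self, index, value):
        # Touches a single key, so only one bucket would change.
        self._data[self._check(index)] = value

    def append(self, value):
        self._data[self._length] = value
        self._length += 1

    def __delitem__(self, index):
        # The tricky bit: every later element moves down one key,
        # so deletion costs O(n) writes in this naive version.
        index = self._check(index)
        for i in range(index, self._length - 1):
            self._data[i] = self._data[i + 1]
        del self._data[self._length - 1]
        self._length -= 1

    def __iter__(self):
        for i in range(self._length):
            yield self._data[i]


seq = MappingBackedList()
for n in range(5):
    seq.append(n * 10)
seq[0] += 1       # in-place update of one element
del seq[1]        # shifts the remaining elements down
print(list(seq))  # [1, 20, 30, 40]
```

Appending and updating touch one key each; the naive deletion here is the part
that would still need a smarter key scheme to be cheap.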

Another option would be to create a linked list structure where each element
has a reference to the next one in the "list". If random access was not a big
concern, this would be a simpler solution and would still eliminate the bloat
problem. Of course each element would need to be a persistent class instance.
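
As a rough illustration of the linked-list idea, here is a plain-Python sketch.
The class and method names are hypothetical, and the classes are ordinary
objects so the snippet runs without ZODB; in a real database each node (and the
list itself) would need to subclass Persistent so that it gets its own record.

```python
class Node:
    """One element of the list; with ZODB this would subclass Persistent."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    """Minimal singly linked list (hypothetical names throughout)."""

    def __init__(self):
        self.head = None

    def prepend(self, value):
        # Only the list head and the new node change; existing nodes
        # are left untouched.
        self.head = Node(value, self.head)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next

    def find(self, index):
        # Random access walks the chain, which is the trade-off here.
        if index < 0:
            raise IndexError("negative indexes are not supported")
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError("index out of range")
        return node


numbers = LinkedList()
for n in (3, 2, 1):
    numbers.prepend(n)
numbers.find(0).value += 1  # modifies a single node only
print(list(numbers))        # [2, 2, 3]
```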

hth,

-Casey

On Friday 19 July 2002 08:12 am, Heiko Hees wrote:
> Hi,
>
> i am looking for a switch, to prevent logging of transactions, since
> this seems to heavily grow file size.
>
> if i run the following program (first run generates an object with an
> array, second run changes the array and commits 1000 times) file size
> grows as follows:
>
> heiko@julie:~/tests/persistent$ ls -al a*
> -rw-r--r--    1 heiko    heiko        3158 Jul 19 14:08 a
> -rw-r--r--    1 heiko    heiko           3 Jul 19 14:08 a.lock
> -rw-r--r--    1 heiko    heiko        2966 Jul 19 14:08 a.tmp
> heiko@julie:~/tests/persistent$ ./dbsizeTest.py a
> heiko@julie:~/tests/persistent$ ls -al a*
> -rw-r--r--    1 heiko    heiko     2856903 Jul 19 14:08 a
> -rw-r--r--    1 heiko    heiko           3 Jul 19 14:08 a.lock
> -rw-r--r--    1 heiko    heiko        2823 Jul 19 14:08 a.tmp
>
> does anyone have a hint other than running db.pack()?
>
> heiko
>
> the program:
>
> #!/usr/bin/python
> import ZODB, sys,time
> from Persistence import Persistent
> from ZODB import FileStorage, DB
>
> class X(Persistent):
>      def __init__(self):
>          self.a = []
>          for i in range(1000):
>              self.a.append(i)
>          self._p_changed = 1
>
>      def change(self):
>          self.a[0] += 1
>          self._p_changed = 1
>
>
>
> db = DB( FileStorage.FileStorage(sys.argv[1]) )
> connection = db.open()
> root = connection.root()
>
>
> if not root.has_key('x'):
>      # first run
>      root['x'] = X()
>      get_transaction().commit()
> else:
>      # second run
>      for i in range(1000):
>          root['x'].change()
>          get_transaction().commit()
>
> connection.close()
>
>
> --
> brainbot technologies ag
> schwalbacherstr. 74   65183 wiesbaden . germany
> vox +49 611 238505-0  fax ++49 611 238505-1
> http://brainbot.com/  mailto:heiko@brainbot.com