[ZODB-Dev] Awkward PersistentList read Performance

Jim Fulton jim at zope.com
Tue Aug 13 17:33:52 CEST 2013


On Tue, Aug 13, 2013 at 9:40 AM, Joerg Baach <lists at baach.de> wrote:
> Hi *,
>
> I was trying to measure the impact of using different kinds of objects
> to store data in ZODB (disk, RAM, time).
>
> What's really awkward is the measurement for reading content from
> PersistentLists (that are stored in an IOBTree):
>
> case a
> ======
> g.edges = IOBTree()
> for j in range(1, 1000000):
>     edge = PersistentList([j, 1, 2, {}])
>     g.edges[j] = edge
>
> x = list(g.edges.values())
> y = [e[3] for e in x]   # this takes 30 seconds
>
> case b
> ======
> g.edges = IOBTree()
> for j in range(1, 1000000):
>     edge = [j, 1, 2, {}]
>     g.edges[j] = edge
>
> x = list(g.edges.values())
> y = [e[3] for e in x]   # this takes 0.09 seconds
>
> So, can it really be that using a PersistentList is 300 times slower?

Yes.  This would be true of *any* persistent object.  In the first
case, you're creating 1000000+B database objects, where B is the
number of BTree bucket and internal-node objects (on the order of
20,000 for a million small entries).  In the second case, you're
creating only B persistent objects, because the plain lists are
pickled inside the bucket records rather than stored separately.

Depending on what you do between cases a and b, you may also
have to load 1000000+B vs B objects.
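
A minimal sketch of how to see this for yourself (not from the
thread; the file name and the smaller N are assumptions):

import transaction
from ZODB import DB
from ZODB.FileStorage import FileStorage
from BTrees.IOBTree import IOBTree
from persistent.list import PersistentList

N = 100000  # smaller than the original million, for a quick run

storage = FileStorage('case_a.fs')  # hypothetical file name
db = DB(storage)
conn = db.open()
edges = conn.root()['edges'] = IOBTree()
for j in range(1, N):
    edges[j] = PersistentList([j, 1, 2, {}])  # one record per edge
transaction.commit()
print(len(storage))  # roughly N + B: each PersistentList is a record
# With plain lists instead (case b), this stays at roughly B + 1,
# because the lists are pickled inside the bucket records.
conn.close()
db.close()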

> Am
> I doing something completely wrong,

It depends on your application.  Generally, one uses a BTree to avoid
loading a large collection into memory.  Iterating over the whole
thing defeats that.
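
For example, BTrees accept min/max bounds on keys()/values()/items(),
so a range query only loads the buckets it touches; a small sketch,
reusing g.edges from case a:

# Read one key range instead of the whole tree, so only the buckets
# covering that range are loaded from the database.
some = [e[3] for e in g.edges.values(min=1000, max=2000)]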

Deciding whether to use a few large database objects or many small
ones is a tradeoff between efficiency of access and efficiency of
update, depending on access patterns.
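
To make that concrete, a sketch of the update side, reusing the names
from the two cases above:

# Case a: PersistentList registers its own mutations, so committing
# rewrites only that one small record.
g.edges[5][1] = 42
transaction.commit()

# Case b: a plain list does not register changes; re-assign it so the
# containing bucket record is marked dirty and rewritten on commit.
edge = g.edges[5]
edge[1] = 42
g.edges[5] = edge  # without this, the mutation is never persisted
transaction.commit()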

> or am I missing something?

Possibly.

> I am using ZODB3-3.10.5. The whole setup (incl. results) is at
> https://github.com/jhb/zodbtime

tl;dr

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton

