[ZODB-Dev] polite advice request

Mon Aug 19 18:58:10 CEST 2013

Issue resolved, see the end of this message.

On 19.08.13 00:39, Alan Runyan wrote:
>>> Would you implement a column store, and how would you do that?
>> Ditto.
> So many Dittos, it sounds like a Rush Limbaugh talk show :)
>
>> "large" can mean many things. The examples you give don't
>> seem very large in terms of storage, at least not for ZODB.
> One app we have is 26,344,368 objects.
> ZODB is the least of its concerns.
>
>> It's really hard to make specific recommendations without
>> knowing more about the problem. (And it's likely that someone
>> wouldn't be able to spend the time necessary to learn more
>> about the problem without a stake in it. IOW, don't assume I'll
>> read a much longer post getting into details. :)
> This is fair.  ZODB is intimately tied to the application design so
> it is a bit difficult for someone to qualify what they are doing
> without having to explain the application design.
>
> This sucks from a newbie's point of view but it's reality.
>
> I just wrote up some thoughts on ZODB.
> Might be useful for others - doubtful - but maybe.
>
> https://docs.google.com/document/d/12RGOTSMrl0CttkCZJ5rp-TSaakAY2Pn4VnWhVMcFMQw/edit?usp=sharing
>
> Anyway.  Tismer if you write up more thoughts; I will read them.
>

Hey, nice write-up, thanks a lot!

On 19.08.13 09:33, Dylan Jay wrote:
> In some ways the ZODB is less flexible. It requires you to understand more about how you will access the data before you import it than an SQL database does. This is because the data structure defines how you can query it in a ZODB.
> For example, if you need multiple indexes to your data, then to make it efficient you might choose a different data structure, whereas in SQL you can add indexes after the fact. Whichever way you go, however, you are always better off thinking about how you will access your data first. For example, when you reimport the data, do you need to do a lookup on each item to see if it's there and merge, or will you just delete the lot and start from scratch?
>
> Having said this, you might look at a project like souper that tries to support tabular type data without having to think too much about the data structures.

I looked into souper a bit; maybe I'll try it.

Right now I'm happy with this very dumb brute-force solution:

I turned all 25 tables into a column store: a very simple implementation
with no keys, nothing.
I just took the original table data, sorted it by primary key, and then
built a persistent list for each column.

This unoptimized solution has very little overhead. The primary key can be
searched by bisect, which is right now all we need.
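
For readers who want to picture this layout, here is a minimal sketch of
the idea, not the actual code. It uses plain Python lists so that it runs
without ZODB; in the real database each column would be a persistent list.
The class name ColumnStore, its methods, and the sample rows are all made
up for illustration.

```python
from bisect import bisect_left


class ColumnStore:
    """One table kept as parallel columns, rows ordered by primary key."""

    def __init__(self, rows, key):
        # rows: list of dicts with identical keys; key: primary-key column name.
        ordered = sorted(rows, key=lambda row: row[key])
        self.key = key
        names = list(ordered[0].keys()) if ordered else [key]
        # Assumption: plain lists stand in for the persistent lists
        # mentioned in the text, one list per column.
        self.columns = {name: [row[name] for row in ordered] for name in names}

    def find(self, pk):
        """Return the row position of primary key pk, or None if absent."""
        keys = self.columns[self.key]
        i = bisect_left(keys, pk)  # binary search on the sorted key column
        if i < len(keys) and keys[i] == pk:
            return i
        return None

    def get(self, pk, column):
        """Return one cell, touching only the key column and one other column."""
        i = self.find(pk)
        return None if i is None else self.columns[column][i]


# Hypothetical sample data, deliberately unsorted.
rows = [
    {"id": 3, "name": "c"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
]
store = ColumnStore(rows, "id")
print(store.get(2, "name"))
```

Because a lookup only reads the key column and the requested column, the
other columns never have to be loaded.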

I used ZlibStorage, with a stunning effect:

The database is now 44.5 MB, and it loads the few columns that we need
in a fraction of a second. For comparison, the original serialization
format took 44.4 MB as a ZIP file. :-D
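
To illustrate why a column layout and zlib compression go well together,
here is a small standard-library sketch. It does not use ZlibStorage
itself; it only pickles one column and compresses the result with zlib,
which is roughly what happens to each stored record. The helper name
compressed_size and the sample column are assumptions made up for this
example, not data from the real database.

```python
import pickle
import zlib


def compressed_size(column):
    """Return (pickled size, zlib-compressed size) in bytes for one column."""
    raw = pickle.dumps(column, protocol=2)
    return len(raw), len(zlib.compress(raw))


# Hypothetical column: values repeat in long runs, as they often do
# when a table has been sorted before being split into columns.
column = [i // 100 for i in range(10000)]

raw_size, packed_size = compressed_size(column)
print(raw_size, packed_size)

# The compression is lossless: decompressing and unpickling restores the column.
restored = pickle.loads(zlib.decompress(zlib.compress(pickle.dumps(column, protocol=2))))
```

Values of the same kind sit next to each other in a column, so the
compressor sees far more repetition than it would in row-wise records.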

So the former bloat of almost a GB is gone and versions are cheap. I no
longer try to reduce the size further or calculate deltas between versions,
but happily use the small, absolute column store databases
which I calculate every two weeks, together with an index database.

cheers - chris

-- 
Christian Tismer             :^)   <mailto:tismer at stackless.com>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship* http://starship.python.net/
14482 Potsdam                :     PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/