[ZODB-Dev] Optimizing RelStorage Packing for large DBs

Thu Nov 21 23:57:32 CET 2013

If of interest, its released here
https://pypi.python.org/pypi/relstorage_packer/1.0

Thanks for your support! (even if my english was not always clear enough 
to express my thoughts exact enough)

regards Jens

On 2013-11-16 02:01, Jens W. Klein wrote:
> I started a new packing script for Relstorage (history free,
> postgresql). It is based on incoming reference counting. Why do we need
> this? First a bit story telling (fast forward down to section "new
> script" if you're bored)
>
> The Problem
> -----------
>
> So we had a database with ~298.000.000 objects stored, grown over 4-5
> months. The beast was never packed. We import/sync data on a weekly
> base: A complex Plone based product catalog optimized for web in 27
> languages each ~18.000 products, each with several variants (stored in
> annotated btrees). Lots of stuff is deleted and recreated on import. So
> we had a lot of garbage in DB.
>
> We're using history free Relstorage. Relstorage performs much better on
> concurrent writes and speeds up our import a lot. First on MySQL, now on
> PostgreSQL. Even if PostgreSQL does not solve all problems it seems to
> be the stable choice. At least for one time we had a frozen server. We
> blame MySQL, but well, its a CentOS and there are several other problems
> injecting side-effects (old unpatched kernel, etc, pp, ... we're dev not
> op, its enterprise level and decisions made in past are our constraints,
> carved in stone, more to say?)
>
> Anyway: Our Plone with ZODB Relstorage performs very very well - with
> this many objects stored including all the garbage!
>
> BUT...we want to get rid of the garbage left on every import.
>
> Relstorage deploys with a packaging script. Running it on the ~300
> million objects resulted in RAM-consumption >20GB with 300 days
> estimated time of completion (we did not try if this is true). Phew.
>
> So something had to be done. Storage usage grew about 15-20GB per week.
> As a quick (took some days) fix we decided to try classical Data.fs
> packing (zeopack).
>
> We blocked content managers from editing and stopped imports and cloned
> a snapshot of the virtual maschine with the DB.
> At this point we had roughly 160GB in Postgres.
> On the clone we converted the Relstorage to a Filesystem Storage. This
> took 40 hours.
> Storage shrinked to a 55GB in size Data.fs
> Then we started the classical packing and after some hours the DB was
> shrunken to 7.8 GB Data.fs still containing 44 million objects.
> Next we converted the Data.fs back to Relstorage (~6 hours) and switched
> live over to the packed DB.
>
> Packing on the 44 million object relstorage with the default script
> shall take several days (3-4 days) only for the prepacking phase. It
> consumes less memory (~5GB) but is still insane.
>
> New Script
> ----------
>
> After analyzing the current (generic - mysql, pgsql, oracle - history
> and history free) way to pack the DB I found no good way to optimize the
> code here. Other were there before me.
>
> Trial 1:
>
> So I tried to rethink the way packing is done and tried to follow the
> copy-and-switch method the filestorage uses: Copy all in-use zoids from
> object_state to a new table (traverse the graph starting at zoid 0),
> drop the old object_state, rename the new table to object_state. The
> copy process can be distributed. Several workers can do it in parallel
> until the database is at it max transactions/second.
>
> I had this working. 4-6 workers running, a queue with zoids to process
> in DB, a master controlling the process and so on. But even with this
> approach the total time for a pack of the whole DB is after SQL
> optimizations(!) roughly 8 days for a 44 million object graph.
>
> OK - Back to the drawing board.
>
> Trial 2:
>
> On thursday evening I had a good conservation (and some beer) with a
> buddy from the local linux user group and we discussed the problem. The
> outcome was to try it with reference counting.
>
> The idea is simple:
>
> - iterate over all transactions starting with the lowest
>    transaction id (tid)
> - for each transaction load the object states connected with tid
> - for each state fetch its outgoing references and fill a table where
>    all incoming references of an object are stored as an array.
>    if an state has no references write it anyway to the table with empty
>    outgoing references
>
> After reaching the highest tid its easy: all entries with no incoming
> references are garbage. Delete them and remove all incoming references
> of this entry from other entries.
>
> When new transactions are happening its enough to only process
> delta-transactions (newer than the last processed in the reference
> counting table). This is pretty fast if the delta small, i.e. one day
> only. On the deltas another check need to detect references gone
> meanwhile, but since the delta isnt that big this is probably not a
> problem.
>
> I implemented this roughly now. Building the incoming reference counting
> table from the scratch takes less than 7 hours. Deleting itself is not
> implemented yet, but this is not that difficult :-)
>
> Code
> ----
>
> The current code is at:
> https://github.com/bluedynamics/relstorage_packer
>
> Did I miss something? Any opinions much appreciated!
>
> Expect updates in this thread :)
>
> Jens Klein

-- 
Klein & Partner KG, member of BlueDynamics Alliance