[ZODB-Dev] RelStorage branch 1.4.0-fastimport

Maurits van Rees m.van.rees at zestsoftware.nl
Fri Jan 7 12:34:46 EST 2011


On 07-01-11 05:51, Shane Hathaway wrote:
> I looked at that branch before, but I felt like the changes were
> complicated enough to require a comparison with simpler solutions first.
>    In particular, Postgres has an option to disable fsync.  Set it in
> postgresql.conf.  Disabling fsync is not normally recommended, but for a
> large import it's obviously a good idea.  Would you compare the speed of
> --single-transaction with disabled fsync on vanilla RelStorage?

In /etc/postgresql/8.4/main/postgresql.conf I switched these settings 
off and restarted postgres:
fsync = off
synchronous_commit = off

The zodbconvert of the 150 MB CatalogData.fs went from 3.1 minutes to
2.8 minutes.  With the --single-transaction option of the branch it went
down to 2.2 minutes.  So there is still a significant improvement here.
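
To put these timings in proportion, here is a small sketch of the
arithmetic.  The variable and function names are only illustrative.  The
text does not state whether the 2.2 minute run also had fsync switched
off, so the saving is computed against both earlier numbers.

```python
# Timings in minutes, as reported above for the 150 MB CatalogData.fs.
baseline = 3.1            # vanilla RelStorage, fsync settings left on
fsync_off = 2.8           # fsync = off and synchronous_commit = off
single_transaction = 2.2  # --single-transaction option of the branch


def percent_saved(before, after):
    """Return the reduction in run time as a percentage of `before`."""
    return (before - after) / before * 100.0


# Compare each faster run with the run it could be measured against.
print("fsync off vs. baseline:          %.1f%%"
      % percent_saved(baseline, fsync_off))
print("single transaction vs. baseline: %.1f%%"
      % percent_saved(baseline, single_transaction))
print("single transaction vs. fsync off: %.1f%%"
      % percent_saved(fsync_off, single_transaction))
```

So disabling fsync alone saves roughly a tenth of the run time, and the
single transaction run saves somewhere between a fifth and three tenths,
depending on which run it is compared with.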

> I assume the logging additions are less invasive, and if that assumption
> is correct, there's no problem with merging those.

That is correct.  I have just merged that part in revision 119443.

In 119444 I completely removed the remaining use of sys.stdout.write in 
the main method of zodbconvert, in favour of logging.  I presume this is 
no problem.
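
For readers who have not seen the change, the general pattern of
replacing sys.stdout.write with logging looks roughly like this.  It is
a minimal sketch, not the actual zodbconvert code: the logger name, the
helper function and the message text are all made up for illustration.

```python
import logging

# Hypothetical logger name; the real script may use a different one.
log = logging.getLogger("zodbconvert_sketch")


def report_progress(copied, total):
    """Hypothetical helper that reports progress through logging.

    The old style would have been something like:
        sys.stdout.write("Copied %d of %d transactions\n" % (copied, total))
    """
    # Passing the arguments separately lets logging skip the string
    # formatting when the INFO level is disabled.
    log.info("Copied %d of %d transactions", copied, total)


if __name__ == "__main__":
    # Timestamps in the format make it easy to see where time is spent.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(name)s] %(levelname)s %(message)s")
    report_progress(10, 100)
```

The advantage over writing to stdout directly is that the caller decides
where the messages go and how verbose they are, and every line gets a
timestamp for free.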

In revision 119445 I did a bit of pep8/pyflakes cleanup in the files I
touched; I hope you don't mind.

>
>> I added some more logging on that branch, mostly because the conversion
>> appeared to be hanging at some unknown point.  This was also with the
>> official 1.4.1 release, which was the reason I started experimenting
>> with the fastimport branch to see if that would help.  It did not.  At
>> least in both test runs mentioned above, the actual time it took was
>> about twenty minutes longer, possibly because the conversion temporarily
>> lost the connection with the postgres server.  With the logging I could
>> at least see that it was throwing the old transaction table away; I have
>> seen the same with other tables.  Definitely no one else is accessing
>> this database at the same time.  So if someone has an idea what could be
>> going on here, that is welcome.
>
> Are you sure you were not accidentally running multiple imports in
> parallel?  RelStorage does not throw away tables unless you're rather
> explicit about it.

No, it was just one import, and throwing the table away was actually the
right thing to do.  The problem was just that for some reason this took
a really long time, which should not happen.

I saw the same locally today during the tests with fsync mentioned 
above.  The times I mention there are for the real import of the 
transactions.  But sometimes the complete conversion took a bit more 
than 3 minutes and sometimes it was 7 minutes, of which 4 minutes were 
apparently just spent waiting for something during the deletion of a table.

This is one of the reasons I am interested in that logging and added 
some extra logging myself. :-)
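
The kind of logging that helps to spot such a stall can be sketched as
follows.  This is not the code on the branch; the logger name and the
timed_step helper are hypothetical, and the sleep only stands in for a
slow database operation such as dropping a table.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name, used only in this sketch.
log = logging.getLogger("conversion_timing_sketch")


@contextmanager
def timed_step(description):
    """Log the start and end of a step, including how long it took."""
    start = time.monotonic()
    log.info("Starting: %s", description)
    try:
        yield
    finally:
        # Runs even if the step raises, so a failing step is timed too.
        elapsed = time.monotonic() - start
        log.info("Finished: %s (%.1f seconds)", description, elapsed)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    with timed_step("dropping old tables"):
        time.sleep(0.1)  # stand-in for the real, possibly slow, work
```

With a "Starting" line but no matching "Finished" line in the output, it
is at least clear which step the process is waiting in.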

>> Anyway, the --single-transaction seems to work and I would say the
>> logging is helpful.  So: is there any chance this can be merged to
>> trunk?
>
> Not yet.  We really need someone to do the fsync test, and if that
> doesn't do the trick, there has to be a way to accomplish the same thing
> in a clearer way.

Turning fsync off at least helps a bit.

I do not know whether conversion to Postgres takes significantly longer
than to MySQL or Oracle.  But without the single-transaction option it
took about 5 hours to convert a 16 GB Data.fs, so shaving some time off
would be worthwhile. :-)  But not at the cost of stability, of course.


>>   I am also interested in the blob support that has been added
>> there. :-)
>
> The blob changes are also in 1.5.0a1.  That release seems to be a lot
> more stable than I expected and may soon become 1.5.0 final if no one
> finds bugs in the new blob option.

I plan to test that next week and will report problems or success back
here.  For now I can at least say that zodbconvert works for me with
current trunk, including today's logging changes.


For the record, to ease testing for myself I have put releases of the 
branch and current trunk (well, excluding the last two minor cleanup 
changes) here:
http://pypi.zestsoftware.nl/public/
If other people want to use those releases: feel free, but realize they 
are of course in no way official.

Cheers,

-- 
Maurits van Rees
Programmer, Zest Software


