[ZODB-Dev] Relstorage Blob support and Oracle

Sun Jun 12 06:01:18 EDT 2011

On Fri, Jun 10, 2011 at 17:46, Martijn Pieters <mj at zopatista.com> wrote:
> In testing I ran into some limitations of cx_Oracle; it uses the older
> (pre 10.1) OCI APIs that only allow offsets and sizes of up to 4GB
> (e.g. using 32-bit unsigned integers). Moreover, it's naive use of
> signed integers instead of unsigned ones means that on 32-bit
> platforms the largest *LOB write and read offsets it can handle is 2GB
> - 1 byte (sys.maxint). I've seen that the author is already aware that
> there is a newer API that upgrades the offset and size types to 64-bit
> unsigned integers, and I've asked him what the status is of cx_Oracle
> supporting that.

This morning it hit me that if we support multi-chunk ZODB blobs
anyway, we may as well store ZODB blobs over the maximum size
supported by cx_Oracle as mulitple chunks anyway! I've updated the
oracle blob upload handler to now chunk blob uploads if the ZODB blob
size would exceed the maximum size cx_Oracle can deal with.

> Next I'll look into adding PostgreSQL support too; it has a nicer API
> still in that it let's you specify files directly to upload from or
> download to.

Unfortunately, the blob_chunk schema in PostgreSQL is using the
`bytea` type to store ZODB blobs. Was there any particular reason you
choose this for blob storage and not the `oid` type? All the
benchmarks I have seen (like [1], [2]) bytea handling is much, much
slower than oid LOB handling. Moreover, oid's do not have the overhead
of the base64 encode/decode steps.

The only drawback to using oid's that I can see is that you have to
explicitely clean them up; just deleting rows from a table with an oid
column does not remove the actual data, so you have to either access
these in psycopg and call `.unlink()` on them or use a stored proc on
the postgreSQL side to do this for you. Not that big a deal really, as
this only has to happen at packing time.

How big a userbase is there for 1.5.0b2 on PostgreSQL? I know schema
changes are painful, but in this case we'd only have people on the
bleeding edge using a beta version to switch. I think we can come up
with a little script that would move these people over to a blob_chunk
table with oid columns.

In any case, I'll work on a version that, like the Oracle blob story,
stores ZODB blobs in chunks up to 2GB in PostgreSQL oid blobs instead.

To be clear: I do not intend to touch the OBJECT_STATE table, that
table can continue to use bytea as far as I am concerned.

-- 
Martijn Pieters

[1] http://archives.postgresql.org/pgsql-jdbc/2002-04/msg00100.php
[2] http://www.zephid.dk/2008/08/09/oid-vs-bytea-in-postgresql/