[ZODB-Dev] Relstorage Blob support and Oracle

Hanno Schlichting hanno at hannosch.eu
Sun Jun 12 10:40:49 EDT 2011


On Sun, Jun 12, 2011 at 12:01 PM, Martijn Pieters <mj at zopatista.com> wrote:
> Unfortunately, the blob_chunk schema in PostgreSQL is using the
> `bytea` type to store ZODB blobs. Was there any particular reason you
> choose this for blob storage and not the `oid` type? All the
> benchmarks I have seen (like [1], [2]) bytea handling is much, much
> slower than oid LOB handling. Moreover, oid's do not have the overhead
> of the base64 encode/decode steps.

I think avoiding the encode/decode step is a big win here already.
Being able to just do a pg_import/pg_export from the filesystem into
Postgres should be another big win.

Looking at the most recent docs for the bytea type [1] there's two
encoding schemes. The new default in 9.0+ is called hex and doesn't
suffer from the same problems as the old "escape" format, which
couldn't represent some data - which in turn prompted Shane to do the
base64 encoding.

But if you look a bit deeper on how Postgres actually stores the data,
it's clear it's not really made for storing blobs. Any data going into
a large object stores a OID pointer to a record in the pg_largeobject
[2] table:

"The catalog pg_largeobject holds the data making up "large objects".
A large object is identified by an OID assigned when it is created.
Each large object is broken into segments or "pages" small enough to
be conveniently stored as rows in pg_largeobject. The amount of data
per page is defined to be LOBLKSIZE (which is currently BLCKSZ/4, or
typically 2 kB)."

The other approach they have for storing bytea is called toast [3] and
ends up doing pretty much the same. At the end there's also a separate
table that stores 2kb chunks of the data.

Wow! So they assume file systems with 8kb block size and then store
blobs as 2kb pages with all the overhead of keeping table rows for
each of those entries. That's pretty much the same Oracle did before
11g.

One thing that concerns me though is this:

postgres=# \dS+ pg_largeobject
Table "pg_catalog.pg_largeobject"
 Column |  Type   | Modifiers | Storage  | Description
--------+---------+-----------+----------+-------------
 loid   | oid     | not null  | plain    |
 pageno | integer | not null  | plain    |
 data   | bytea   |           | extended |

It looks like the pg_largeobject table actually uses a bytea column
itself. This is on a 9.0 database. But if that's true on pre-9.0
databases as well, the same restrictions of the "escape" format should
apply here unless the largeobject C code handles things for you.

Hanno

[1] http://www.postgresql.org/docs/9.0/static/datatype-binary.html
[2] http://www.postgresql.org/docs/9.0/static/catalog-pg-largeobject.html
[3] http://www.postgresql.org/docs/9.0/static/storage-toast.html


More information about the ZODB-Dev mailing list