[ZODB-Dev] SQL/ZODB integration (was Re: ZODB and new styleclasses?)

Phillip J. Eby pje@telecommunity.com
Tue, 02 Jul 2002 15:08:58 -0400


At 02:38 PM 7/2/02 -0400, Jim Fulton wrote:
>Phillip J. Eby wrote:
>
>> 
>> Right, and not all back-ends provide any kind of notification that lets you
>> do this.  I suppose you could reasonably consider this a deficiency of the
>> back-ends rather than of the ZODB framework.  :)  But it's nonetheless a
>> requirement for many people to use such back-ends.  :)
>
>This is a good point. I tend to take the invalidation protocol for granted.
>
>I hope this doesn't prove fatal.

I don't think it is.  My approach is simply to allow specification of
caching policies.  I think you thought that I meant policy for how long to
keep things cached, from a memory-use perspective.  I meant it from a "how
willing are you to have out-of-date data" perspective.  A discussion board
and a high-volume reservation system have very different requirements in
this regard, for example.  :)
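
To illustrate what I mean by that, here's a minimal sketch of a tunable
staleness policy (the class and attribute names here are invented for the
example, not part of any existing framework):

import time

class CachePolicy:
    """Hypothetical per-application cache policy: how stale may data be?"""

    def __init__(self, max_staleness):
        self.max_staleness = max_staleness  # seconds; 0 means "always re-read"

    def is_fresh(self, loaded_at):
        return (time.time() - loaded_at) <= self.max_staleness

# A discussion board can tolerate minutes-old data; a high-volume
# reservation system effectively can't tolerate any staleness at all.
board_policy = CachePolicy(max_staleness=300)
reservation_policy = CachePolicy(max_staleness=0)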


>OTOH, I don't see how you can do an effective caching implementation without
>notification (invalidation). I wonder how other databases handle this.
>Oracle does a lot of caching, I wonder if they have a notification model. I
>bet they do. Does anyone know if Postgres caches on the client?

Don't know, but I don't really think it matters, as long as cache policies
are tunable, and all the design work I've done in relation to using the
existing P&T frameworks from ZODB is compatible with doing that.  I think
asynchronous invalidation should just be considered *one* of the possible
approaches to caching.  Good if you can get it, but handle-able otherwise.

In my experience of caching with SQL databases, all you really want to
cache are things that you're constantly looking up and that change infrequently;
the database, if it's of any size, has its own caching on its side.
Caching is mostly to save round-trip latency, as few queries will be
lengthy *and* frequently used *and* can't be out of date.  Most business
reports that involve complex processing occur on some kind of regularized
schedule, and once produced, you can treat the result as a fixed document -
even put it in a CMS, if you like.  Other queries occur very frequently,
but 99% of the time there's going to have been some change to the data
since the last time you queried, *and* you absolutely must have the latest
info.  There's no point to caching that stuff beyond transaction
boundaries.  That leaves things that you use all the time, but hardly ever
change, which are mostly things you could almost consider metadata, like
what departments you have or what accounts are in the bookkeeping system,
stuff like that.  For much of this data, you can declare a specific point
where it will change, maybe even in advance, so you have a known time to
flush it (at least if you're designing the app and can add this kind of
thing to the database schema).  For some of it, you can do the moral
equivalent of sending SIGHUP to Apache and telling it to reload its
configuration, "between transactions".

Anyway, that's why I see caching as being either long-term (across
transactions) or short-term (within transactions).  "Enterprise"
applications need both kinds.

Now that I've said all that, I should probably still mention that many
databases *do* provide ways (however kludgy) of doing notifications.
Sybase, Oracle, and PostgreSQL all can "call out" to other languages, as
invoked by a trigger, which could be used to send that "SIGHUP equivalent"
message in the form of an XML-RPC call or something like that.  But it's
not very cross-platform, not very scalable, etc.  You really don't want it
for frequent notifications, IMHO.
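
For what it's worth, the "call out" half of that can be pretty trivial; a
trigger would just invoke a small script along these lines (the server URL
and the flushCache method name are made up for the example):

# Hypothetical notification script, invoked by a database trigger.
# Sends the "SIGHUP equivalent" to an app server as an XML-RPC call.
import sys
import xmlrpclib

def notify_change(table_name):
    server = xmlrpclib.ServerProxy("http://appserver:8080/")
    server.flushCache(table_name)   # made-up method the app server exposes

if __name__ == "__main__":
    notify_change(sys.argv[1])      # e.g. "departments"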


>> Eeek!  I can see this for CMS applications, but some applications really
>> *need* pessimistic conflict detection, i.e. locking.
>
>That's fair enough. That raises another possibly fatal issue.

Non-fatal, IMHO.  If you're reading volatile data (because you don't keep
it cached), and you are in a transaction that's properly specified, the
RDBMS will hold a read lock that can be upgraded to a write lock later in
the transaction.  My framework design also handles it by "pre-committing"
objects (i.e. sending SQL to do updates) at the earliest convenient moment,
which is defined as when you next need to do something (like a query) that
would rely on the data being up-to-date in the underlying RDBMS.  An
application can also specifically request that a jar ensure changed objects
have been sent to the RDBMS.  Note that this doesn't affect the transaction
in any way; it's sort of like a sub-transaction commit, except that it
can't be rolled back separately from the current (sub)transaction.
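
In rough terms, the jar behavior I'm describing looks something like this
sketch (the method names and the to_update_sql() helper are invented here,
not the actual framework API):

class SQLJar:
    """Sketch of a jar that 'pre-commits' dirty objects before any
    operation that relies on the RDBMS seeing up-to-date rows."""

    def __init__(self, connection):
        self._conn = connection
        self._dirty = []            # objects with changes not yet sent as SQL

    def register_change(self, obj):
        self._dirty.append(obj)

    def flush(self):
        # Send the UPDATE/INSERT statements now, *inside* the open RDBMS
        # transaction; nothing is committed, so aborting the transaction
        # still rolls all of it back.
        for obj in self._dirty:
            self._conn.execute(obj.to_update_sql())
        self._dirty = []

    def query(self, sql):
        self.flush()                # make sure the query sees our own changes
        return self._conn.execute(sql)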


>I read a book once (Atomic Transactions, by William Weihl) that gave a
>mathematical proof that time-stamp protocols and locking protocols can be
>mixed in the same TM.

I think I read that one, although I doubt I followed the proof.  :)  I
imagine I'll be re-visiting the library to borrow it again soon, if only so
we can use it as a basis for common terminology!  :)


>OTOH, I think few databases use locking protocols any more.

What?  I know for a fact Sybase's latest version still uses locking.
They've moved to progressively finer-grained locking systems over time, but
it's still locking.  I thought optimistic concurrency management was still
a rarity in large systems, although I could be wrong.  I understand it
tends to be inefficient for high-volume, high-conflict applications.


>Locking is certainly hard to mix with object persistence. OTOH, I suspect
>you can get a pessimistic time-stamp protocol.

That, I don't remember from the book.  But I'm not sure we really need to
worry about it.  Based on my work to date, I think it's very possible to
share a P&T framework without embedding any kind of policy for this stuff.
The key is making things controllable in context.