[ZODB-Dev] Blob, persistence and parentage questions

Martin Aspeli optilude+lists at gmail.com
Mon Sep 21 11:38:57 EDT 2009


Hi Jim,

Jim Fulton wrote:
> On Sun, Sep 20, 2009 at 6:10 AM, Martin Aspeli <optilude+lists at gmail.com> wrote:
>> I'm working on a package (plone.app.textfield) that's meant to solve the
>> common use case in Plone whereby we have a rich text field that's got a
>> "raw" value (a unicode string) with a MIME type (e.g. text/html or
>> text/structured), which is transformed to HTML on output (even HTML
>> input is transformed, because we do some markup tidying and stripping).
>>
>> We're trying to implement this as efficiently as possible for the most
>> common use case: the raw value is read infrequently (on the edit screen,
>> basically); transformed value is read frequently (every time the content
>> item is viewed).
>>
>> The approach we've taken is to store the mime type, the raw input and
>> the transformed output values in a RichTextValue object that's not
>> IPersistent (it derives from 'object' only) but knows its parent, via a
>> __parent__ pointer. This avoids a separate _p_jar so e.g. the object
>> isn't loaded or cached separately. We use the _p_changed protocol to
>> notify the parent when the value is changed.
>>
>> Furthermore, we store the raw value in a blob, since it can be
>> relatively big and is read/written infrequently. For performance, we
>> store the transformed output HTML on the object in a regular string, on
>> the assumption that it's almost always read when the object is used
>> (i.e. on its view).
> 
> So the raw value is a separate database object, since it is using a
> blob, which is a separate database object.  That contradicts what you
> say 2 paragraphs up.

True, but it's only read on the edit screen, which is infrequently accessed.

The 'output' value is a unicode string stored on an object that only
derives from 'object'. Hence, in my understanding, we avoid a separate
object load in the most common use, i.e. on the 'view' screen where
the transformed value is read. And on the edit screen, we only load one
extra object (the blob) not two (the container + the blob).
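
As a rough sketch of the arrangement described above (illustrative
only, not the actual plone.app.textfield code; FakeContent is a
hypothetical stand-in for a content object that would really derive
from persistent.Persistent, and the blob for the raw value is left out):

```python
class FakeContent(object):
    """Hypothetical stand-in for a persistent content object.

    A real content object would derive from persistent.Persistent,
    which provides _p_changed itself.
    """
    _p_changed = False


class RichTextValue(object):
    """Plain, non-persistent value stored as part of its parent's state."""

    def __init__(self, parent, mime_type, output):
        self.__parent__ = parent
        self.mime_type = mime_type
        self._output = output

    @property
    def output(self):
        return self._output

    @output.setter
    def output(self, value):
        self._output = value
        # This object has no _p_jar of its own, so flag the parent as
        # changed; otherwise the new value would not be written out.
        self.__parent__._p_changed = True


content = FakeContent()
content.text = RichTextValue(content, "text/html", u"<p>first</p>")
content.text.output = u"<p>second</p>"
print(content._p_changed)
```

The point is only that the value never registers with the database on
its own; it is saved whenever its parent is.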

>> I have a couple of questions about the performance and behaviour of this:
>>
>>  1) When a value is extracted from the request in the edit widget, it
>> is used to construct a RichTextValue, which is passed back for
>> validation etc. This means writing to a blob since setting the 'raw'
>> attribute writes to a blob. It is possible that validation will fail and
>> so the object will never be persisted (set onto its parent object). Is
>> it bad for performance to write a "temporary" blob like this?
> 
> No.

Thanks, good. :)

>>  2) When a value is edited, we currently create a new RichTextValue
>> object and replace the old one with it. Hence, we get a new blob. Would
>> it be better to re-use the same blob and write a new value to it?
> 
> All other things being equal, it is better to update an existing
> object rather than creating a new one.  Objects consume index space
> and getting rid of unused objects via GC is far more expensive than
> getting rid of old revisions via packing. Also, keeping the same
> object makes auditing changes sane, because you can use the history
> mechanism.

So the history is kept properly if I use blob.write("new value") or 
whatever?

It should be possible to optimise for the use case where we re-write to 
the same raw value blob except when the object is new and/or initialised 
with a default.

The easy solution, thinking about it, will be to write to a temporary 
blob in the widget and then read that value and write into the 
"permanent" blob if possible, or else just set the temporary container 
onto the object, thus making it "permanent".
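
Something along these lines, as a sketch only. MemoryBlob is a
hypothetical in-memory stand-in that mimics a blob's open('r') /
open('w') interface so the example runs without a database; store()
and the attribute names are likewise made up for illustration:

```python
import io


class MemoryBlob(object):
    """Hypothetical in-memory stand-in for a blob."""

    def __init__(self, data=b""):
        self._data = data

    def open(self, mode="r"):
        if mode == "r":
            return io.BytesIO(self._data)
        return _BlobWriter(self)


class _BlobWriter(io.BytesIO):
    """Copies what was written back to its MemoryBlob on close."""

    def __init__(self, blob):
        super().__init__()
        self._blob = blob

    def close(self):
        if not self.closed:
            self._blob._data = self.getvalue()
        super().close()


class RichTextValue(object):
    def __init__(self, raw, mime_type, output):
        self.mime_type = mime_type
        self.output = output
        self.raw_blob = MemoryBlob()
        with self.raw_blob.open("w") as f:
            f.write(raw.encode("utf-8"))


def store(content, name, temp_value):
    """Save a validated temporary value onto content under `name`."""
    existing = getattr(content, name, None)
    if existing is None:
        # New object: the temporary value becomes the permanent one.
        setattr(content, name, temp_value)
        return temp_value
    # Existing value: copy the data into the blob we already have, so
    # no new blob object is created.
    with temp_value.raw_blob.open("r") as src:
        data = src.read()
    with existing.raw_blob.open("w") as dst:
        dst.write(data)
    existing.mime_type = temp_value.mime_type
    existing.output = temp_value.output
    return existing
```

With a non-persistent value object, the parent would still need to be
flagged via _p_changed after the attributes are updated.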

>>  3) If the parent object is copied (e.g. via manage_cut/manage_paste in
>> Zope 2), will the blob be copied as well, or is this something we need to
>> implement e.g. with event handlers for IObjectCopiedEvent? Bear in mind
>> that the blob is an attribute of a simple class (RichTextValue) which in
>> turn is an attribute of a persistent content object. The RichTextValue
>> has a __parent__ pointer to the content object.
> 
> It all depends on how the copy/paste is done.  If it's via database
> export and import, I believe it will work without extra effort.  I'm
> writing a test. :)
> 
> I suspect a blob isn't buying you anything here. In fact, I suspect a
> simple persistent object with a string value will serve you better.

Maybe so. I've yet to benchmark. If the blob doesn't slow down the 
"view" use case, then storing the (infrequently-used) raw value in a 
blob may possibly mean more efficient use of storage and memory. The 
ZODB cache won't need to hold the full raw value unless it is actually read.

>>  4) Speaking of that __parent__ pointer: if the content object is
>> copied, is that going to point to the old instance?
> 
> Again, that depends on your copy algorithm.

The copy/paste stuff in Zope 2 seems to export/import to a temporary 
file. Scary. :)

>> If so, we presumably
>> have to fix the reference up in an event handler? Is there a better way
>> to make an object that doesn't have its own _p_jar but doesn't need a
>> reference to the parent? I suppose this isn't really any different from
>> a folder structure in Zope 3 where each child has a __parent__ pointer
>> to its parent. When the parent is copied, what happens to those
>> __parent__ pointers?
> 
> I believe the copying algorithm is aware of them and handles them properly.

Yeah, it seems like it. I guess everything goes through such an 
algorithm anyway.
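
Why the copy algorithm matters can be seen with a purely in-memory
analogy using Python's copy module. This is not Zope's copy/paste or
the ZODB export/import code, and Content is a hypothetical stand-in for
a persistent content object:

```python
import copy


class Content(object):
    """Hypothetical stand-in for a persistent content object."""


class RichTextValue(object):
    def __init__(self, parent, output):
        self.__parent__ = parent
        self.output = output


original = Content()
original.text = RichTextValue(original, u"<p>hello</p>")

shallow = copy.copy(original)
deep = copy.deepcopy(original)

# A shallow copy shares the value object with the original, so its
# __parent__ still points at the old instance.
print(shallow.text.__parent__ is original)

# A deep copy treats the whole graph as one unit, so the back-reference
# is rewritten to point at the new copy.
print(deep.text.__parent__ is deep)
```

An algorithm that copies the object graph as a whole keeps such
back-references consistent; one that copies attribute by attribute
would need the reference fixed up afterwards.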

Martin

-- 
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book


