[ZODB-Dev] Re: Ruby/Smalltalk OODB

Tue Jun 3 12:53:13 EDT 2008

>
>>> How does Gemstone implement efficient querying or indexing?
> [snip]
>
> Okay, this sounds like an indexing framework built into the database
> layer, something the ZODB doesn't have, but of course has been built
> on top with the catalog.

pretty much.

>
>
>> i haven't looked into the specific details of how they wire it
>> altogether but it comes down to, gemstone is a fullstack. whether
>> you are using the smalltalk, java or eventual ruby...
>> they write the vm which has primitives to make their ops fast, has
>> built in persistence so you just dont think about it at all. in
>> fact, you have to ask for a class to not be persistent.
>
> Fullstack has its advantages, though also disadvantages. It means they
> need to reimplement compliant interpreters for any language they want
> to support, and that's going to hurt their library support. (as I
> doubt arbitrary CPython extensions would work with a hypothetical
> Python version of this)

indeed they would have to. disadvantage for them.
as an end user, i dont see that as really hurting me.

>
>
> [snip]
>> theres an object store shared by all. there are multiple vm
>> instances running whatever code and the entire thing can run across
>> multiple machines... need to scale, add more machines in.
>
> This is something ZEO also provides, as far as I can grasp from your
> description.

indeed it does.

>
>
>> i'm still digging into it all, its only been 3 weeks so i still have
>> a lot of the terminology wrong etc, but it really is a very cool
>> product. not having to think about the data store is just real nice.
>> its all just objects and you dont have to change anything about how
>> you code for them unless you want to use indexes and then the
>> changes are very minor.
>
> I'd say that the ZODB by itself also doesn't put heavy requirements on
> your code. The main things are subclassing from Persistent, and
> setting _p_changed when you mutate non-persistent subobjects that you
> still want to persist.
>
> For indexing, a framework like Zope 3 requires zero changes to the
> classes themselves.
>

>>>> pull it back out and there it is again, object pointers fully
>>>> intact. store in 2 different directories, modify in one, blam!
>>>> modified in the other.
>>>
>>> I'm not sure how this is different than using the root object to
>>> store objects and ZEO?
>>>
>>
>> if i have customer A who has order B
>>
>> and i store customer A to customer dictionary
>> and order B to order dictionary
>>
>> then later access order B from order dictionary, modify and update it
>>
>> does ZEO update the instance of order pointed to by customer A?
>> I cant get it to do it. My understanding is it cant. Well, it could
>> but it isnt 'right out of the box' seamless.
>
> ZEO should do just that. I understand you have an object A which has a
> reference to B. You also have a dictionary that has a reference to A,
> and a dictionary that has a reference to B. Both A and the dictionary
> will be pointing to the same instance of B (if A and B are both
> subclasses of Persistent; if not, they might both be serialized
> separately, I'm not sure).
>
>> If you do that in gemstone, there is only one copy of Order B, no
>> matter what variable in what dictionary you come at it from. And its
>> drop dead simple.
>
>> I looked at implementing that with zodb and moved along.
>
> I'm confused. This has been the way the ZODB worked for a long time,
> unless I'm really missing something in your description.

i tried to do this:

create customer that has order

so that i can have different extents type situations...

store customer in one dictionary.
store order in another.

if i pulled the order back out from the order dictionary and modified it
then pulled the customer out, the customers order was no longer in sync
with what came out of the order dictionary.

the reference was lost on serialization. original in memory objects
were fine, those that came back out from zodb werent.
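
A minimal sketch of that symptom, using only the standard library's pickle
module rather than ZODB itself. It rests on an assumption about the cause,
not a diagnosis: ZODB stores each Persistent instance as its own record, and
an object that does not subclass Persistent is pickled inline into the record
of whatever persistent object refers to it, so two containers can each end up
with a private copy. The customer and order objects here are hypothetical
stand-ins built from SimpleNamespace.

```python
import pickle
from types import SimpleNamespace

# Hypothetical stand-ins for the customer and order objects.
order = SimpleNamespace(status="new")
customer = SimpleNamespace(name="A", order=order)

# Case 1: each object is serialized on its own, so each pickle
# carries a private copy of the order.
customer_copy = pickle.loads(pickle.dumps(customer))
order_copy = pickle.loads(pickle.dumps(order))
order_copy.status = "shipped"
separate_in_sync = customer_copy.order.status == order_copy.status

# Case 2: both dictionaries go through one pickle, whose memo keeps
# the shared order as a single object.
customers, orders = pickle.loads(
    pickle.dumps(({"A": customer}, {"B": order}))
)
orders["B"].status = "shipped"
together_in_sync = customers["A"].order is orders["B"]

print(separate_in_sync, together_in_sync)  # False True
```

If this is what happened, making both classes subclass Persistent would be
the thing to try, since each instance then has one record that every
referrer points at.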

i'm going to quote the initial email i sent with the idea in general
and the followup i got. i then tried it out to make sure i hadnt
asked the question wrong, and yeah... what i wanted to do wasnt
easily done.

the quotes:

> The biggest concern I have is how to do the layout/storage so that
> this slightly contrived example works:
>
> Product has a brand.
> There are many brands.
>
> How do I store so that I can find all products simply and all brands  
> simply and also so that changes in a brand instance are reflected when
> the product instance is deserialized. By 'simply' I mean that it  
> doesnt really work on our end to have to walk all Products looking
> for unique brands. Should just be able to go directly in and get  
> said brands ( using keys() or similar call ).
>
> If I create 'brand' and 'product' as btrees, then if i do something like
>
> some_product.brand.name = 'something entirely different'
>
> and that brand already exists in 'brand', would it be updated? are  
> references maintained in that fashion?
> do we have to handle manually on update and creation?
>
> Note that we would just be using ZODB not Zope in this scenario.

Back references are not maintained automatically.

I'd identify two classic solutions to this sort of thing.

One is to make a custom mapping (using a BTree as the inner data  
structure) that maintains back-references when objects are placed in  
them or removed.  zope.app(.container? .folder? I'd have to look) has  
code that does this, along with firing events.  For simple stories  
like the one you describe here, that's what I'd probably recommend.   
It works to the strengths of the ZODB, which particularly shines in  
terms of readability when you just need to walk a tree of attributes  
to get what you want.
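
A rough sketch of the first approach, under stated assumptions: Brand and
Product are hypothetical classes, a plain dict stands in for the BTree, and
nothing here is the zope.app code mentioned above. In a real ZODB
application the classes would presumably subclass Persistent and the inner
mapping would be something like an OOBTree.

```python
class Product:
    def __init__(self, name):
        self.name = name
        self.brand = None  # back-reference, maintained by Brand below


class Brand:
    """Mapping of products that keeps product.brand pointing at itself."""

    def __init__(self, name):
        self.name = name
        self._products = {}  # plain dict as a stand-in for a BTree

    def __setitem__(self, key, product):
        old = self._products.get(key)
        if old is not None:
            old.brand = None  # detach whatever was stored under this key
        self._products[key] = product
        product.brand = self

    def __delitem__(self, key):
        product = self._products.pop(key)
        product.brand = None

    def __getitem__(self, key):
        return self._products[key]

    def keys(self):
        return sorted(self._products)


brands = {}
acme = brands["acme"] = Brand("Acme")
widget = Product("Widget")
acme["widget"] = widget

# Renaming through the product changes the one shared brand object.
widget.brand.name = "Acme Corp"
renamed = brands["acme"].name
attached = widget.brand is acme

del acme["widget"]
detached = widget.brand is None
```

All brands are reachable through brands.keys() and a product's brand through
a single attribute, so nothing has to walk every product to find the brands.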

The other is to keep an external index, a la zc.extrinsicreference or  
zc.relation.

zc.extrinsicreference does not have too many dependencies beyond ZODB,  
and as long as zope.app.keyreference doesn't drag much along with it,  
might be usable as a library.  That said, it's also very simple, and  
could be used as a model for you, even if you don't use it directly.   
It would also be a reasonable choice for a simple situation like the  
one you describe.  It relies on events to update its data structures.
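
To illustrate the shape of an event-updated external index, here is a small
stdlib-only sketch. It is not the zc.extrinsicreference API: ReferenceIndex,
notify, subscribers and the event tuples are all hypothetical names, and
plain strings stand in for whatever stable object identifiers a real index
would use.

```python
from collections import defaultdict


class ReferenceIndex:
    """Maps each target to the set of sources that reference it."""

    def __init__(self):
        self._refs = defaultdict(set)

    def add(self, source, target):
        self._refs[target].add(source)

    def remove(self, source, target):
        self._refs[target].discard(source)
        if not self._refs[target]:
            del self._refs[target]  # drop targets nobody references

    def sources(self, target):
        return sorted(self._refs.get(target, ()))

    def targets(self):
        return sorted(self._refs)


subscribers = []  # hypothetical, very small event system


def notify(event):
    for handler in subscribers:
        handler(event)


index = ReferenceIndex()


def update_index(event):
    kind, source, target = event
    if kind == "added":
        index.add(source, target)
    elif kind == "removed":
        index.remove(source, target)


subscribers.append(update_index)

# Application code only fires events; the index stays outside the objects.
notify(("added", "widget", "acme"))
notify(("added", "gadget", "acme"))
notify(("added", "gizmo", "globex"))
notify(("removed", "gadget", "acme"))

acme_products = index.sources("acme")
all_brands = index.targets()
```

The trade-off against the custom mapping is that the objects themselves stay
untouched, but every code path that changes a reference has to fire the
event or the index drifts out of date.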

zc.relation is an almost-released revision of zc.relationship that
drastically reduces dependencies. Actually, it has no additional
dependencies beyond ZODB, as you can see in its setup.py:
http://svn.zope.org/zc.relation/trunk/setup.py?view=markup
It's also a bit overwhelming and low-level; see the README:
http://svn.zope.org/zc.relation/trunk/src/zc/relation/README.txt?view=auto
It doesn't hook anything up for you: you set the relationship
catalog up and you arrange for it to be updated, via events or direct
messages.  That said, if you need its power, it is well-tested and
would be a good choice for some jobs from at least some perspectives
(caveat read-or: I'm the author).

HTH

Gary