[ZODB-Dev] Searching/wo/Zope

Fri Jan 6 15:54:05 EST 2006

Hi,

Tell me if you are getting tired of me. I do not usually write such long
emails to mailing lists...

--------------------------------------------
Answer to Dieter's email:
Several days ago somebody already suggested using IndexedCatalog, but
it does not seem to be very active. The standalone ZCatalog is
definitely not active.
My reasoning for choosing ZCatalog was: it is code extracted from Zope,
and Zope will always be maintained (won't it?), so I just have to figure
out how to 'hack out' a standalone ZCatalog based on Kevin Dangoor's 'hack'.
Kevin already mentioned that Zope 3 would be better for this purpose.

At the moment I think the best long-term approach for me is to handle
the querying through a wrapper class of my own: a layer between my
application(s) and the indexing application. This would be similar to
the DB-API for RDBMSs, where you can change the db backend (in my case,
the indexing/searching backend) without (big) changes in the application code.
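
A minimal sketch of such a wrapper layer, using only the standard
library. The names SearchBackend, DictBackend and Person are
hypothetical and are not part of ZODB, ZCatalog or IndexedCatalog; a
real backend class would wrap one of those packages behind the same two
methods.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical interface that the application codes against."""

    @abstractmethod
    def index(self, key, obj):
        """Make obj findable under its storage key."""

    @abstractmethod
    def search(self, field, value):
        """Return the storage keys of objects whose field equals value."""


class DictBackend(SearchBackend):
    """Toy in-memory backend, standing in for a real indexing package."""

    def __init__(self, fields):
        self._fields = tuple(fields)
        self._index = {}  # (field, value) -> set of storage keys

    def index(self, key, obj):
        for field in self._fields:
            if hasattr(obj, field):
                entry = (field, getattr(obj, field))
                self._index.setdefault(entry, set()).add(key)

    def search(self, field, value):
        return sorted(self._index.get((field, value), ()))


class Person:
    def __init__(self, name):
        self.name = name


# Only this line would change when switching to another backend.
backend = DictBackend(fields=["name"])
backend.index("p1", Person("Tamas"))
backend.index("p2", Person("Dieter"))
hits = backend.search("name", "Tamas")
```

The application only ever calls index() and search(), so replacing the
backend should not require (big) changes elsewhere.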

Do any similar standards exist for ODBMSs? Something like DB-API...

---------------------------------------------
>>Please do not forget that I am not a real programmer but a consumer:
> That's OK -- in return, please don't forget that you're posting to a ZODB
> developer's list ;-).
Not a real programmer: I did not learn it in school; I read and try
to use developer lists (and brains) ;-)

I just want to say that it is very difficult for me to formulate my
questions in a way that you can understand easily, although I think I
put more than enough time and energy into forming relevant questions
with relevant terminology, to avoid wasting other people's time.

----------------------------------------------
> That's the only solid reason Zope Corp has to _pay_ for ZODB development.
> Zope pays the bills here, and ZODB is supporting infrastructure for Zope.

>>Then OK. But if there is bigger potential inside it...
> Then what, specifically?  Nobody works on something unless they want to,
> and/or are paid to.  It's not a matter of cheerleading, it's a matter of
> someone doing the work.
----------------------------------------------
Yes, I understand that you have to feed your family and educate your
child. So if it is not paid...

I am surprised that you ask 'then what, specifically'. My opinion (as an
outsider):
One of the major and important trends in application development is
object persistence.

At the moment most developers use (I may be wrong) some solution with
an RDBMS backend (SQLObject/Python; Hibernate/Java). But I think these
solutions are not as transparent as ODBMSs like ZODB or db4o (I have
tried these).

I think other programmers mostly use a solution with an RDBMS backend
because they do not want to handle/code the searching/indexing
themselves. (I know that ODBMS performance is said to be not as good as
RDBMS performance, but I think that if you account for the O/R mapping
overhead, then some ODBMSs are not so bad. E.g. in some heavily loaded
biological application, Versant's ODBMS was reported to perform better
than one of the leading RDBMSs ***).

"Specifically": I think that if you implemented a standalone
search/index facility for ZODB (with a similar Zope-hacking approach it
would NOT be a huge, completely standalone project needing a lot of
effort), ZODB could be an ODBMS leader, a competitor to other ODBMS
solutions. So you might get more paying customers, too...

---- (footnotes)
(***My biggest problem is: if I want other biologists to try my
solutions/applications, I do not have any chance of that with an RDBMS
backend. They just will not learn how to set up an RDBMS, and will not
set one up, just to try something out. For this reason an ODBMS would
suit me.)

----
(I think it would also be worthwhile to implement some standards for
ODBMSs, some ODB-API, just to be able to change ODBMS backends and/or
indexing/searching backends. I also know that there is no other ODBMS
backend to choose from, and that you do not want users to switch from
ZODB to another backend in the future :-) , but I think you would have
the potential to do something like this. Just for fun. Just to initiate
some ODBMS standardization...)

--------------------------------------------------------------------
> A notable exception is IndexedCatalog:
> 
>     http://www.async.com.br/projects/IndexedCatalog/
> 
> which is independent of Zope.  You said before that you thought that wasn't
> active, and it indeed doesn't look like it's had a release recently.  That
> could be because it's already perfect ;-) -- or it could be that there's not
I am not a 'real' developer, but it seems that others would not go with
it either, even if it is perfect. If you (ZODB developers) change
something in the next release of ZODB, then the perfect IndexedCatalog
may not be able to communicate with the new version of the ZODB backend...

>>(By the way: I think it is great and big, and I would like to use it.)
>>To formulate this on a more realistic way: it seems for me that there
>>is no potential to take care about this extra project outside of Zope
>>AND/OR it would not be good for Zope developers to have it as an
>>easy-to-use stand alone module (maybe some business policy?).
> Not sure I followed that.
Sorry. Please note that I am simplifying in the next 2 points:
1. By 'potential' I mean: there are not 100 ZODB developers, just a few.
2. By 'business policy' I mean: if you made a good standalone
indexing/searching facility, then everybody would use ZODB with
CherryPy ;-) so fewer people would pay for Zope training/classes or
something like that...

---------------------------------------------------
>>1. """That's usually viewed as an application-level problem, and it's up
>>to applications to solve it in ways best suited for their particular
>>needs.""" If I translate this for myself, if I understand well: I am very
>>happy that RDBMSs does not say this, and I can search them not only by
>>primary keys; I am happy that I do not have to implement something
>>similar to SQL as it is not considered as "application level problem".
> A relational database forces you to slam all your data into uniform tables,
> regardless of whether that's a natural fit.  When all you have is uniform
> tables, then it's relatively easy to define uniform operators for crawling
> over those tables -- that's what SQL is all about.
> 
> An object database is more of a general graph structure, and an
> application's idea of "search" can be correspondingly semantically richer
> from the start -- or even irrelevant, if the object graph is constructed
> from the start to make traversals of potential interest follow the natural
> graph pointers.  What's the analogue to SQL in this quite different view of
> the world?  Well, there isn't a standard accepted vision for that.  That's
> what makes it the app's problem.  These are tradeoffs.  Zope's assorted
> indices and catalogs _probably_ capture some notion of "search" close to
> what you're after.
Hmm. I may not have formulated this well.
In this context SQL means (was meant to mean) just a standard for me.
There are OQLs or something like that, but there are also 'native
queries', or simple queries based on the object type. This may not work
well (out of the box) with Python, as objects are not declared but
created (or something like that). I think this is the main limiting
factor in creating some standard for searching and indexing if you use
Python.

So there are some standard ways/ideas of "search" in an object database.
Or, to oversimplify heavily: you want to retrieve all objects that
have the field/member 'name' where this field has the value 'Tamas'.
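
The oversimplified query above can be sketched as a plain scan over a
mapping of keys to objects. This is only an illustration: a dict stands
in for a BTree, and find_by_attribute, Person and Sample are
hypothetical names, not part of ZODB. Objects without the attribute are
skipped, because Python instances need not share a fixed set of fields.

```python
def find_by_attribute(mapping, attr, value):
    """Return the keys of all stored objects whose `attr` equals `value`."""
    missing = object()  # sentinel for "object has no such attribute"
    return [key for key, obj in mapping.items()
            if getattr(obj, attr, missing) == value]


class Person:
    def __init__(self, name):
        self.name = name


class Sample:
    def __init__(self, label):
        self.label = label  # deliberately has no 'name' attribute


store = {1: Person("Tamas"), 2: Person("Tim"), 3: Sample("cell line A")}
matches = find_by_attribute(store, "name", "Tamas")
```

Such a scan touches every object, which is exactly the cost an index is
meant to avoid.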

I think that how this searching is implemented is a low-level (database
developers') problem ;-) An application developer should only have to
worry about choosing the searching/indexing method/package best suited
to his/her(?) application.

-----------------------------------------------------
>>2. BTrees: I could not find any 'built-in' possibility in the docs, just
>>the 'primary keys'. If I check the OOBTree, etc, it just give
>>'difference', 'intersection', 'union'. I do not see to do full text
>>search or field search on BTrees. Do I miss something???
> BTrees map keys to values.  The keys are always maintained in sorted order,
> and it's both dead easy and efficient to do range searches over a BTree's
> key space.  That's what's built in.
I used 'primary keys' because I thought that if I wrote just 'keys', you
might think I did not know that a BTree is not a simple dictionary :-)
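
To illustrate why sorted keys make range searches cheap, here is a
standard-library sketch in which a sorted list and bisect stand in for
the BTree key space; range_search is a hypothetical helper. In the
BTrees package the corresponding call should be tree.keys(min, max),
which this sketch does not import or test.

```python
from bisect import bisect_left, bisect_right


def range_search(sorted_keys, low, high):
    """Return all keys k with low <= k <= high from an already sorted list."""
    start = bisect_left(sorted_keys, low)    # first position with key >= low
    stop = bisect_right(sorted_keys, high)   # first position with key > high
    return sorted_keys[start:stop]


keys = sorted(["actin", "cftr", "kinase", "myosin", "tubulin"])
found = range_search(keys, "c", "m")
```

Because the keys are ordered, both ends of the range are located by
binary search instead of by looking at every key.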

>>3. I can not build up another database from the ZODB as I am not a
>>developer.
> Do you use Python?  I'm at a bit of a loss to figure out how you wound up
> posting here if you're _not_ a Python programmer.  It could be that ZODB is
> much more general than I thought ;-), but I didn't think non-programmers
> would have any use for it.
For me _developers_ means trained people who write serious
programs/packages using expressions I do not understand; solving
computational problems is _always_ trivial for them; they can always
choose the best tools for their problems and mine. ;-)

I am just playing with Python and programming. I work with DNA,
proteins, cell lines, etc. But I cannot waste my spare time (should I
write shorter emails?), so I have to find the best tools.

I am writing to the developer list because I could not get help from
the other Zope-related lists. I tried several months ago. The result: I
moved to Java; tried db4o; failed (I could not populate it with my
objects (I have too many objects, and they are complex; I am looking
forward to seeing how ZODB performs :-) )); tried Perl; I was scared, so
back to Python ;-)

------------------------------------------------------------------------
>>But I think you formulated this not the best way: I think you do not
>>build the SB database OUT of ZODB's BTrees, I think you just build
>>up indexes from the BTrees and you implement searches on your indexes
>>that points back to the BTrees.
> I suppose you could think of it that way, but I designed SpamBayes and
> that's not how I thought of it.  I thought of it in terms of abstract
> mappings, then designed the main algorithms to work directly with BTrees.
> ZODB supplies persistent BTrees, and that's all SB needs.

>>=> If you just build indexes from the BTrees, the following protocol
>>works for me and you can suggest?
> Not sure I'm following.  I can suggest what?
I think my 'following protocol' is what you call 'abstract mappings'.
Suggest: how to implement indexes/searches on BTrees in an ODBMS/ZODB.

>>1. walk trough on your BTree taking each object
> A BTree is a collection of <key, value> pairs, and unsure what "object"
> means here.
I use <"primary key", "object"> for the <key, value> of BTrees. I think
you always have a Python "object" as the "value".
Walk through the BTree taking each object: for each key of the BTree,
instantiate the value/object.

>>2. with an external indexing application build the index (on one or more
>>fields, or full text)
Store
(indexed_something1: key1, key65, ... key_i),
(indexed_something2: key4, key6, key45, ... key_j)
etc...
as INDEX

>>3. search in your index that returns with the 'primary key' of objects
>>in the ZODB
Searching for something, e.g. indexed_something2:
a, Is indexed_something2 among the keys of INDEX?
b, -- NO: no results
   -- YES: return INDEX[indexed_something2] as the list of keys to the
      objects in the BTree

>>4. get the objects from the ZODB via the 'primary keys' from the prev
>>step. ???
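
The four steps above can be sketched as follows. This is a minimal
illustration under stated assumptions, not a ZODB recipe: a plain dict
stands in for the persistent BTree, INDEX is a plain dict as well, and
build_index, search and Protein are hypothetical names.

```python
def build_index(tree, field):
    """Steps 1-2: visit every (key, object) pair, group keys by field value."""
    index = {}
    for key, obj in tree.items():
        if hasattr(obj, field):
            index.setdefault(getattr(obj, field), []).append(key)
    return index


def search(tree, index, value):
    """Steps 3-4: look the value up in INDEX, then fetch the objects by key."""
    return [tree[key] for key in index.get(value, [])]


class Protein:
    def __init__(self, name, organism):
        self.name = name
        self.organism = organism


tree = {  # stand-in for a persistent BTree of <key, object> pairs
    "key1": Protein("CFTR", "human"),
    "key4": Protein("actin", "mouse"),
    "key6": Protein("myosin", "mouse"),
}
INDEX = build_index(tree, "organism")
mouse_names = [p.name for p in search(tree, INDEX, "mouse")]
```

The index stores only keys, so it stays small, but it has to be updated
whenever an indexed field of a stored object changes.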

> OK, now I'm sure not following.  You appear to be assuming much more
> structure than a plain BTree supports on its own, and in fact BTrees don't
> really _appear_ to have anything to do with what you're saying.  If you
> think _your_ objects have such things as "fields" and "primary keys", then
> that's part of your objects' design and your objects' implementations --
> objects don't come with such notions built in.  It sounds like you have RDMS
> tables in mind, and are forcing object language on top of them.
Of course I have some 'tables' in my mind :-) I grew up on tables...
But I think you can admit that there are some analogies between the
BTree keys and 'primary keys'; that an object is similar to a record,
and the fields/members of the objects are the 'columns'.

I would just like to understand the basics of an index/search
implementation on objects (on BTrees).

> If so, that's fine -- it's legitimate to do so.  It sounds like you'd be
> happier then with an RDMS, though (under the inference that you _think_ in
No! I would not be happier with an RDBMS. I have been using ZODB for
only 1 or 2 weeks and my life is already happier :-)

======================================================================
REAL questions with less 'philosophy':

1. If I want to implement an index system for ZODB, is it OK to 'walk
through' the keys of the BTrees, instantiate the objects and build the
index from them???
Or is there some low-level code 'magic' to use? I mean a special
"_function" from ZODB, learning the deep internals of the BTrees, etc...

2. Do you see a possible way to implement indexes on the raw file (.fs)
without object instantiation?

----
(This mailing list may not be the perfect place to ask, but I think you
are among the best for questions about Python objects :-) )
3. Python objects are not declared but created. If I have an object,
anybody can just add extra members/fields/variables to it, or delete a
member/field that I defined, e.g. in __init__().
Do you know of an existing locking mechanism that prevents these
things? Let's say: the variables/fields/members of the object are
created in __init__(), but you cannot add more or delete any of them
after that point.
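
One possible way to get this behaviour in plain Python is to override
__setattr__ and __delattr__. The sketch below is only an illustration:
FixedFields and lock() are hypothetical names, and it has not been
tried with ZODB's Persistent base class, which has its own attribute
hooks. (Declaring __slots__ is another option; it blocks adding new
attributes but still allows deleting existing ones.)

```python
class FixedFields:
    """Allow attribute creation only until lock() is called."""

    _locked = False  # class-level default, so the check works in __init__

    def lock(self):
        # Bypass our own __setattr__ to flip the flag on the instance.
        object.__setattr__(self, "_locked", True)

    def __setattr__(self, name, value):
        if self._locked and not hasattr(self, name):
            raise AttributeError(f"cannot add new attribute {name!r}")
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        if self._locked:
            raise AttributeError(f"cannot delete attribute {name!r}")
        object.__delattr__(self, name)


class Person(FixedFields):
    def __init__(self, name):
        self.name = name
        self.lock()  # from here on the set of fields is fixed


p = Person("Tamas")
p.name = "Tim"      # changing an existing field is still allowed
try:
    p.age = 30      # adding a field is rejected
    added = True
except AttributeError:
    added = False
```

This is a convention rather than real protection: a determined caller
can still go around it with object.__setattr__.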

======================================================================
Thanks for your patience!
Tamas

-- 
Tamas Hegedus, PhD          | phone: (1) 919-966 0329
UNC - Biochem & Biophys     | fax:   (1) 919-966 5178
5007A Thurston-Bowles Bldg  | mailto:hegedus at med.unc.edu
Chapel Hill, NC, 27599-7248 | http://biohegedus.org