[ZODB-Dev] Searching/wo/Zope
Tamas Hegedus
hegedus at med.unc.edu
Fri Jan 6 15:54:05 EST 2006
Hi,
Tell me if you are getting tired of me. I do not usually write such long
emails to mailing lists...
--------------------------------------------
Answer to Dieter's email:
Several days ago somebody already suggested using IndexedCatalog, but
it does not seem to be very active. The standalone ZCatalog is
definitely not active.
My reason for choosing ZCatalog was that it is code extracted from Zope,
and Zope will always be maintained (won't it?). I just have to figure out
how to 'hack out' the standalone ZCatalog based on Kevin Dangoor's 'hack'.
Kevin already mentioned that Zope 3 would be better for this purpose.
At the moment I think the best long-term approach for me is to handle
querying through a wrapper class of my own: a layer between my
application(s) and the indexing application, something similar to the
RDBMS DB-API, so that I can change the indexing/searching backend (as
one changes the db backend) without (big) changes in the application code.
Does a similar standard exist for ODBMSs? Something like DB-API...
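
A minimal sketch of such a wrapper layer, in Python. All names here
(SearchBackend, DictBackend, Protein) are hypothetical and only
illustrate the idea; this is not an existing ODBMS standard, and the toy
in-memory backend assumes that the indexed values are hashable.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical interface: the application only calls these methods."""

    @abstractmethod
    def index(self, key, obj):
        """Register obj under its storage key."""

    @abstractmethod
    def search(self, field, value):
        """Return the storage keys of objects whose field equals value."""


class DictBackend(SearchBackend):
    """Toy in-memory backend; another adapter could replace it later."""

    def __init__(self, fields):
        self._fields = tuple(fields)
        self._index = {}  # (field, value) -> set of storage keys

    def index(self, key, obj):
        for field in self._fields:
            if hasattr(obj, field):
                value = getattr(obj, field)
                self._index.setdefault((field, value), set()).add(key)

    def search(self, field, value):
        return sorted(self._index.get((field, value), ()))


class Protein:
    def __init__(self, name, organism):
        self.name = name
        self.organism = organism


backend = DictBackend(fields=["name", "organism"])
backend.index(1, Protein("CFTR", "human"))
backend.index(2, Protein("MRP1", "human"))
backend.index(3, Protein("CFTR", "mouse"))
print(backend.search("organism", "human"))
```

The application code would depend only on index() and search(), so
swapping the backend means writing one new subclass.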
---------------------------------------------
>>Please do not forget that I am not a real programmer but a consumer:
> That's OK -- in return, please don't forget that you're posting to a ZODB
> developer's list ;-).
Not a real programmer: I did not learn it in school; I read and try
to use developer lists (and brains) ;-)
I just want to say that it is very difficult for me to formulate my
questions in a way that you understand easily, although I think I put
more than enough time and energy into forming relevant questions with
relevant terminology, to avoid wasting other people's time.
----------------------------------------------
> That's the only solid reason Zope Corp has to _pay_ for ZODB development.
> Zope pays the bills here, and ZODB is supporting infrastructure for Zope.
>>Then OK. But if there is bigger potential inside it...
> Then what, specifically? Nobody works on something unless they want to,
> and/or are paid to. It's not a matter of cheerleading, it's a matter of
> someone doing the work.
----------------------------------------------
Yes, I understand that you have to feed your family and educate your
child. So if it is not paid...
I am surprised that you ask 'then what, specifically'. My opinion (as an
outsider):
One of the major and important trends in application development is
object persistence.
At the moment most developers use (I may be wrong) some
solution with an RDBMS backend (SQLObject/Python; Hibernate/Java). But I
think these solutions are not as transparent as ODBMSs like ZODB and db4o
(I tried these).
I think other programmers mostly use a solution with an RDBMS backend
because they do not want to handle/code the searching/indexing. (I know
that ODBMS performance is said to be not as good as RDBMS performance,
but I think that if you count the OR mapping overhead, then some ODBMSs
are not so bad. E.g. in some heavily loaded biological application,
Versant's ODBMS was reported to be better than one of the leading
RDBMSs ***).
"Specifically": I think that if you implemented a standalone
search/index facility for ZODB (with a similar Zope-hacking approach it
would NOT be a huge, completely standalone project needing a lot of
effort), ZODB could be an ODBMS leader, a competitor of other
ODBMS solutions. So you might have more paying consumers, too...
---- (footnotes)
(***My biggest problem is this: if I want other biologists to try my
solutions/applications, I have no chance of that with an RDBMS
backend. They just will not learn how to set up an RDBMS, and will not
set one up, to try something out. For this reason an ODBMS would be OK
for me.)
----
(I think it would also be worthwhile to implement some standard for
ODBMSs, some ODB-API, just to be able to change ODBMS backends and/or
indexing/searching backends. I also know that there is no other
ODBMS backend to choose from, and that you do not want to switch users
from ZODB to another backend in the future :-) , but I think you would
have the potential to do something like this. Just for fun. Just to
initiate some ODBMS standardization....)
--------------------------------------------------------------------
> A notable exception is IndexedCatalog:
>
> http://www.async.com.br/projects/IndexedCatalog/
>
> which is independent of Zope. You said before that you thought that wasn't
> active, and it indeed doesn't look like it's had a release recently. That
> could be because it's already perfect ;-) -- or it could be that there's not
I am not a 'real' developer, but it seems others would not go with
it either, even if it is perfect. If you (ZODB developers) change
something in the next release of ZODB, then the perfect IndexedCatalog
may not be able to communicate with the new version of the ZODB backend...
>>(By the way: I think it is great and big, and I would like to use it.)
>>To formulate this on a more realistic way: it seems for me that there
>>is no potential to take care about this extra project outside of Zope
>>AND/OR it would not be good for Zope developers to have it as an
>>easy-to-use stand alone module (maybe some business policy?).
> Not sure I followed that.
Sorry. Please note that I simplify in the next 2 points:
1. By 'potential' I mean: there are not 100 ZODB developers, just a few.
2. By 'business policy' I mean: if you made a good standalone
indexing/searching facility, then everybody would use ZODB with
CherryPy ;-) so fewer people would pay for Zope training/classes or
something like that...
---------------------------------------------------
>>1. """That's usually viewed as an application-level problem, and it's up
>>to applications to solve it in ways best suited for their particular
>>needs.""" If I translate this for myself, if I understand well: I am very
>>happy that RDBMSs does not say this, and I can search them not only by
>>primary keys; I am happy that I do not have to implement something
>>similar to SQL as it is not considered as "application level problem".
> A relational database forces you to slam all your data into uniform tables,
> regardless of whether that's a natural fit. When all you have is uniform
> tables, then it's relatively easy to define uniform operators for crawling
> over those tables -- that's what SQL is all about.
>
> An object database is more of a general graph structure, and an
> application's idea of "search" can be correspondingly semantically richer
> from the start -- or even irrelevant, if the object graph is constructed
> from the start to make traversals of potential interest follow the natural
> graph pointers. What's the analogue to SQL in this quite different view of
> the world? Well, there isn't a standard accepted vision for that. That's
> what makes it the app's problem. These are tradeoffs. Zope's assorted
> indices and catalogs _probably_ capture some notion of "search" close to
> what you're after.
Hmm. I may not have formulated this well.
In this context SQL means (was meant to mean) just a standard to me.
There are OQLs or something like that, but there are also
'native queries', or simple queries based on the object type. These may
not work well (out of the box) with Python, as objects are not declared
but created (or something like that). I think this is the main
limiting factor in creating some standard searching/indexing if you
use Python.
So there are some standard ways/ideas of "search" in an object database.
Or, with heavy oversimplification: you want to retrieve all objects that
have the field/member 'name' where this field has the value 'Tamas'.
I think how this searching is implemented is a low-level (database
developers') problem ;-) An application developer should only have to
worry about choosing the searching/indexing method/package best suited
to his/her application.
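
To make the oversimplified query concrete, here is a small Python sketch
of it as a plain linear scan, without any index. The function
find_by_attribute and the classes Person and Sample are made up for this
illustration; they are not part of ZODB or of any catalog package.

```python
def find_by_attribute(objects, attr, value):
    """Linear scan: keep objects that have attr and whose attr equals value."""
    missing = object()  # sentinel, so that a stored None is still a valid value
    return [obj for obj in objects if getattr(obj, attr, missing) == value]


class Person:
    def __init__(self, name):
        self.name = name


class Sample:
    def __init__(self, label):
        self.label = label  # note: no 'name' attribute at all


objects = [Person("Tamas"), Sample("cell line 7"), Person("Tim"), Person("Tamas")]
hits = find_by_attribute(objects, "name", "Tamas")
print(len(hits))
```

Objects without the attribute are simply skipped, which is the part that
has no direct counterpart in a table with fixed columns. An index would
only make this lookup faster; the result should be the same.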
-----------------------------------------------------
>>2. BTrees: I could not find any 'built-in' possibility in the docs, just
>>the 'primary keys'. If I check the OOBTree, etc, it just give
>>'difference', 'intersection', 'union'. I do not see to do full text
>>search or field search on BTrees. Do I miss something???
> BTrees map keys to values. The keys are always maintained in sorted order,
> and it's both dead easy and efficient to do range searches over a BTree's
> key space. That's what's built in.
I wrote 'primary keys' because I thought that if I used just 'keys', you
might think I did not know that a BTree is not a simple dictionary :-)
>>3. I can not build up another database from the ZODB as I am not a
>>developer.
> Do you use Python? I'm at a bit of a loss to figure out how you wound up
> posting here if you're _not_ a Python programmer. It could be that ZODB is
> much more general than I thought ;-), but I didn't think non-programmers
> would have any use for it.
For me _developers_ means trained people writing serious
programs/packages, using expressions I do not understand; solving
computational problems is _always_ trivial for them; they can always
choose the best tools for their problems and mine. ;-)
I am just playing with Python and programming. I work with DNA,
proteins, cell lines, etc. But I cannot waste my spare time (should I
write shorter emails?), so I have to find the best tools.
I write to the developer list because I could not get help from the
other Zope-related lists. I tried several months ago. The result: I moved
to Java; tried db4o; failed (I could not populate it with my objects (I
have too many and too complex objects; I am looking forward to seeing how
ZODB performs :-) )); tried Perl; I was scared, so back to Python ;-)
------------------------------------------------------------------------
>>But I think you formulated this not the best way: I think you do not
>>build the SB database OUT of ZODB's BTrees, I think you just build
>>up indexes from the BTrees and you implement searches on your indexes
>>that points back to the BTrees.
> I suppose you could think of it that way, but I designed SpamBayes and
> that's not how I thought of it. I thought of it in terms of abstract
> mappings, then designed the main algorithms to work directly with BTrees.
> ZODB supplies persistent BTrees, and that's all SB needs.
>>=> If you just build indexes from the BTrees, the following protocol
>>works for me and you can suggest?
> Not sure I'm following. I can suggest what?
I think my 'following protocol' is what you call 'abstract mapping'.
Suggest: how to implement indexes/searches on BTrees in ODBMS/ZODB.
>>1. walk trough on your BTree taking each object
> A BTree is a collection of <key, value> pairs, and unsure what "object"
> means here.
I use <"primary key", "object"> for the <key, value> of BTrees. I think
you always have a Python "object" as the "value".
Walk through the BTree taking each object: for each key of the BTree,
instantiate the value/object.
>>2. with an external indexing application build the index (on one or more
>>fields, or full text)
Store
(indexed_something1: key1, key65, ... key_i),
(indexed_something2: key4, key6, key45, ... key_j)
etc...
as INDEX
>>3. search in your index that returns with the 'primary key' of objects
>>in the ZODB
Searching for something, e.g. indexed_something2:
a. Is indexed_something2 among the keys of INDEX?
b. -- NO: no results
   -- YES: return INDEX[indexed_something2] as the list of keys to the
      objects in the BTree
>>4. get the objects from the ZODB via the 'primary keys' from the prev
>>step. ???
> OK, now I'm sure not following. You appear to be assuming much more
> structure than a plain BTree supports on its own, and in fact BTrees don't
> really _appear_ to have anything to do with what you're saying. If you
> think _your_ objects have such things as "fields" and "primary keys", then
> that's part of your objects' design and your objects' implementations --
> objects don't come with such notions built in. It sounds like you have RDMS
> tables in mind, and are forcing object language on top of them.
Of course I have some 'tables' in my mind :-) I grew up on tables...
But I think you can admit that there are some analogies between
BTree keys and 'primary keys'; that an object is similar to a record,
and the fields/members of an object are like 'columns'.
I would just like to understand the basics of an index/search
implementation on objects (on BTrees).
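
The four steps discussed above can be sketched in Python roughly as
follows. This is only an illustration under stated assumptions: a plain
dict stands in for the BTree (as far as I know an OOBTree offers the same
items() and [] mapping interface), the Record class and both function
names are hypothetical, and every stored object is assumed to have the
indexed field.

```python
from collections import defaultdict


class Record:
    def __init__(self, name, organism):
        self.name = name
        self.organism = organism


def build_index(tree, field):
    """Steps 1-2: walk every <key, value> pair, map field value -> keys."""
    index = defaultdict(list)
    for key, obj in tree.items():
        index[getattr(obj, field)].append(key)
    return dict(index)


def search(tree, index, value):
    """Steps 3-4: look the value up in the index, then fetch objects by key."""
    return [tree[key] for key in index.get(value, [])]


# A plain dict stands in for the BTree here (assumption for illustration).
tree = {
    1: Record("CFTR", "human"),
    4: Record("MRP1", "mouse"),
    6: Record("MDR1", "human"),
}
organism_index = build_index(tree, "organism")
print(organism_index)
print([r.name for r in search(tree, organism_index, "human")])
```

The index built this way goes stale as soon as an object changes, so a
real implementation would also have to update it on every write.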
> If so, that's fine -- it's legitimate to do so. It sounds like you'd be
> happier then with an RDMS, though (under the inference that you _think_ in
No! I would not be happier with an RDBMS. I have been using ZODB for
just 1 or 2 weeks and my life is happier :-)
======================================================================
REAL questions with less 'philosophy':
1. If I want to implement an index system for ZODB, is it OK to 'walk
through' the keys of the BTrees, instantiate the objects and build the
index from them?
Or is there some low-level code 'magic' to use? I mean a special
"_function" from ZODB, learning the deep internals of the BTrees, etc...
2. Do you see a possible way to implement indexes on the raw file (.fs)
without object instantiation?
----
(This mailing list may not be the perfect place to ask, but I think you
are among the best for Python object questions :-) )
3. Python objects are not declared but created. If I have an object,
anybody can just add extra members/fields/variables to it, or
delete a member/field that I defined, e.g. in __init__().
Do you know of an implemented locking mechanism that prevents these
things? Let's say: the variables/fields/members of the objects are
created in __init__(), but you cannot add more or delete any of them
after that point.
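
One possible way to get this behaviour in plain Python is to override
__setattr__ and __delattr__. The sketch below is an assumption-laden
illustration, not a ZODB feature: the class names Locked and Protein are
made up, and it has not been tried together with persistent.Persistent,
which has its own attribute handling.

```python
class Locked:
    """Base class: attributes may be created in __init__ only."""

    _locked = False  # class-level default, used while __init__ is running

    def __init__(self):
        self._locked = True  # subclasses call this last in their __init__

    def __setattr__(self, name, value):
        # After locking, only attributes that already exist may be changed.
        if self._locked and not hasattr(self, name):
            raise AttributeError("cannot add new attribute %r" % name)
        super().__setattr__(name, value)

    def __delattr__(self, name):
        if self._locked:
            raise AttributeError("cannot delete attribute %r" % name)
        super().__delattr__(name)


class Protein(Locked):
    def __init__(self, name, length):
        self.name = name
        self.length = length
        super().__init__()  # lock after all fields exist


p = Protein("CFTR", 1480)
p.length = 1481  # changing an existing field is still allowed
try:
    p.mass = 168.0  # adding a new field is rejected
except AttributeError as exc:
    print(exc)
```

This is a convention rather than real protection, since
object.__setattr__(p, ...) still bypasses it. Python's __slots__ is a
related built-in mechanism: it stops new attributes from being added, but
it does not stop existing ones from being deleted.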
======================================================================
Thanks for your patience!
Tamas
--
Tamas Hegedus, PhD | phone: (1) 919-966 0329
UNC - Biochem & Biophys | fax: (1) 919-966 5178
5007A Thurston-Bowles Bldg | mailto:hegedus at med.unc.edu
Chapel Hill, NC, 27599-7248 | http://biohegedus.org