[Zope-dev] PathIndex doesn't index last part of path

sean.upton@uniontrib.com
Mon, 19 Aug 2002 15:58:38 -0700


To answer the question of why one would use the catalog at all: for folders
with thousands of objects, applications like CMF skins can be amazingly slow
using ObjectManager methods and their CMF-wrapped equivalents, which also do
not provide sorting.  For applications like this, both containment queries
and flexible query support (sorting) are important.

Use-case: thousands of objects in a Portal Folder.  CMF skins like Plone use
ObjectManager-based methods to get a list of siblings for a sidebar.  When
you have 1000+ items in the folder, the fact that this takes 2 minutes on an
Athlon 2000+/1GB server, isn't batchable, and isn't sortable is sort of sad.
Using the Catalog makes much more sense here: I can sort, and the
performance penalty is low, so I can take a slice of the resulting sequence
of brains and call getObject() on each one.  Even if a container could be
improved performance-wise to have a quick objectIds() method, ObjectManager
interfaces have no mechanism to specify sorting based on metadata the way a
Catalog query does.
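
A minimal sketch of the pattern described above, using a plain in-memory
stand-in rather than a real ZCatalog: sort on cataloged metadata, slice out
one batch, and load full objects only for that slice.  ToyBrain, batch_query,
and the load callback are hypothetical names for illustration; in Zope the
brains would come from a catalog search and the loading step would be
getObject().

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyBrain:
    """Stand-in for a catalog brain: cheap metadata plus the object's path."""
    path: str
    title: str


def batch_query(brains: List[ToyBrain], sort_on: str, start: int, size: int,
                load: Callable[[str], object]) -> List[object]:
    """Sort on metadata, slice one batch, and load only that batch's objects."""
    ordered = sorted(brains, key=lambda b: getattr(b, sort_on))
    return [load(b.path) for b in ordered[start:start + size]]


# 1000 siblings in one folder; record which full objects actually get loaded.
brains = [ToyBrain(path=f"/portal/folder/doc{i}", title=f"Title {i:04d}")
          for i in range(1000)]
loaded = []


def load(path):
    loaded.append(path)  # stands in for the expensive getObject() call
    return path


page = batch_query(brains, sort_on="title", start=20, size=10, load=load)
```

Only the ten objects in the requested batch are loaded; the other 990 are
handled through their metadata alone.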

One thing that seems to be missing from PathIndexes, though, is the ability
to specify a depth (not the level at which a match starts, but how far below
the matched path results may lie; for example, to return only a folder's
direct items).  Above and beyond support for a specific place within a
hierarchy, there needs to be a simple mechanism to query for an absolute
path and get back only objects directly matching that path or directly
contained within it.

The workaround for this is to add a custom FieldIndex that indexes the
string value of the path of the item's container.  This requires putting a
Python Script in the root of a CMF portal, for example, that is acquired by
all indexed content.  It would be nice if the PathIndex machinery could do
this out of the box.
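
A rough sketch of that workaround, written as standalone Python so it can be
run outside Zope.  container_path and ToyFieldIndex are hypothetical names;
the input tuples are assumed to have the shape that getPhysicalPath()
returns, and the toy index only imitates the exact-match behaviour of a
FieldIndex.

```python
from collections import defaultdict


def container_path(physical_path):
    """Return the path string of an item's container.

    `physical_path` is a tuple of path elements such as
    ('', 'portal', 'folder', 'doc1').
    """
    return "/".join(physical_path[:-1]) or "/"


class ToyFieldIndex:
    """Hypothetical stand-in for a FieldIndex: exact value -> item paths."""

    def __init__(self):
        self._index = defaultdict(set)

    def index_object(self, physical_path):
        self._index[container_path(physical_path)].add("/".join(physical_path))

    def search(self, value):
        return sorted(self._index.get(value, ()))


index = ToyFieldIndex()
for p in [("", "portal", "folder", "doc1"),
          ("", "portal", "folder", "doc2"),
          ("", "portal", "folder", "sub", "doc3")]:
    index.index_object(p)

# Only direct children match; doc3 lives one level deeper and is excluded.
children = index.search("/portal/folder")
```

Because the index stores the container path as an exact value, a query for
one folder never picks up items from its subfolders, which is the depth
limit a plain path query lacks.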
	
	http://lists.zope.org/pipermail/zope-cmf/2002-August/014167.html
	http://lists.zope.org/pipermail/zope-cmf/2002-August/014204.html

Sean

-----Original Message-----
From: Casey Duncan [mailto:casey@zope.com]
Sent: Friday, August 16, 2002 4:55 PM
To: Andy McKay; zope-dev@zope.org
Subject: Re: [Zope-dev] PathIndex doesn't index last part of path


A PathIndex is designed to make it more efficient to aggregate objects at
various levels of containment. Its primary use case, AFAIK, is to allow you
to limit queries to particular places within a hierarchy. The idea is to
eliminate recursive searching of leaf-level folders when you want all
objects under a higher level and its child levels.

Also, by not indexing the nodes themselves, the index is an order of
magnitude smaller. Searches are therefore faster, and the index takes less
room and is faster to update.
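
To make the behaviour concrete, here is a toy model of such an index (not
the actual PathIndex code).  It assumes a mapping of path component to level
to set of document ids, and it skips the final component of each path, as
described above.  ToyPathIndex and its method names are illustrative only.

```python
from collections import defaultdict


class ToyPathIndex:
    """Toy index: path component -> level -> set of document ids.

    Only the container components are indexed; the final component
    (the object's own id) is deliberately skipped.
    """

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def index_object(self, docid, path):
        components = [c for c in path.split("/") if c]
        for level, comp in enumerate(components[:-1]):  # skip the last part
            self._index[comp][level].add(docid)

    def search(self, path):
        """Return ids of everything below `path`, at any depth."""
        components = [c for c in path.split("/") if c]
        result = None
        for level, comp in enumerate(components):
            docids = self._index.get(comp, {}).get(level, set())
            result = set(docids) if result is None else result & docids
        return result or set()


idx = ToyPathIndex()
idx.index_object(1, "/foo/bar/blammo")
idx.index_object(2, "/foo/bar/baz/deep")
idx.index_object(3, "/foo/other")

under_bar = idx.search("/foo/bar")        # objects anywhere below /foo/bar
exact = idx.search("/foo/bar/blammo")     # empty: 'blammo' was never indexed
```

In this model a query for a container returns everything beneath it at any
depth, while a query for an object's full path returns nothing, which is the
behaviour the original report describes.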

In fact there is no need to index the entire path of an object in the
catalog. Even with no Indexes defined, ZCatalog already does this for you.
The uid of every entry in the catalog is the full path to the object (as a
string). Unfortunately, ZCatalog does not expose this to the surface but you
can write a trivial external method to do it. And I might entertain adding a
ZCatalog API to do so if I had a good use case. Right now you can only
access entries by RID.
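
As an illustration only, the bookkeeping described above can be modeled with
two dictionaries.  ToyCatalog, its attribute names, and rids_for_paths are
hypothetical and are not the ZCatalog API; the sketch just shows that a
path-to-RID mapping is enough to look up several entries by path without
consulting any index.

```python
class ToyCatalog:
    """Hypothetical stand-in for the catalog's path <-> RID bookkeeping."""

    def __init__(self):
        self._uids = {}    # full path string -> RID
        self._paths = {}   # RID -> full path string
        self._next_rid = 1

    def catalog_object(self, path):
        rid = self._next_rid
        self._next_rid += 1
        self._uids[path] = rid
        self._paths[rid] = path
        return rid

    def rids_for_paths(self, paths):
        """Look up several entries by path without touching any index."""
        return {p: self._uids[p] for p in paths if p in self._uids}


catalog = ToyCatalog()
catalog.catalog_object("/foo/bar/blammo")
catalog.catalog_object("/foo/bar/blammoz")

found = catalog.rids_for_paths(
    ["/foo/bar/blammo", "/foo/bar/blammoz", "/foo/missing"])
```

Paths that are not cataloged are simply absent from the result, so the
caller can tell which stored references have gone stale.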

Now that begs the question: if you already know the path to the object you
are looking for, why are you using the Catalog in the first place? I highly
doubt doing what you describe below is faster than just directly accessing
the object. In fact I'd be willing to bet it's slower, especially since you
are searching two indexes to get it. Unless of course these are dynamically
generated objects of some kind (not stored in Zope).

As for making RIDs more permanent, that would basically require a rewrite of
the Catalog, and make certain operations much more expensive. As it stands,
your application should only assume that RIDs are valid within a single
transaction. You should use the path to uniquely identify objects, or some
application defined uid that gets cataloged otherwise.
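
A small self-contained sketch of why a stored RID is fragile while a stored
path is not.  It models reindexing as uncatalog followed by recatalog, as
described in this thread; the function names mirror that description but
the code is a toy, not the Catalog implementation.

```python
import itertools

_rids = itertools.count(1)
uids = {}  # path -> RID (hypothetical stand-in for the catalog's mapping)


def catalog_object(path):
    uids[path] = next(_rids)
    return uids[path]


def reindex_object(path):
    """Uncatalog, then recatalog: the entry gets a fresh RID."""
    del uids[path]
    return catalog_object(path)


# An application remembers both identifiers for the same object.
stored_rid = catalog_object("/foo/bar/blammo")
stored_path = "/foo/bar/blammo"

new_rid = reindex_object("/foo/bar/blammo")

rid_still_valid = stored_rid in uids.values()
path_still_valid = stored_path in uids
```

After the reindex the remembered RID no longer refers to anything, while the
remembered path still resolves to the current entry.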

-Casey

----- Original Message -----
From: "Andy McKay" <andy@agmweb.ca>
To: <zope-dev@zope.org>
Sent: Saturday, August 17, 2002 6:22 PM
Subject: [Zope-dev] PathIndex doesn't index last part of path


> This is mostly a question for AJ, but any input would be great. This bug bit
> me today and is documented here:
> http://collector.zope.org/Zope/449/ISSUE_TRANSCRIPT/view
>
> I don't understand the brief argument against this one; it would make sense
> to me to be able to pull an object out of the catalog based on its path. For
> example, if I want /foo/bar/blammo, currently there is only one
> way of pulling the object out of the catalog given this path: to send
> (path='/foo/bar', id='blammo'), rather than (path='/foo/bar/blammo'). Why
> wouldn't we want it this way?
>
> One thing I have done is store a whole bunch of references to objects as
> selected by the user. These are essentially random objects and the quickest
> way is to pull them back out of the catalog. Of course I can't do more than
> one object per query (unless I'm missing some other way). I'd love to do
> (path=['/foo/bar/blammo', '/foo/bar/blammoz']) and get these 2 objects... I
> think that would be neat.
>
> It would seem data_record_id_ is not guaranteed to be permanent after a
> reindex_object (which CatalogAwareness uses), since this uncatalogs and then
> recatalogs the object. If this did work it would be cool and I could undo
> all the changes to my app back again.
>
> - The patch is already there, so I'm curious why we have what seems to be
> a more limited design?
> - Would a halfway option such as path_match='final' be a choice that won't
> break any code but would confuse everyone and not make it into the
> documentation?
> - Is it just a matter of fixing reindex_object as was suggested on #zope so
> that data_record_id_ is more permanent?
>
> Cheers
> --
>   Andy McKay
>   Agmweb Consulting
>   http://www.agmweb.ca
>
>
>
>
> _______________________________________________
> Zope-Dev maillist  -  Zope-Dev@zope.org
> http://lists.zope.org/mailman/listinfo/zope-dev
> **  No cross posts or HTML encoding!  **
> (Related lists -
>  http://lists.zope.org/mailman/listinfo/zope-announce
>  http://lists.zope.org/mailman/listinfo/zope )
>

