[Zope-dev] Generic filtering of objects

Phillip J. Eby pje@telecommunity.com
Wed, 10 Nov 1999 22:50:27 -0500


At 11:54 AM 11/11/99 +1100, Robert Leftwich wrote:
>I have a requirement to be able to apply a user-definable filter to all the
>views on the object database. Obviously this would be best applied at a low
>a level as possible in order to minimise the impact on the application code
>and to be made as generic as possible for maximum flexibility.
>To save re-inventing the wheel, has anyone else had this requirement and if
>so how did you implement it ?

Ty Sarna and I have come up with a concept for this, which we have not yet
implemented, as we are still finalizing design details.  We believe that it
could be made quite general.  The basic idea is that you have a "Predicate"
interface, for objects which can be combined with "and", "or", and "not" to
result in more complex predicates.  The Predicate interface would also
include functions to return a Generator or a Filter.  A Generator is an
object which can return a list of objects matching the Predicate, while the
Filter is a callable object which may be passed an object and returns true
if the object matches the Predicate.  Generators and Filters have to also
provide a small amount of metadata indicating their cost to filter or
generate information.

The and, or, and not operations return generic Intersect, Union, and
Complement objects which, when asked for their Generator or Filter, create
optimized versions based on the cost criteria of their component
Predicates.  For example, if you ask an Intersect predicate for a
generator, it should ask its component predicates for the generator likely
to produce the smallest set, then use the remaining predicates to filter
that generator.  Conversely, if you ask a Union for a filter, it should
apply its component Predicate filters in such a way as to maximize the
probability of an early hit.


Anyway, here's where the fun part comes in.  Let's say that you define a
set of Predicates that can either verify an object attribute when used as a
filter, OR can search a ZCatalog when used as a generator.  Let's say you
also define another predicate, such as "ContainedWithin", which tells you
if one object is reachable from another.  Now, to search your site for
things with a certain field value, you can do something like this:

blue_under_here = Keywords['blue'] & ContainedWithin[somefolder]

to get a Predicate which expresses your search.  Now, you can do something
like:

blue_under_here.getResults()

to get the actual objects, or, you can say:

blue_filter = blue_under_here.getFilter()

blue_filter(something)

to see if object 'something' matches the criterion.  Depending on what the
ContainedWithin and Keywords predicates tell their Intersection object, it
will pick one of them to return the objects and use the other predicate to
filter the list.

The interesting thing about the Predicate concept is that predicates can
apply to *anything*.  For example, suppose you create a PrimeNumber
predicate, and a FibonacciSequence predicate.  You can intersect them both
with a Range predicate, and find all the prime Fibonacci numbers in a
certain range.  Okay, maybe you don't have any reason to do that.  But how
about a RandomSelection predicate that selects a random member of the
enclosed set?  So that:

RandomSelection[Keywords['blue'] & ContainedWithin[somefolder]].getResults()

Will return a random member of the set of 'blue' keyword-bearing objects
within somefolder?  Might be fun for returning random taglines keyed to a
user's preferences, eh?  Or random selection of one they haven't seen in X
number of days, or...


How about predicates associated with a relational DB?  Intersect the two,
or join them.  By our convention, parameterized predicates (such as
Keywords and ContainedWithin) use __getitem__ or __getslice__ to return a
predicate which applies to the set of values given, which may itself be a
predicate (which is used as a filter for the values of the property the
predicate tests).  Hm.  That might not have been too clear.

Let's take Keywords as an example.  Keywords[StringContaining['u']] would
match keywords like 'blue', 'purple', and 'puce'.  If used as a filter, it
would do this by applying the inner predicate as a filter against an
attribute value of the object being checked.  If being used as a generator,
it would have to get the unique values for the keywords attribute from the
catalog first, then apply the StringContaining filter against them to get
the list it would use as a catalog query.  (Notice that while all
Predicates must support being used as filters, not all can support being
generators.  PrimeNumber and FibonacciSequence, for example, have to be
intersected with some finite number set before you can use them as a
generator.)

Let's look at more interesting ways to apply predicate projection (if
that's the correct term.)  How about:

ContainedWithin[Keywords['blue']]

Used as a filter, this would tell you whether an object had a parent with a
keywords attribute of 'blue'.  Used as a generator, it would search the
catalog for objects with keyword 'blue', then return all objects reachable
from those objects.

Is your head spinning yet?

How about this: you drop in some Predicates that union the predicates from
several catalogs plus an LDAP server and a relational database, so that
Keywords is now really Catalog1.Keywords || Catalog2.Keywords ||
SQLDB.Keywords || LDAP1.Keywords.  Performing any of the searches above
still works, only now you have a (lazy) search set spanning multiple data
sources.


Most predicates will probably want to check if they have a type or data
source in common when doing 'and' and 'or' operations so as to create
appropriate union or intersection predicates.  For example, a predicate
which refers to a field or index in a particular Catalog will want to see
if is being intersected with another from the same Catalog, as that will
allow it to construct a better query when used as a generator.

Okay, this all sounds great in principle, but where's the implementation?
We understand the basics of the abstract interface, but there are a few
details we need to work out to ensure that the framework will have high
interoperability.  The base module will probably include:

* Constants to be used in optimization metadata (e.g., COST_BUILTIN_FILTER,
COST_IN_MEMORY_GENERATOR, COST_DISK_ACCESS)

* Some basic predicates like AlwaysTrue, EmptySet, IsObject, EqualsObject,
AttributeValue

* Utility classes to construct atomic predicates from various standard
Python objects like callables and sequences, so that predicates like the
Keywords example don't have to care whether you pass them a function, a
string, or another predicate.  (e.g. KeyOf(dict) would give you a predicate
whose getFilter would return dict.has_key, which is ready to be used as a
filter.  getResults(), on the other hand, would return the results of
calling the dictionary's keys() method.)

* Basic Union and Intersect objects that can perform crude optimization and
re-organization, reshuffling their components to group Predicates together
that have common data sources, as well as predicting the best way to go
about doing a filtering or generating operation based on the components'
cost estimates.

We believe such a framework would make a great many interesting things
do-able in Zope, as well as serving as a fully general "object query
language" for Zope.  (Not to mention for Python in general!)  Which is why
we want to implement it.  Our first implementation will probably be a
strawman which we'll publish for comments and feedback.  We'll probably
implement some silly things like Prime, Fibonacci, RandomSelection,
IntegerRange, and so on, just to get the feel of using predicates.  We
might also do some wrappers over the Query objects which are part of the
Zope2 library.  They'll undoubtedly make great Filters for many purposes,
especially since they're implemented in C.  We'll probably also reuse some
code from Lazy.py so that generators can be kept from fully instantiating
huge result sets.