[Zope-CVS] CVS: Packages/pypes/doc - query-engine.txt:1.1

Casey Duncan casey at zope.com
Wed Apr 14 01:25:59 EDT 2004


Update of /cvs-repository/Packages/pypes/doc
In directory cvs.zope.org:/tmp/cvs-serv20463/doc

Added Files:
	query-engine.txt 
Log Message:
Add initial prose describing the query engine


=== Added File Packages/pypes/doc/query-engine.txt ===
==================
Pypes Query Engine
==================

This document attempts to explain the pypes query engine and its query
execution strategy. It should be noted that pypes is a young system and its
query engine although entirely general, is infantile in its ability to generate
an optimal query plan when compared to other more mature and scientifically
rigorous database engines. This initial implementation should be regarded as a
"jumping off point" for future enhancement, assuming a continued and hopefully
growing interest in its ongoing development.

It is possible and probably likely that if you have made it this far, that
you know more about query engines than I do. Look at this document as a
plea for help as much as a description of the operation of the system ;^)

Query Constituents
------------------

Query definitions consist of the following parts:

- `Inputs`: one or more collections of objects (usually sets) which contain
  the object being queried.
  
- `Criteria`: an executable expression which selects and joins the objects
  from the inputs for the query output.
  
- `Order`: A declaration that determines the order of the query output.

Supported Features
------------------

- Inputs that are arbitrary iterables. Pypes extents and query result inputs
  are handled most efficiently.
  
- Criteria consisting of a Python expression of arbitrary complexity.

- Capture of names from the calling context into the criteria expression.

- Optimization of criteria expressions to bind globals and builtins to
  constants, pre-evaluate subexpressions lacking free variables and
  boolean operations that can be reduced early.

- Joins of paired inputs with arbitrary operators, efficient support for
  equi-joins, greater, lesser and "in" joins.
  
- Sort declarations consisting of multiple arbitrary Python expressions
  with multiple simultaneous ordering.

- Result limits for efficient partial result evaluation.

- Flexible output selection.

- Analysis of the query execution plan.

Unsupported Features
--------------------

- Grouping

- Outer joins

- Efficient n-ary joins <shudder>

- Arbitrary projection expressions (only whole objects supported).

- Query hints

Query Execution
---------------

At the highest level, query execution proceeds as follows.

- Constituent definition (i.e., inputs, criteria, order)

- Query object instantiation

- Result selection

These stages can be executed at once or at different times, or even different
processes if the query itself is a persistent object.

Definition
~~~~~~~~~~

During definition, the following occurs:

- Inputs are named

- Criteria and sort expressions are parsed, optimized and compiled. Names
  from the calling namespace and built-ins are bound.

Query Instantiation
~~~~~~~~~~~~~~~~~~~

- During instantiation the arguments are collected and sorted and any invariants
asserted.

Selection
~~~~~~~~~

Selection is where most of the heavy lifting occurs. During selection the
following occurs:

- Free names in criteria and sort expressions are gathered and unioned with
  the names selected for output. These are the names used by the query. Any
  inputs not used are ignored for this query run.

- Used inputs are sorted in order by their length to determine an order of
  evaluation, where it is possible to compute it. Un-len-able inputs (i.e.,
  iterators) are automatically last.
  
Criteria Evaluation
+++++++++++++++++++

- The product of the input lengths for inputs used by the expression is
  computed. If sufficiently small, a scan of the cartesian product is performed
  and the result is returned.
  
- If the criteria uses one input then request the registered index for the
  input and expression. If one is found return the result derived from the
  index.

- Split criteria on boolean OR operators. Criteria selection is performed for
  each criteria sub-expression. The union of these results is returned.

- Split criteria on AND operators. The criteria list is sorted based on
  estimated complexity calculated based on the number of inputs used for each
  criteria, availability of indexes, the inputs' type and length product.
  Criteria selection is performed for each sub-expression in-turn with each
  result fed as the input for the next sub-expression evaluation. The
  resulting intersection is returned.

- If the criteria is an equal, greater than, less than or in comparison joining
  two inputs with one input in each operand then perform the join operation
  using the appropriate join function and return the result.

- Scan the cartesian product of the inputs used and return the result.






More information about the Zope-CVS mailing list