[Zope-CVS] CVS: Packages/pypes/doc - query-engine.txt:1.2

Casey Duncan casey at zope.com
Mon Jun 7 00:14:26 EDT 2004


Update of /cvs-repository/Packages/pypes/doc
In directory cvs.zope.org:/tmp/cvs-serv29747/doc

Modified Files:
	query-engine.txt 
Log Message:
Bring the prose more in phase with reality and gain some self-confidence


=== Packages/pypes/doc/query-engine.txt 1.1 => 1.2 ===
--- Packages/pypes/doc/query-engine.txt:1.1	Wed Apr 14 01:25:58 2004
+++ Packages/pypes/doc/query-engine.txt	Mon Jun  7 00:14:24 2004
@@ -10,10 +10,6 @@
 "jumping off point" for future enhancement, assuming a continued and hopefully
 growing interest in its ongoing development.
 
-It is possible and probably likely that if you have made it this far, that
-you know more about query engines than I do. Look at this document as a
-plea for help as much as a description of the operation of the system ;^)
-
 Query Constituents
 ------------------
 
@@ -30,8 +26,9 @@
 Supported Features
 ------------------
 
-- Inputs that are arbitrary iterables. Pypes extents and query result inputs
-  are handled most efficiently.
+- Inputs are objects supporting the ISet interface (they do not need to
+  assert this interface, so builtin Python sets will work) or pypes
+  query result objects.
   
 - Criteria consisting of a Python expression of arbitrary complexity.
 
@@ -73,7 +70,9 @@
 
 - Constituent definition (i.e., inputs, criteria, order)
 
-- Query object instantiation
+- Criteria evaluation
+
+- Sorting
 
 - Result selection
 
@@ -90,51 +89,67 @@
 - Criteria and sort expressions are parsed, optimized and compiled. Names
   from the calling namespace and built-ins are bound.
 
-Query Instantiation
-~~~~~~~~~~~~~~~~~~~
-
-- During instantiation the arguments are collected and sorted and any invariants
-asserted.
-
-Selection
-~~~~~~~~~
-
-Selection is where most of the heavy lifting occurs. During selection the
-following occurs:
-
-- Free names in criteria and sort expressions are gathered and unioned with
-  the names selected for output. These are the names used by the query. Any
-  inputs not used are ignored for this query run.
-
-- Used inputs are sorted in order by their length to determine an order of
-  evaluation, where it is possible to compute it. Un-len-able inputs (i.e.,
-  iterators) are automatically last.
-  
 Criteria Evaluation
-+++++++++++++++++++
-
-- The product of the input lengths for inputs used by the expression is
-  computed. If sufficiently small, a scan of the cartesian product is performed
-  and the result is returned.
-  
-- If the criteria uses one input then request the registered index for the
-  input and expression. If one is found return the result derived from the
-  index.
-
-- Split criteria on boolean OR operators. Criteria selection is performed for
-  each criteria sub-expression. The union of these results is returned.
-
-- Split criteria on AND operators. The criteria list is sorted based on
-  estimated complexity calculated based on the number of inputs used for each
-  criteria, availability of indexes, the inputs' type and length product.
-  Criteria selection is performed for each sub-expression in-turn with each
-  result fed as the input for the next sub-expression evaluation. The
-  resulting intersection is returned.
-
-- If the criteria is an equal, greater than, less than or in comparison joining
-  two inputs with one input in each operand then perform the join operation
-  using the appropriate join function and return the result.
-
-- Scan the cartesian product of the inputs used and return the result.
-
+~~~~~~~~~~~~~~~~~~~
 
+Criteria is processed using partial result objects. If necessary, the criteria
+is divided into pieces and handled by multiple partial results which are
+ultimately combined to determine the final result. There are three kinds of
+partial results used by pypes:
+
+Cartesian Products -- These are the generalist brutes that can handle any query
+regardless of complexity, so long as you are patient enough. Cart. products are
+used with very small inputs where optimizations are not helpful, or when the
+complexity of the query requires them. They are often used in combination with
+the other partial results when selecting multiple outputs.
+
+Joins -- Joins handle efficient relational joining of pairs of objects. They
+are invoked by comparing objects from multiple inputs with one another.
+
+Actualized Results -- These results are processed as simple filters that
+each act on a single input. They are used to filter and reduce the input sets
+so there is less to process in later query stages.
+
+Queries use partial results in the following manner to process query criteria:
+
+- If the calculated magnitude, the maximum size of the query output, is below
+  a designated threshold, then a cart. product is returned as the result.
+
+- If possible, an actualized result is created and is returned as the result.
+
+- If possible, a join is created and returned as the result.
+
+- The criteria is split on boolean OR operators. Each sub-expression is
+  processed separately and the union of the partial results is returned.
+
+- The criteria is split on boolean AND operators. Each sub-expression is
+  processed in sequence and the intersection of the partial results is
+  returned.
+
+- If no more division of the criteria is possible, then a cart. product
+  is returned as the result.
+
+Sorting
+~~~~~~~
+
+If an order is specified by the query then it is evaluated using the results of
+criteria evaluation. The first order expression specified is fully evaluated to
+create an ordered result iterator. The second sort order expression (if any) is
+processed only when duplicate sort keys of the first order expression are
+encountered during result iteration. Remaining order expressions are evaluated
+similarly. Consequently, if the first order expression generates unique sort
+key values then no further expression are evaluated for the query run.
+
+A limit may be specified on the result size which is evaluated during sorting.
+Alternate sorting algorithms may be employed to more efficiently evaluate
+a limited query result.
+
+Result selection
+~~~~~~~~~~~~~~~~
+
+Selection determines the output of the query run. The output of a pypes query
+is an iterable of objects or tuples of joined objects. The output "columns"
+coorespond to the query inputs. When a single output is selected, the results
+consist of single objects. If multiple output columns are selected then the
+results consist of tuple "rows" of objects selected from the inputs by the
+criteria.




More information about the Zope-CVS mailing list