[Zope3-dev] comments on Guido's diary

Mon, 09 Dec 2002 13:51:37 +0000

Hi folks,

I found Guido's diary an excellent read, and very useful for me as I had 
not been keeping a journal of my activities at the sprint.

Guido's diary talks about the various things he discovered about zope3 
while working on the Query Service. I'd like to try to clarify some of 
these things, and put them in the more general context of Zope 3 rather 
than the specific context of the Query Service.

I should also mention that this is a pretty long email.

 > We sketch three interfaces that apply directly to the text index:
 > IQuerying, IRanking, and IInjection.  IQuerying defines query() which
 > takes a query string (more structured things are considered YAGNI for
 > now) and returns an unranked result set.  IRanking defines rank()
 > which takes a result set and "batching metadata" and returns the
 > requested batch, plus a total count.  IInjection is to be used by the
 > object hub when objects get registered, unregistered, or modified.

This part is about interfaces specific to the TextIndex. These 
interfaces aren't really part of the QueryService.
I hope to produce some overall documentation for the QueryService at the 
Vilnius sprint this week.

 > There was some confusion about the interfaces: first it appeared that
 > Steve wanted some of the building blocks to return stuff that was an
 > input parameter, such as the batching metadata and the original
 > query.  Later he took that back.  Also, the batching metadata was
 > simplified to two arguments (start, count).

Originally, we decided it would be simpler if the TextIndex 
implementation for use in Zope 3 would implement IQueryProcessor 
directly. In which case, it would need to pass through the batching 
metadata for use by other QueryProcessors in the query processing pipeline.

A little later, we decided that a better approach would be to use an 
IQueryProcessor adapter separate from the TextIndex. So, the TextIndex 
doesn't actually need to be aware of the QueryService and Queries at 
all. It provides its own APIs for making a query. This turned out to be 
a much more elegant approach.

 > The modifications were to support batching more directly, by giving
 > query() a start and count argument, and scaling the results to a float
 > in the range [0.0, 1.1].

I think that's supposed to be [0.0, 1.0].

 > Next, we started creating the Zope/App code to work with the text
 > index.  After checking it in as Zope/App/Indexes/TextIndex.py, we
 > found that the other team on our group (who are doing the yellow boxes
 > for SteveA -- I still don't understand that part)

On the first evening of the sprint, after Jim's day-long tutorial, I 
gave a short presentation on the QueryService. In my schematic of how 
the components hook up to each other, I had coloured the Index 
components in green, and the QueryProcessor components yellow.

It is good to keep the Index components ignorant of the QueryService, so 
that they can be used with other services, such as a Relationships service.

 > After a while Jim suggested that we try
 > out the new naming conventions.  We changed to
 > Zope/App/index/text/index.py, and later put everything else in the same
 > directory (even browser view stuff).  The double use of "index" is a
 > bit weird (I now think the first one should have been indexing), but I
 > love having all the parts in one directory.

The new naming guidelines that were presented at the sprint had packages 
as nouns. So, Zope/App/index/text/index.py is correct.
(Actually, after the refactoring we'll probably have 
Zope/app/index/text/index.py)

 > The first thing we did was the notification interface.  There's an
 > extremely generic ISubscriber interface, which defines one method that
 > you must implement to be a subscriber, notify(event).  We implement
 > this in a class TextIndex, which extends TextIndexWrapper.

Guido and Christian Z. refactored the ZCTextIndex into three pieces that 
build on each other:

   Zope/TextIndex

     This package contains a text index for use with the ZODB.
     It does not depend on Zope in any way.

   Zope/TextIndex/TextIndexWrapper

     This module contains a class that provides a convenient wrapper for
     using the text index in an application, so you don't need to know
     about the internal partitioning of responsibilities in the text
     index implementation. It is a text index black-box.
     The TextIndexWrapper class defines an attribute that contains a
     TextIndex. In UML terms, this is aggregation or composition.
     It is for use with the ZODB. It does not depend on Zope in any way.

   Zope/App/index/text

     This package contains the Zope 3 text index. It depends on the
     Zope 3 Event Service, and thus on the Component Architecture.
     However, it does not depend on or even know about the
     Query Service. You could use it in a Zope 3 application, and
     query it directly from that application with no need for a
     Query Service.
     It has browser views. You can add it in a TTW package. It has
     security declarations in its package's configure.zcml file.
     It derives from TextIndexWrapper.

I hope that someone will be motivated to write a README.txt and examples 
package for Zope/TextIndex to show how it can effectively be used in 
regular python and ZODB programs.
This has already been done by Kapil and Guido W. for the security 
package, with notable success.

 > To deal
 > with this, there are two APIs with the same arguments (a common Zope 3
 > rhythm): when there's no match, getAdapter() raises an exception
 > (which one?)

Zope.ComponentArchitecture.Exceptions.ComponentLookupError

 > while queryAdapter() returns None.  (I guess Python's
 > dict.get() is an exception, and getattr() is half an exception. :-)

Something to think about for Python 3000 ;-)

 > ContextMethods are a bit like const
 > in C++: if you call a ContextMethod, you must be one yourself, so they
 > multiply like rabbits.

On my todo list is to make it so you can declare your class to be a 
ContextClass (or something like that). This would mean that all the 
methods other than __init__ would become ContextMethods, and all your 
properties would become ContextProperties. You wouldn't need to have

   foo = ContextMethod(foo)

...all over your code.

This needs a little more thought though.

 > Testing notify() is a bit of a letdown: you create an event object of
 > a specific class (there are implementations matching all the diverse
 > interfaces -- what a waste)

I had a discussion with Jim on the plane trip here about Exceptions, 
Events and Interfaces. They have certain abstract qualities in common. 
In many ways, it would be nice to combine the IEvent interface hierarchy 
and the Event class hierarchy into a single hierarchy of things that 
represent both contract and implementation.
We both agreed that it is ok to leave things as they are for now. This 
requires a lot of thought, and there are more important things to do for 
Zope 3.

 > I asked Steve if he could show me how to write a similar test that
 > involves a real ObjectHub, subscription, etc.  He was almost offended,
 > because in his eyes this is a functional test.  But he showed me how
 > anyway. :-)

Jim has been talking about using test.py to run functional tests, and 
having suitable base-classes for functional tests. These would parse all 
of the zcml files and set up a temporary Data.fs with suitable default 
services, before running the tests. Each test would be in its own 
transaction, with (in the general case) get_transaction().abort() called 
at the end of each test.

 > The hardest part is faking out the traversal mechanism (which I did
 > *not* want to set up for real).  I ended up creating a fake traverser,
 > implementing ITraverser, which has a traverse() method that calls
 > locationAsUnicode() on its path argument, and then returns a
 > pre-determined object.  Otherwise it must raise KeyError.

Oversimplified statement:
   Whereas Zope2 is all about the ZODB and persistent objects, Zope3
   is all about Traversal.

 > This all is needed, BTW, because the ObjectHub implements a two-way
 > mapping between hub ids (ints that it makes up) and locations (in
 > canonical form, unicode strings of the form u"/path/to/an/object").

Actually, the ObjectHub stores its locations in the other canonical 
tuple-of-unicodes form. This is so you can ask the ObjectHub for the 
registrations it holds below a particular location. It is easier to 
manipulate locations made up of tuples in this way.
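
As a rough illustration of why the tuple form is convenient for this,
here is a minimal sketch. It is not the ObjectHub's code: is_below,
hubids_below and the registrations dictionary are made up for the
example.

```python
def is_below(location, ancestor):
    """True if location lies strictly below ancestor (both tuples)."""
    n = len(ancestor)
    return len(location) > n and location[:n] == ancestor


# Hypothetical registrations: location tuple -> hub id.
registrations = {
    ('', 'DailyRecord', 'published_content', 'story1'): 1,
    ('', 'DailyRecord', 'published_content', 'paid_for_ads', 'ad7'): 2,
    ('', 'DailyRecord', 'published_content2', 'story2'): 3,
}


def hubids_below(ancestor):
    """Hub ids of all registrations below the given location."""
    found = [hubid for location, hubid in registrations.items()
             if is_below(location, ancestor)]
    found.sort()
    return found


published = ('', 'DailyRecord', 'published_content')
```

Here hubids_below(published) picks up only the first two entries. A
naive prefix test on the string form would also match
/DailyRecord/published_content2/story2, because that string starts
with /DailyRecord/published_content.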

 > You can't play with an object hub without working traversal.

Right. I would point out that because the outer aspect of traversal is 
specified by the ITraverser interface, it is easy to plug in a simple 
traversal system that does exactly what is required for your unit test.
See the class FakeTraverser in Zope/App/index/text/tests/test_index.py
for an example.
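
For readers without the source to hand, a fake traverser along these
lines can be very small. The sketch below is not the FakeTraverser
from test_index.py. The class name, the constructor arguments and the
as_unicode_path helper (standing in for locationAsUnicode) are
invented for illustration.

```python
def as_unicode_path(location):
    """Hypothetical stand-in for locationAsUnicode().

    Accepts a slash-separated string or a tuple of names and returns
    the slash-separated form.
    """
    if isinstance(location, tuple):
        if location == ('',):
            return '/'  # the root
        return '/'.join(location)
    return location


class ToyFakeTraverser:
    """Knows exactly one path and the one object that lives there."""

    def __init__(self, path, obj):
        self._path = path
        self._obj = obj

    def traverse(self, path):
        # Normalise the path, then hand back the pre-determined object.
        if as_unicode_path(path) == self._path:
            return self._obj
        raise KeyError(path)


bruce = object()
traverser = ToyFakeTraverser('/bruce', bruce)
```

Both traverser.traverse('/bruce') and traverser.traverse(('', 'bruce'))
return the same object, and any other path raises KeyError.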

 > To establish our fake traverser (which effectively implements a tiny
 > fixed namespace, consisting of the path u"/bruce" and a single object
 > at that location), we call provideAdapter(None, ITraverser, factory)
 > where factory is a one-argument lambda that returns our traverser
 > instance.  (Long live nested scopes.)  None represents the source
 > interface -- I guess this means it can adapt anything.

Yes. That is exactly what "None" means. As far as adapters and the 
Interfaces package is concerned, all objects implement the special 
"None" interface. If an object has an __implements__ attribute that 
gives other interfaces, it still implements the "None" interface.

 > The factory
 > argument is probably the object to be adapted -- our fake traverser
 > doesn't need it.

No. The factory argument is a callable that takes a single argument, and 
which returns the adapter. This is an 'adapter factory' -- a thing that 
makes adapters on demand.

So, when you provide an adapter, you are actually registering with the 
adapter service an adapter factory to provide adapters. When you call 
getAdapter, you are asking the adapter service to look up an adapter 
factory that will adapt from one of your object's interfaces to the 
interface you desire. The factory is then asked to make an adapter for 
you. This adapter is what you get back from getAdapter.
So, in a parallel universe, 'provideAdapter' could be called 
'registerAdapterFactory'.
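
To make the factory idea concrete, here is a toy registry. It is not
the real adapter service: ToyAdapterRegistry, its storage, and the use
of plain strings as interface names are assumptions of this sketch.
Only the method names, the None convention and the exception name
mirror what is described above.

```python
class ComponentLookupError(Exception):
    """Stand-in for the real exception of the same name."""


class ToyAdapterRegistry:
    """Hypothetical, much simplified adapter service."""

    def __init__(self):
        # Maps (source interface, target interface) to a factory.
        self._factories = {}

    def provideAdapter(self, source, target, factory):
        # What gets registered is the factory, not an adapter.
        self._factories[(source, target)] = factory

    def queryAdapter(self, obj, target, default=None):
        # Try the interfaces the object declares (assumed here to be
        # a flat tuple), then None, which every object implements.
        declared = getattr(obj, '__implements__', ())
        for source in tuple(declared) + (None,):
            factory = self._factories.get((source, target))
            if factory is not None:
                # The factory is called with the object to adapt.
                return factory(obj)
        return default

    def getAdapter(self, obj, target):
        adapter = self.queryAdapter(obj, target)
        if adapter is None:
            raise ComponentLookupError(target)
        return adapter


class FixedTraverser:
    """Toy adapter that has no use for the object it adapts."""


registry = ToyAdapterRegistry()
traverser = FixedTraverser()
# A one-argument factory that always hands back the same adapter.
registry.provideAdapter(None, 'ITraverser', lambda obj: traverser)
```

With this in place, registry.getAdapter(anything, 'ITraverser') returns
the fixed traverser, registry.queryAdapter(anything, 'IOther') returns
None, and registry.getAdapter(anything, 'IOther') raises
ComponentLookupError.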

 > The register() method returns the hubid.  There's also an unregister()
 > method, also taking a location.  To generate a "modified" event, we
 > must create an event -- not an ObjectModifiedHubEvent, but an
 > ObjectModifiedEvent -- and pass it to the hub's notify() method.  The
 > hub then broadcasts it to all its subscribers (for such events).

Actually, if you send an ObjectModifiedHubEvent to the ObjectHub, it 
will send that event on to all its subscribers (for such events).

The ObjectHub sends on all the events it receives. For ObjectEvents 
where the object in the ObjectEvent is registered with the objecthub, it 
sends on an additional ObjectHubEvent.
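
The forwarding rule can be sketched as follows. This is a toy, not the
ObjectHub implementation: the classes below share only their names
with the real events, ToyObjectHub and Recorder are invented, and
duplicate registrations and per-event-type subscriptions are left out.

```python
class ObjectModifiedEvent:
    """Toy object event, identifying the object by location."""

    def __init__(self, location):
        self.location = location


class ObjectModifiedHubEvent:
    """Toy hub event, identifying the object by hub id as well."""

    def __init__(self, hubid, location):
        self.hubid = hubid
        self.location = location


class ToyObjectHub:
    def __init__(self):
        self._hubids = {}  # location -> hub id
        self._subscribers = []

    def register(self, location):
        hubid = len(self._hubids) + 1
        self._hubids[location] = hubid
        return hubid

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

    def _send(self, event):
        for subscriber in self._subscribers:
            subscriber.notify(event)

    def notify(self, event):
        # Every event received is sent on unchanged.
        self._send(event)
        # An object event about a registered object also produces
        # an additional hub event.
        if isinstance(event, ObjectModifiedEvent):
            hubid = self._hubids.get(event.location)
            if hubid is not None:
                self._send(ObjectModifiedHubEvent(hubid, event.location))


class Recorder:
    """Subscriber that remembers the class names of events it saw."""

    def __init__(self):
        self.seen = []

    def notify(self, event):
        self.seen.append(event.__class__.__name__)


hub = ToyObjectHub()
recorder = Recorder()
hub.subscribe(recorder)
hub.register(('', 'bruce'))
hub.notify(ObjectModifiedEvent(('', 'bruce')))
hub.notify(ObjectModifiedEvent(('', 'unregistered')))
```

The recorder sees the first event twice over (once as sent, once as a
hub event) and the second event only as sent.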

 > So now we wanted an actual honest-to-God TTW user interface for
 > interacting with the text index!  Christian Z showed me a trick,
 > tal:omit-tag="" that I don't remember; it may be new or I'm getting
 > old.

You can get a similar effect by using an element whose tag name is in 
the tal namespace. Something like this:

   <tal:call define="foo context/methodcall" />

or perhaps

     <tal:whatever condition="context/methodcall" />

 > I believe now that this <require> directive is the main reason why we
 > needed to define IStatistics.

That is one reason. The other is so that people can write views for your 
textindex content object without having to read its code. They write a 
view that depends on IStatistics.

 > - Click "default package"; this goes to a part of the services
 >   namespace called the default package (the purpose of packages is
 >   still vague; you can also get there by clicking on Packages and then
 >   on 'default')

A package is a place to store instances of service components, and other 
service-kind-of instances such as modules and templates.
A package also forms a unit of distribution. I expect people to be able 
to work on some unit of functionality in a package, then export that 
package into a .zip file, for import by someone else on a different server.
This stuff isn't done yet. We'll probably be working on it at the 
Vilnius sprint with Codeworks.

 > - There's at least one item here, 'configure', which will become
 >   important later; for now, ignore these
 >
 > - Click on the "Add..." link; this takes you to an Add menu.  The Text
 >   Index is shown as one of the items (the last one in my case)
 >
 > - Select the Text Index radio button, and type a name
 >   (e.g. "textindex") in the input box, and click the "Add" button
 >
 > - You are now back in the default package display; a new item has
 >   appeared, which is the textindex you just added.
 >
 > - Now we have to create and configure two more services: a local event
 >   service and an object hub.  To create these:
 >
 > - Click on Add...

<snipped detailed explanation about setting up an event service and an 
object hub>

I think we're going to make Zope 3 by default put an Event Service and 
an ObjectHub in the root folder's service manager.

This will save a lot of this boring set-up work.

Events are very important for Zope 3 for such basic things as allowing 
you to record creation times and modification times on content.

 > Interesting detail: when you edit a .py file, you must restart z3.py
 > (which takes forever on Mac OSX).

Apparently that's due to Mac OS X's very slow filesystem. Can anyone 
knowledgeable comment on this?

 > But when you edit a .pt (ZPT) file,
 > all it takes is reloading the page to get the new template.  Makes
 > experimental hacking with ZPT a joy, compared to fixing bugs in the
 > Python code...

When the persistent modules work reliably enough to be merged into the 
Trunk, there will be a cvs-like command to check in and out code from a 
persistent module in a running zope.

 > Jim doesn't want us to use Python expressions in TAL, and SteveA was
 > also skeptical (though delighted to see it work).  A more flexible
 > approach would create a view object in Python whose methods are
 > invoked from the form.  I'll have to learn how to do this tomorrow.
 > [Didn't get to this.]

It isn't at all complex or difficult to do this -- we just didn't get 
around to it during the sprint.

One of the goals of Zope 3 Views is to make it easy to keep the HTML 
presentation separate from the logic required to present content in a 
browser. I think this goal has been achieved already.

 > There was also a problem: the return value of a
 > query() with no results (the only kind I've tried so far, since no
 > events are generated yet) is the Python expression ([], 0) but this is
 > wrapped in security proxies, and somehow applying str() or repr() to
 > that proxy always gives us a string like <security proxy ...> .  So I
 > ended up displaying query(...)[1], which is a "rock" (an immutable
 > object that doesn't get wrapped).

This has been fixed now. The methods __str__ and __repr__ should always 
be available to call on an object. One side effect of this is that you 
should not make security-sensitive information available from calling an 
__str__ or __repr__ method. This will need to go into the Zope 3 
documentation for product developers.

 > I also added an adaptor

They're spelled "adapter" in Zope 3. Calling 'getAdaptor' will get you a 
NameError or an AttributeError. :-)

 > ...from IReadFile to ISearchableText, which
 > extracts the data from the file (only for text/plain files).

 > One necessary step that wasn't listed
 > above: after creating the object hub, in the default package, you need
 > to create a Registration component (the very simple one that Steve
 > implemented has the policy that all objects are registered), and
 > toggle its subscription status to ON in its control menu.  Like the
 > text index, it's not a service so doesn't need to be configured.

Not every object is registered with the ObjectHub. As Guido points out 
later in his diary, you can write components that listen to ObjectEvents 
and implement a policy for which objects get registered. You can 
implement this policy across various such registration components; for 
example, one component might be for registering News Articles within 
/DailyRecord/published_content/..., another might be for registering 
Adverts within /DailyRecord/published_content/paid_for_ads/...

---- Guido's diary part 2 ----

 > Project for Friday morning (once power is restored; power in all of
 > Rotterdam is out from 9am till 1am)

I heard that someone drove a truck into the structure supporting a 
major power-line.

 > Subproblem one: how to find this content object, given the
 > Registration Utility object itself.  Several suggestions from SteveA:
 > get the location (a tuple of strings) and look for '++etc++Services',
 > or at least an item starting with '++etc++';

After checking with Jim, I can confirm that looking for 
'++etc++Services' is the best thing to do here.

 > SteveA suggested that there should be such an
 > interface, so he can attach service managers to non-folder objects,

A non-folder object that wants to contain a ServiceManager needs to 
implement IServiceManagerContainer.
I imagine that quite a few advanced applications will want to contain 
their own ServiceManager, and will thus be ServiceManagerContainers.

 > In the end we decide to search the
 > location tuple for an item that .startswith('++etc++').

This should be changed to look for an item that is equal to 
'++etc++Services'.
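
A minimal sketch of that lookup, assuming the location is a tuple of
names. The function name is made up, and this is not the code written
at the sprint.

```python
def containing_folder_location(location):
    """Location of the folder whose service manager holds `location`.

    Hypothetical helper: everything before the '++etc++Services'
    item is taken to be the location of that folder.
    """
    names = list(location)
    if '++etc++Services' not in names:
        raise ValueError('not inside a service manager: %r' % (location,))
    index = names.index('++etc++Services')
    return tuple(names[:index])


inside_root = ('', '++etc++Services', 'Packages', 'default', 'ru')
inside_folder = ('', 'folder', '++etc++Services', 'Packages', 'default')
```

For inside_root this gives ('',), the root, and for inside_folder it
gives ('', 'folder').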

 > A bit about locations: there are several different ways to represent a
 > location, and in general functions/methods that take a location accept
 > them all.  One form is a unicode string containing slashes for
 > separators; a leading slash means an absolute path.  Another form is a
 > sequence of unicode strings that don't contain slashes; a leading
 > empty string means an absolute path.  The root is represented by ['']
 > or ('',); an empty list or tuple is not a path (I guess it means an
 > empty relative path :-).

Look in Zope/App/Traversing/tests/testConvenienceFunctions.py
You'll find the class attributes _good_locations and _bad_locations that 
have the examples of valid and invalid locations used to test the 
functions that convert between formats for locations.

According to this, a list is not a valid location.

The empty tuple is not a valid location. Perhaps it should be. I'd 
welcome discussion and suggestions from people who have read through 
this unit-test.

 > I think there are rules for the characters
 > allowed in path components for content objects, but I don't know what
 > they are; from observation, a name starting with '++etc++' has
 > something to do with services, while a name starting with '@@' is (a
 > shortcut for?) a view.  A name consisting of a single '+' is an 'Add
 > view', which has magic behavior that I haven't quite understood yet; I
 > think it may be a shortcut for a view on a container that can be used
 > to add new items.

Right. So, the following are generally forbidden as names:

   name.startswith('@@')
   name.looksLike('++validname++validname')
   name == '+'

However, if you *need* to be able to use names such as these in a 
container, you can implement a custom container, and an ITraversable for 
your container, that do what you want.

In almost all cases, the restrictions above should be quite acceptable.
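
Spelled out as runnable Python, the three checks might look like this.
The function name is invented, and the pattern used for
'++validname++validname' is only a guess at what counts as a valid
namespace and name.

```python
import re

# '++namespace++name', for example '++etc++Services'. What exactly
# makes a namespace or a name valid is an assumption of this sketch.
_NAMESPACE_PATTERN = re.compile(r'^\+\+\w+\+\+.+$')


def is_reserved_name(name):
    """True if `name` collides with one of the traversal conventions."""
    return (name.startswith('@@')
            or name == '+'
            or _NAMESPACE_PATTERN.match(name) is not None)
```

So '@@index.html', '+' and '++etc++Services' are rejected, while
'index.html' and 'a+b' are accepted.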

 > There are convenience routines to translate locations between both
 > forms, as well as to go between locations and objects, and the
 > ObjectHub has methods to translate between hubids and objects or
 > locations.  Also handy: getPhysicalPath(context) gives the location
 > (as a tuple) of the context (typically an object), and
 > getPhysicalRoot(context) gives the root object.

There needs to be an interface and documentation to describe this 
collection of helpful functions.

 > traverseName(object, name) yields a named subobject (where object
 > must implement IReadContainer I believe, and be wrapped in a context)

Not true. It doesn't matter what the object implements. The object must 
be adaptable to ITraversable. An ITraversable object describes 
exactly what you do for one traversal step, given an object and a name.
This doesn't depend on any IContainer interfaces. Nor does it depend on 
ContextWrappers. However, in Zope 3, many ITraversable implementations 
do depend on ContextWrappers. There is a default ITraversable for 
containers which serves for most kinds of containers.

 > I don't know what traversing an absolute path does (I
 > suppose it could start from the root

That's what it does.

 > or ignore the '' name).

That's what it doesn't.

 > Traversal is complicated by the '++' and '@@' conventions, and by
 > adapters, and by context wrappers, and by security proxies.  There's a
 > 'traverser' object, which we ended up not using, and an even
 > lower-level 'ITraversable' interface which we also ended up not using
 > (it is used by traverse() and traverseName(), and its API is seriously
 > weird).

There are various details of Traversal that are only used in special 
cases. They are needed, but they are not needed when writing regular 
application code, and they are not needed when writing most services.
The convenience functions available from Zope.App.Traversing are for use 
when you don't want to be concerned with these details.

 > Actually, there are two ContainmentIterator classes: the original, in
 > a module by itself in the Zope.ContextWrapper package, and a
 > proxy-aware one (which you should always use) in the
 > Zope.Proxy.ContextWrapper module.  Enough already.

This needs refactoring. The history is that two different kinds of proxy 
were developed at different times. (These are ContextWrappers and 
Security proxies.) Although there is a common base class written in C, 
there remains the task of thinking carefully about proxy wrappers as a 
whole, and refactoring the way proxies are handled in Zope 3 to avoid 
the need for the kind of cruft Guido describes above.

 > In the end, we ended up with very simple straightforward code.  To
 > find the folder in whose service manager we are (too bad we can't use
 > Python 2.3's enumerate() builtin):

I find the enumerate() builtin so useful that I often import an 
equivalent function for use in my Python 2.2 projects.

   from __future__ import generators

   def enumerate(seq):
       count = 0
       for i in seq:
           yield count, i
           count += 1

 > I suppose a generalized RU
 > could be written that takes policy specifications as arguments,
 > although SteveA was surprisingly lukewarm about this idea

I think it would be easier to make a Registration Utility base-class, 
and override methods that contain the implementation of a specific policy.
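
Roughly what I have in mind, as a sketch only. None of these classes
exist: the base class, the shouldRegister hook, and the stub hub and
event used to exercise it are all invented for the example.

```python
class ToyRegistrationBase:
    """Hypothetical base class: registers what its policy approves."""

    def __init__(self, hub):
        self._hub = hub

    def shouldRegister(self, location):
        # Subclasses override this to implement a specific policy.
        raise NotImplementedError

    def notify(self, event):
        # `event.location` is assumed to be a tuple of names.
        if self.shouldRegister(event.location):
            self._hub.register(event.location)


class RegisterEverything(ToyRegistrationBase):
    """The very simple policy: all objects are registered."""

    def shouldRegister(self, location):
        return True


class RegisterBelow(ToyRegistrationBase):
    """Registers only objects below a given location."""

    def __init__(self, hub, ancestor):
        ToyRegistrationBase.__init__(self, hub)
        self._ancestor = ancestor

    def shouldRegister(self, location):
        n = len(self._ancestor)
        return len(location) > n and location[:n] == self._ancestor


class StubHub:
    """Just enough of a hub to record what was registered."""

    def __init__(self):
        self.registered = []

    def register(self, location):
        self.registered.append(location)


class StubEvent:
    def __init__(self, location):
        self.location = location


hub = StubHub()
news = RegisterBelow(hub, ('', 'DailyRecord', 'published_content'))
news.notify(StubEvent(('', 'DailyRecord', 'published_content', 'a')))
news.notify(StubEvent(('', 'DailyRecord', 'drafts', 'b')))
```

After the two notifications, only the object under published_content
has been registered with the stub hub.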

 > (whereas
 > otherwise he's always the first to propose the most general mechanism
 > imaginable, based on marker interfaces and adapters, and refactoring
 > some code to make it more general, if at all feasible :-).  (Sorry
 > Steve.  :-)

Actually, this was one of the funniest moments of the sprint for me -- 
suddenly Guido and I found ourselves arguing the opposite positions from 
where we'd been on a similar issue just 45 minutes before.
Sprints are fun like that.

 > Note that an important part of the policy may have to do
 > with workflow: e.g.  an object may only be indexed when it becomes
 > published.  For the editors' use, another textindex for unpublished
 > objects could be conceivable.  I guess this would have to use a
 > different object hub instance, since the textindex effectively listens
 > to object hub (un)registration events.

That could be the same object hub instance.
This is where you'd use an event channel between the object hub and your 
text index. The event channel would filter events based on this policy.
If anyone has particular use-cases for this kind of thing, I'd be happy 
to discuss ways to achieve them.
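
As a starting point for that discussion, here is a toy filtering
channel. Everything in it is hypothetical: the class names, the
predicate argument, and in particular the idea that an event carries
a workflow state attribute.

```python
class ToyFilteringChannel:
    """Hypothetical event channel: passes on only accepted events."""

    def __init__(self, accept):
        self._accept = accept  # one-argument predicate on events
        self._subscribers = []

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

    def notify(self, event):
        if self._accept(event):
            for subscriber in self._subscribers:
                subscriber.notify(event)


class ToyEvent:
    def __init__(self, location, state):
        self.location = location
        self.state = state  # assumed workflow state, e.g. 'published'


class ToyIndex:
    """Stands in for a text index; records what it was told about."""

    def __init__(self):
        self.indexed = []

    def notify(self, event):
        self.indexed.append(event.location)


published_index = ToyIndex()
editors_index = ToyIndex()

published_channel = ToyFilteringChannel(lambda e: e.state == 'published')
draft_channel = ToyFilteringChannel(lambda e: e.state != 'published')
published_channel.subscribe(published_index)
draft_channel.subscribe(editors_index)

# Both channels would subscribe to the same object hub. Here the
# events are fed to them by hand.
for event in [ToyEvent('/news/a', 'published'), ToyEvent('/news/b', 'draft')]:
    published_channel.notify(event)
    draft_channel.notify(event)
```

Each index ends up with only the events its channel let through, and
there is still just one source of events.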

 > The try/except is needed because we may be registering objects that
 > were already registered.  The object hub currently raises an exception
 > in this case.  At the sprint we thought it would be a good idea to
 > change this to returning True/False about a duplicate object, to
 > distinguish it from more severe errors; but actually, ObjectHubError
 > is only raised for duplicate or unknown locations; other errors use
 > different exceptions.  So this proposed refactoring is not needed, and
 > catching the exception is exactly right.

See the discussion on the zope3-dev mailing list for further details.

 > Hooking this all up to the UI is trivial, as long as you don't mind
 > that pilot errors cause exceptions, e.g.  this will raise an exception
 > if there's no ObjectHub.

Right. The ObjectHub service is a service. The contract of an 
application with a service says that an application can depend on that 
service being available. It doesn't need to check that the service 
exists, and it doesn't need to fail gracefully if it turns out not to 
exist. It is morally equivalent to an application depending on a 
particular module being installed.
For example, if I remove the smtplib module from my system's Python 
libraries, I can expect Mailman to fail ungracefully.

 > A word about 'wrapped_self'.  By convention (not sure if this is done
 > consistently) a 'context method' (see diary part 1) uses wrapped_self
 > instead of self.

It is done consistently in that, when I see a ContextMethod that has 
"self" rather than "wrapped_self", I change it to "wrapped_self", run 
the tests and check in the change.
This is one of the areas that requires thought for implementing Context 
Classes. (See earlier.)

 > In part 1 of this diary I complained about this,
 > because I kept making mistakes (writing 'self').  I brought it up with
 > SteveA, who was adamant that the convention is important for context
 > methods.  He pointed out that (a) many methods end up doing something
 > like 'self = unwrap(wrapped_self)' and (b) I was writing code that
 > runs in the service area (even though it's not a service) and hence is
 > pretty atypical; he claims that content code rarely needs to declare
 > context methods.

Actually, a lot of code people wrote at the Sprintathon is atypical in a 
similar way. Writing services is much less straightforward than writing 
applications.

I'm a bit concerned that some people have come away from the Sprintathon 
thinking that Zope 3 is really tricky and complex.
Zope 3 actually is tricky and complex when you look at the tricky and 
complex parts. These parts are there to support writing applications 
that are based on Zope 3. Writing these applications should be straightforward.

Perhaps a sprint focused on writing Products and Applications rather 
than Zope 3 Plumbing would be a good thing.

 > I'm willing to accept this and will try to get

In my email client, Guido's diary ends here. Is there more?

--
Steve Alexander