[Zope3-dev] RFC: Make HTTP streaming of large data simpler

Mon Dec 5 22:31:49 EST 2005

Jim Fulton wrote:
> There are a number of reasons we needed IResult:
>
> - We want to be able to adapt existing output, especially
>    string output and we needed an interface to adapt to.

I see. I presume this is for the reason you state below, namely to be able to customize
the setting of the Content-Type, etc. I guess that's a valid use case, but it seems like a
separate issue from streaming or trickling data. Maybe we should make them separate
interfaces: the IOutputHeaders adapter would be responsible for figuring out output
headers on a result object (incl. a string), and the IBodyIterator adapter would be
responsible for streaming/trickling.
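
To illustrate the split, here is a rough sketch in plain Python with no Zope imports. The
two functions merely stand in for the proposed adapters; their names, the UTF-8 encoding
and the chunk size are assumptions made up for this example, not existing API.

```python
def output_headers(result):
    """Hypothetical header adapter for a string result."""
    data = result.encode("utf-8")
    return {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Length": str(len(data)),
    }


def body_iterator(result, chunk_size=4):
    """Hypothetical body adapter: yield the encoded string in chunks."""
    data = result.encode("utf-8")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```

Neither function needs to know about the other, which is the whole point of keeping the
two concerns apart.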

> - An adapter may need to affect output headers, so IResult
>    needed to provide header data.

Well, the view, before falling into iteration, could set response headers itself.

> - We needed iterable data for WSGI.

I don't understand how my example of a generator view fails there. It *does* provide
iterable data to the publishing framework. In fact, the generator itself is the iterable
data.
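
To make that concrete, a minimal sketch in plain Python (no Zope involved) of a WSGI-style
callable that returns a generator as its body. The names generator_view and app are
invented for illustration; the chunks are bytes because that is what WSGI expects under
Python 3.

```python
def generator_view():
    """Hypothetical view: yields the body chunk by chunk."""
    for i in range(3):
        yield ("chunk %d\n" % i).encode("ascii")


def app(environ, start_response):
    """Minimal WSGI application that hands the generator to the server."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    # The generator object is itself the iterable body that WSGI asks for.
    return generator_view()
```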

> There are two interesting use cases that would drive
> applications to pay attention to IResult:
>
> A. Returning large amounts of data
>
> B. Dribbling data from the application, for example
>     to provide progress on a long-running application.
>
> For A, you want to compute that data and then leave
> application code.  You don't want to stay in the
> application, holding application resources, like database
> connections, while the data is being consumed.  In this case,
> you generally want to create a temporary file and return that
> as the IResult body.

Ah, yes, good point. So, while IResult seems to be needed for the decoupling of
application space and server space, I still think the interface itself is too
complicated. Instead of requiring this 'body' attribute which is iterable, IResult itself
should be iterable. I propose to change it to:

  class IResult(Interface):
      ...

      headers = Attribute('A sequence of tuples of result headers, such as '
                          '"Content-Type" and "Content-Length", etc.')

      def __iter__():
          """Provide the body data of the response"""

Or, if we adopt my suggestion of separating headers from body iterators, we'd have two
interfaces:

  class IOutputHeaders(IReadMapping):
      """Provide headers for the response output"""

  class IBodyIterator(Interface):
      """Provide the response body in an iterable manner"""

      def __iter__():
          """Provide the body data of the response"""

Implementations of IBodyIterator that would create temporary files like you suggest could
then easily implement __iter__ by returning iter(file_handle_of_the_tempfile).
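
A rough sketch of such an implementation, in plain Python and without the Zope interface
declarations. TempFileBody is a made-up name, and passing the data in as a list of byte
chunks is only an assumption for the sake of the example.

```python
import tempfile


class TempFileBody:
    """Hypothetical body iterator backed by a temporary file."""

    def __init__(self, chunks):
        # Spool all data to disk while still in application code.
        self._file = tempfile.TemporaryFile()
        for chunk in chunks:
            self._file.write(chunk)
        self._file.seek(0)

    def __iter__(self):
        # Iterating a binary file yields its content line by line.
        return iter(self._file)

    def close(self):
        self._file.close()
```

Once the object is constructed, the application can release its own resources; the server
only needs the file handle to deliver the body.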

> BTW, your implementation also doesn't work because it doesn't
> set the content length.

I don't think setting content length is mandatory. It's definitely nice, though,
especially for the usability of the app.

> Unfortunately, we still aren't addressing use case B above.
> Some more API enhancements will be needed to address that.
> There will need to be some way to signal that the publisher
> should not release application resources (not call
> publication.endRequest and request.close) until after the data
> has been streamed.  In any case, this needs more thought and
> a proposal before we attack this.

Indeed. Also, I still haven't given up on my implementation for case B, but of course I'm not
attached to it. My goal is to have a *simple* way of writing views that A) stream large
data (I guess the indirection of a temporary file masked by IResult/IBodyIterator is
needed here) and B) trickle data to the client.

I would presume for B) we could still also use IResult/IBodyIterator by writing something
like this (assuming my suggestion of making IResult objects iterable from above):

  class StreamingView(BrowserView):
      implements(IBodyIterator)

      def __iter__(self):
          return self

      def next(self):
          data = self.context.getMoreDataToTrickle()
          if not data:
              raise StopIteration
          return data

This would be sufficiently simple, I think, but a simple generator like my original
suggestion or the one below would still be more straightforward.

> We'll need a way to inspect the output to determine which
> strategy is being used.  An interface seems to be a good
> way to do this.

Yes.

> I think that either of these use cases is advanced and
> should be handled explicitly.

Sure. So what would be wrong with:

   class TrickleView(BrowserView):
       # tell the publisher that we'll be trickling data to the client slowly
       # so that all resources stay available for that period of time
       implements(IWantToStayInApplicationSpacePlease)

       def __call__(self):
           yield self.context.getDataToTrickle()
           yield self.context.getMoreDataToTrickle()
           yield self.context.getEvenMoreDataToTrickle()
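
On the publisher side, honoring such a marker could look roughly like the sketch below.
It is plain Python, not the real Zope publisher API: the stay_in_application_space
attribute stands in for the marker interface, and publish and FakeRequest are
hypothetical names used only to show the timing of the cleanup.

```python
class FakeRequest:
    """Stand-in for a request that holds application resources."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def publish(view, request):
    """Return an iterable body, closing the request at the right time."""
    result = view()
    if getattr(view, "stay_in_application_space", False):
        # Trickling: keep resources until the body is fully consumed.
        def body():
            try:
                for chunk in result:
                    yield chunk
            finally:
                request.close()
        return body()
    # Default: consume the result now and release resources immediately.
    data = list(result)
    request.close()
    return iter(data)


class TrickleView:
    stay_in_application_space = True  # hypothetical marker

    def __call__(self):
        yield b"a"
        yield b"b"
```

With the marker set, request.close() only runs after the last chunk has been handed out;
without it, the request is closed before the server sees any data.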

> Yet another use case was to make pluggable the traditional
> implicit determination of output content type and text
> encoding.  The adaptation to IResult allows this to be
> customized.

Like I said, this seems like a separate issue from streaming/trickling data to the client.

Philipp
