[Zope3-dev] RFC: Make HTTP streaming of large data simpler

Tue Dec 6 09:26:53 EST 2005

Philipp von Weitershausen wrote:
> Jim Fulton wrote:
> 
>>There are a number of reasons we needed IResult:
>>
>>- We want to be able to adapt existing output, especially
>>   string output and we needed an interface to adapt to.
> 
> 
> I see. I presume this is for the reason you state below, namely to be able to customize
> the setting of the Content Type, etc. I guess that's a valid use case, but it seems like a
> separate issue from streaming or trickling data.  Maybe we should make them separate
> interfaces: The IOutputHeaders adapter would be responsible for figuring out output
> headers on a result object (incl. a string), the IBodyIterator adapter would be
> responsible for streaming/trickling.

We could do it that way.  I'm not convinced that it is significantly better.
Certainly, it's not enough of an improvement, in my opinion, to change it at
this late date.

>>- An adapter may need to affect output headers, so IResult
>>   needed to provide header data.
> 
> 
> Well, the view, before falling into iteration, could set response headers itself.

Yes, it could, but we often don't want the view to be bothered with that.

> 
>>- We needed iterable data for WSGI.
> 
> 
> I don't understand how my example of a generator view fails there. It *does* provide
> iterable data to the publishing framework. In fact, the generator itself is the iterable
> data.

You figured this out below.

> 
>>There are two interesting use cases that would drive
>>applications to pay attention to IResult:
>>
>>A. Returning large amounts of data
>>
>>B. Dribbling data from the application, for example
>>    to provide progress on a long-running application.
>>
>>For A, you want to compute that data and then leave
>>application code.  You don't want to stay in the
>>application, holding application resources, like database
>>connections, while the data is being consumed.  In this case,
>>you generally want to create a temporary file and return that
>>as the IResult body.
> 
> 
> Ah, yes, good point. So, while IResult seems to be needed for the decoupling of
> application space and server space, I still think the interface itself is too
> complicated. Instead of requiring this 'body' attribute which is iterable, IResult itself
> should be iterable. I propose to change it to:
> 
>   class IResult(Interface):
>       ...
> 
>       headers = Attribute('A sequence of tuples of result headers, such as '
>                           '"Content-Type" and "Content-Length", etc.')
>       def __iter__(self):
>           """Provide the body data of the response"""

Do you really think this makes implementations any simpler?

Note that things like strings and temporary files are already iterable.
So satisfying the current interface often merely requires setting an
attribute.
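
A minimal sketch of that point, in plain Python 3 rather than the actual
publisher code. FileResult and make_large_result are hypothetical names used
only for illustration; the one thing taken from the discussion is the shape
of a result: a sequence of header tuples plus an iterable body.

```python
import tempfile


class FileResult:
    """Hypothetical result object: a header list plus an iterable body."""

    def __init__(self, body, headers):
        self.headers = headers  # sequence of (name, value) tuples
        self.body = body        # any iterable of byte chunks


def make_large_result(chunks):
    """Spool computed data to a temporary file and wrap it as a result."""
    f = tempfile.TemporaryFile()
    size = 0
    for chunk in chunks:
        f.write(chunk)
        size += len(chunk)
    f.seek(0)
    # The file object is already iterable, so it serves as the body as is.
    headers = [("Content-Type", "application/octet-stream"),
               ("Content-Length", str(size))]
    return FileResult(f, headers)


result = make_large_result([b"line one\n", b"line two\n"])
received = b"".join(result.body)  # the server side just iterates
result.body.close()
```

Once make_large_result returns, the application no longer has to hold any
resources; whoever consumes the body only needs the open temporary file.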

> Or, if we adopt my suggestion of separating headers from body iterators, we'd have two
> interfaces:
> 
>   class IOutputHeaders(IReadMapping):
>       """Provide headers for the response output"""
> 
>   class IBodyIterator(Interface):
>       """Provide the response body in an iterable manner"""
> 
>       def __iter__(self):
>           """Provide the body data of the response"""
> 
> Implementations of IBodyIterator that would create temporary files like you suggest could
> then easily implement __iter__ by returning iter(file_handle_of_the_tempfile).

I don't agree that this is simpler.

> 
>>BTW, your implementation also doesn't work because it doesn't
>>set the content length.
> 
> 
> I don't think setting content length is mandatory. It's definitely nice, though,
> especially for the usability of the app.

Hm, I thought the spec said it was required, but it looks like you are right.

> 
>>Unfortunately, we still aren't addressing use case B above.
>>Some more API enhancements will be needed to address that.
>>There will need to be some way to signal that the publisher
>>should not release application resources (not call
>>publication.endRequest and request.close) until after the data
>>has been streamed.  In any case, this needs more thought and
>>a proposal before we attack this.
> 
> 
> Indeed. Also, I still haven't given up my implementation for case B, but of course I'm not
> attached to it. My goal is to have a *simple* way of writing views that A) stream large
> data (I guess the indirection of a temporary file masked by IResult/IBodyIterator is
> needed here) and B) trickle data to the client.

Why does this need to be simple?  I'd even argue for making it harder.
Applications that trickle data back to the client tie up valuable resources
and better have a darn good reason for doing so.

> I would presume for B) we could still also use IResult/IBodyIterator by writing something
> like this (assuming my suggestion of making IResult objects iterable from above):
> 
>   class StreamingView(BrowserView):
>       implements(IBodyIterator)
> 
>       def __iter__(self):
>           return self
> 
>       def next(self):
>           data = self.context.getMoreDataToTrickle()
>           if not data:
>               raise StopIteration
>           return data
> 
> This would be sufficiently simple I think, but a simple generator like my original
> suggestion or the one below would still be more straight-forward.

I don't understand this example.  Are you saying that this would be published?
or that this would be returned by something that is published?

> 
>>We'll need a way to inspect the output to determine which
>>strategy is being used.  An interface seems to be a good
>>way to do this.
> 
> 
> Yes.
> 
> 
>>I think that either of these use cases is advanced and
>>should be handled explicitly.
> 
> 
> Sure. So what would be wrong with:
> 
>    class TrickleView(BrowserView):
>        # tell the publisher that we'll be trickling data to the client slowly
>        # so that all resources stay available for that period of time
>        implements(IWantToStayInApplicationSpacePlease)
> 
>        def __call__(self):
>            yield self.context.getDataToTrickle()
>            yield self.context.getMoreDataToTrickle()
>            yield self.context.getEvenMoreDataToTrickle()

This would require the publisher to inspect the published object
and adjust its behavior based on the published object rather than
the result.  I'd rather branch on the result.
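
To make "branch on the result" concrete, here is a rough sketch in plain
Python 3. Nothing in it is existing publisher API: Result, TrickleResult, the
keep_application_open flag, and publish are all hypothetical, and end_request
merely stands in for the publication.endRequest / request.close step
mentioned earlier in the thread.

```python
class Result:
    """Hypothetical result: headers plus an iterable body."""
    keep_application_open = False  # default: release resources first

    def __init__(self, body, headers=()):
        self.body = body
        self.headers = list(headers)


class TrickleResult(Result):
    """Hypothetical result that must stay in application space."""
    keep_application_open = True


def publish(result, end_request):
    """Yield body chunks, choosing the cleanup strategy from the result."""
    if result.keep_application_open:
        try:
            yield from result.body
        finally:
            end_request()  # release only after the data has been streamed
    else:
        end_request()      # release before the server consumes the body
        yield from result.body


events = []

def trickle():
    # Stand-in for application code that produces data slowly.
    for i in range(2):
        events.append("computed %d" % i)
        yield b"chunk %d\n" % i

out = list(publish(TrickleResult(trickle()),
                   lambda: events.append("released")))

plain_events = []
plain_out = list(publish(Result([b"all at once"]),
                         lambda: plain_events.append("released")))
```

The published object never enters the decision; the publisher looks only at
what came back, so a view opts into the expensive behavior by returning the
explicit result type.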

> 
>>Yet another use case was to make pluggable the traditional
>>implicit determination of output content type and text
>>encoding.  The adaptation to IResult allows this to be
>>customized.
> 
> 
> Like I said, this seems like a separate issue from streaming/trickling data to the client.

It largely is. After all, the IResult framework doesn't deal with trickling
data to the client either.

My point was mainly that we need an interface to adapt to that provides the
data we need.  IResult is a pretty simple interface that is easy to implement
and provides control over both the headers and the body.  I don't really
see value in separating these concerns, especially since, often, they will
be dealt with together.  I especially don't see a justification to try to
deal with this issue *now* in the middle of a release.
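
As an illustration of adapting plain string output to something that carries
both headers and body, here is a small sketch in plain Python 3. StringResult
and adapt_to_result are hypothetical stand-ins, not the real adapter
machinery, and the default content type and encoding are assumptions chosen
for the example.

```python
class StringResult:
    """Hypothetical adapter: wraps text output with headers and a body."""

    def __init__(self, text, content_type="text/html", encoding="utf-8"):
        data = text.encode(encoding)
        self.headers = [
            ("Content-Type", "%s;charset=%s" % (content_type, encoding)),
            ("Content-Length", str(len(data))),
        ]
        self.body = [data]  # a one-element list is already iterable


def adapt_to_result(output):
    """Return output unchanged if it looks like a result, else adapt it."""
    if hasattr(output, "headers") and hasattr(output, "body"):
        return output
    if isinstance(output, str):
        return StringResult(output)
    raise TypeError("cannot adapt %r to a result" % (output,))


adapted = adapt_to_result("héllo")
same = adapt_to_result(adapted)
```

Replacing StringResult with a different adapter is the customization point
for content type and encoding, and the same object settles headers and body
together.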

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org