[ZCM] [ZC] 1685/ 3 Comment "Potential bug in Python cgi.FieldStorage can lead to problematic memory growth"

Collector: Zope Bugs, Features, and Patches ... zope-coders-admin at zope.org
Sun Jan 30 08:45:31 EST 2005


Issue #1685 Update (Comment) "Potential bug in Python cgi.FieldStorage can lead to problematic memory growth"
 Status Accepted, Zope/bug critical
To followup, visit:
  http://zope.org/Collectors/Zope/1685

==============================================================
= Comment - Entry #3 by mcdonc on Jan 30, 2005 8:45 am

A fix for this bug has been submitted to the Python SF bug tracker as http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470

Does anyone have an opinion on how (if?) we should fix this for Zope while this patch is being reviewed and before a new Python version is released?  The risk of not fixing it is that it is possible for people who have access to make POST requests to Zope to consume arbitrary amounts of memory on the server upon which it is hosted.  I haven't yet determined whether the "cgi-maxlen" zope.conf entry is helpful for mitigating this risk.
________________________________________
= Comment - Entry #2 by mcdonc on Jan 30, 2005 7:36 am


> The first problem is that if the "content-length" of any part of the
>   multipart input is not provided, FieldStorage appears to default to using
>   a StringIO object to hold the output for the parsing of an individual
>   part.  As far as I can tell, individual elements of a multipart/form-data
>   input stream are not required to be decorated with their lengths within
>   an enclosed header ( see http://www.faqs.org/rfcs/rfc1867.html). 
>   Although they are not prevented from doing so, but they are not required
>   to do so by the RFC.  This seems to mean that every file upload from
>   clients that do not supply a Content-Length header for the individual
>   elements of a multipart input will end up as StringIO objects, and thus
>   in RAM.  That said, I've only tested with Firefox, but it does not supply
>   these headers for individual parts of the multipart message.  In
>   practice, this appears to mean that all files uploaded via the
>   multipart/form-data HTTP POST protocol will end up entirely in RAM, at
>   least when uploaded by some class of clients, including one of the two
>   major browsers.

This analysis is false.  FieldStorage only uses a StringIO to store data up to 1000 characters, and thereafter switches to a tempfile.

The analysis of the "newline" problem is still true.
________________________________________
= Request - Entry #1 by mcdonc on Jan 30, 2005 2:42 am

 Status: Pending => Accepted

 Supporters added: chrism

This is actually a potential Python bug but I'm submitting it here because it's likely we'll need to fix it within Zope for some period of time either by including our own cgi module or by monkeypatching.

When a POST request is used to submit a multipart/form-data form which includes an HTML "file" input element, the resulting multipart input is parsed using Python's cgi.FieldStorage class.

There appear to be two problems with cgi.FieldStorage, both which can manifest themselves as "memory hogs".

The first problem is that if the "content-length" of any part of the multipart input is not provided, FieldStorage appears to default to using a StringIO object to hold the output for the parsing of an individual part.  As far as I can tell, individual elements of a multipart/form-data input stream are not required to be decorated with their lengths within an enclosed header ( see http://www.faqs.org/rfcs/rfc1867.html).  Although they are not prevented from doing so, but they are not required to do so by the RFC.  This seems to mean that every file upload from clients that do not supply a Content-Length header for the individual elements of a multipart input will end up as StringIO objects, and thus in RAM.  That said, I've only tested with Firefox, but it does not supply these headers for individual parts of the multipart message.  In practice, this appears to mean that all files uploaded via the multipart/form-data HTTP POST protocol will end up entirely in RAM, at least when uploaded by some class of clients, including one of the two major browsers.

This could be fixed by not using a StringIO object at all, but instead by always using a tempfile.  This is a heavy handed fix and may be a "speed killer" so it might be better to find another solution that does some heuristic based on the overall length of the input stream.

Another less-commonly-replicated problem in FieldStorage can cause the entire contents of an uploaded file to be read into RAM.

When the uploaded file does not contain any newlines (individual parts of multipart input are not required to contain newlines), the entirety of that part will be read into RAM, as FieldStorage uses "readline()" to attempt to read a chunk of a file at a time.  This appears to be a problem within the methods "read_lines_to_eof", "read_lines_to_outerboundary", and "skip_lines".  Even if the RFC that defines "multipart/*" encodings required that "well-behaved" content was required to be encoded in such a way that it contained newlines every reasonable number of characters, I'm not sure that it's reasonable to have a strategy that depends on file parts ending properly in newlines when the input comes from arbitrary sources and thus may be intentionally "hostile", so the practice of using "readline()" within these methods should probably cease (or maybe the methods should go away altogether).

I am in the process of attempting to fix this.
==============================================================



More information about the Zope-Collector-Monitor mailing list