Hello,
We recently realised that mimetype assignment in Zope (e.g. to Zope File objects) is inconsistent: it can vary when different clients (browsers) upload files with the same file extension.
Example: when a file called "foobar.rtf" is uploaded to a Zope File object from Firefox on Linux, the mimetype assigned is (or can be) 'application/rtf'. However, the same file uploaded to the same Zope File object in the same Zope instance, using IE on Windows 2000 (with MS Office installed), will get 'application/msword' assigned.
The mimetype assignment for uploaded files is done in OFS/Image.py (there may be more places, or other Products, that do this; I know that at least ExtFile does it too). Line 463 of OFS/Image.py, Zope 2.7.2:
    def _get_content_type(self, file, body, id, content_type=None):
        headers=getattr(file, 'headers', None)
        if headers and headers.has_key('content-type'):
            content_type=headers['content-type']
        else:
            if type(body) is not type(''): body=body.data
            content_type, enc=guess_content_type(
                getattr(file, 'filename',id), body, content_type)
        return content_type
Then I understood that the headers as sent by the client for this file (may?) have a content-type entry that takes precedence over both the mimetypes 'database' and the content_type passed in as an argument.
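To make the effect concrete, here is a minimal standalone sketch of that ordering. It is not the Zope code: `FakeUpload` and `current_content_type` are hypothetical names, and the standard library's `mimetypes.guess_type` stands in for Zope's `guess_content_type` (which also looks at the body).

```python
import mimetypes

# Register the extension explicitly so the result does not depend on the
# host system's mime.types files.
mimetypes.add_type("application/rtf", ".rtf")


class FakeUpload:
    """Hypothetical stand-in for an uploaded file object (not a Zope class)."""

    def __init__(self, filename, headers=None):
        self.filename = filename
        self.headers = headers or {}


def current_content_type(upload, default="application/octet-stream"):
    """Mimic the ordering above: client header first, filename guess otherwise."""
    headers = getattr(upload, "headers", None)
    if headers and "content-type" in headers:
        return headers["content-type"]
    # Assumption: stdlib guess by extension only, unlike Zope's helper.
    guessed, _encoding = mimetypes.guess_type(upload.filename)
    return guessed or default


# Same filename, different clients, different stored mimetypes.
firefox = FakeUpload("foobar.rtf", {"content-type": "application/rtf"})
msie = FakeUpload("foobar.rtf", {"content-type": "application/msword"})
no_header = FakeUpload("foobar.rtf")

print(current_content_type(firefox))
print(current_content_type(msie))
print(current_content_type(no_header))
```

Whenever the client sends a content-type header, the filename-based guess is never consulted, which is why the two browsers end up with different types for the same file.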
We could deal with the inconsistent assignment on the application level (in this case Silva), but I'd rather consider changing this behaviour on the Zope level. I could imagine changing the way a mimetype is 'guessed' from an uploaded File to something like:
    def _get_content_type(self, file, body, id, content_type=None):
        """ Order of precedence:
        1) see if guess_content_type resolves to a mimetype for the filename
        2) if not use content_type as sent in the headers if available
        3) else use argument passed in
        """
        headers = getattr(file, 'headers', {})
        content_type = headers.get('content-type', content_type)
        if type(body) is not type(''):
            body = body.data
        name = getattr(file, 'filename', id)
        content_type, enc = guess_content_type(name, body, content_type)
        return content_type
Does anyone have an opinion on this? Is the current behaviour completely intentional, maybe even according to some specification (and thus I should deal with it on the application level)? Should I file a collector issue?
regards jw
On Thursday, 30 September 2004, 9:36 +0200, Jan-Wijbrand Kolman jw@infrae.com wrote:
> def _get_content_type(self, file, body, id, content_type=None):
>     """ Order of precedence:
>     1) see if guess_content_type resolves to a mimetype for the filename
>     2) if not use content_type as sent in the headers if available
>     3) else use argument passed in
>     """
>     headers = getattr(file, 'headers', {})
>     content_type = headers.get('content-type', content_type)
>     if type(body) is not type(''):
>         body = body.data
>     name = getattr(file, 'filename', id)
>     content_type, enc = guess_content_type(name, body, content_type)
>     return content_type
>
> Does anyone have an opinion on this? Is the current behaviour completely intentional, maybe even according to some specification (and thus I should deal with it on the application level)? Should I file a collector issue?
Looks like a reasonable solution. If it works, we can include the changes in Zope 2.8 (maybe Zope 2.7.4).
Andreas
-1 for using the "guessed" value over the one from the headers; +1 for using the argument over the guessed value (so that the application can "fix" the problem). I agree that having different clients supply different types is painful, but I don't think that "fixing" it at the low level is reasonable (mechanism vs. policy).
In summary, I would prefer the precedence to be:
1. Passed value
2. Request header
3. Guessed value
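A minimal sketch of that ordering as a standalone function, for illustration only. The name `resolve_content_type` and its signature are hypothetical, not part of OFS.Image, and the standard library's `mimetypes.guess_type` is assumed as the guesser in place of Zope's `guess_content_type`.

```python
import mimetypes

# Register the extension explicitly so the result does not depend on the
# host system's mime.types files.
mimetypes.add_type("application/rtf", ".rtf")


def resolve_content_type(filename, headers=None, content_type=None,
                         default="application/octet-stream"):
    """Hypothetical resolver: passed value, then request header, then guess."""
    # 1. A value passed explicitly by the application always wins.
    if content_type:
        return content_type
    # 2. Otherwise trust what the client declared for the upload.
    if headers and headers.get("content-type"):
        return headers["content-type"]
    # 3. Only guess when there is no explicit information at all.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default


msie_headers = {"content-type": "application/msword"}

print(resolve_content_type("foobar.rtf", msie_headers, "application/rtf"))
print(resolve_content_type("foobar.rtf", msie_headers))
print(resolve_content_type("foobar.rtf"))
```

With this ordering the client header still beats the guess, but an application that knows better can pass its own value and have it respected.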
Tres.
Tres Seaver wrote:
> -1 for using the "guessed" value over the one from the headers; +1 for using the argument over the guessed value (so that the application can "fix" the problem). I agree that having different clients supply different types is painful, but I don't think that "fixing" it at the low level is reasonable (mechanism vs. policy).
Can you elaborate a bit more on the "mechanism vs. policy" remark? I'm not sure I understand your line of reasoning and I'm curious about it :)
regards, jw
Jan-Wijbrand Kolman wrote:
>> -1 for using the "guessed" value over the one from the headers; +1 for using the argument over the guessed value (so that the application can "fix" the problem). I agree that having different clients supply different types is painful, but I don't think that "fixing" it at the low level is reasonable (mechanism vs. policy).
>
> Can you elaborate a bit more on the "mechanism vs. policy" remark? I'm not sure I understand your line of reasoning and I'm curious about it :)
OK. The OFS.Image code we are talking about is fairly low-level, and is used widely across applications. It should thus be as "policy-free" as possible, so that applications which need different policies can still reuse the mechanism. Thus, it should *never* override the values passed explicitly by the application (which are themselves "policy").
Overriding the header value passed by the client itself with one guessed by mimelib is *also* a policy, and one which the application could / should make; otherwise, we end up with the alternate problem, which is that clients which do the Right Thing get stomped on.
"Guessing" should always be last in line, and used (at least by default) only in the absence of explicit information.
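As a rough illustration of the split, the sketch below keeps a policy-free resolver and layers an application policy on top of it. Both function names are hypothetical, and the standard library's `mimetypes.guess_type` is assumed as a stand-in for the guessing done in Zope.

```python
import mimetypes

# Register the extension explicitly so the result does not depend on the
# host system's mime.types files.
mimetypes.add_type("application/rtf", ".rtf")


def mechanism(filename, headers=None, content_type=None):
    """Policy-free resolver: explicit value, then client header, then guess."""
    if content_type:
        return content_type
    if headers and headers.get("content-type"):
        return headers["content-type"]
    return mimetypes.guess_type(filename)[0] or "application/octet-stream"


def extension_first_policy(filename, headers=None):
    """Hypothetical application policy: trust the extension over the client."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # For unknown extensions guessed is None, so the mechanism falls back
    # to the client header on its own.
    return mechanism(filename, headers, content_type=guessed)


msie_headers = {"content-type": "application/msword"}

print(mechanism("foobar.rtf", msie_headers))
print(extension_first_policy("foobar.rtf", msie_headers))
```

An application that wants consistent types opts in by calling its own policy function, while other applications that rely on well-behaved clients keep the header value untouched.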
Tres.