New subject: [Zope-dev] inconsistent mimetype assignment for uploaded files

30 Sep 2004


Hello,
we recently realised that mimetype assignment in Zope (e.g. for Zope File
objects) is inconsistent: it can vary when different clients (browsers)
upload files with the same file extension.
Example: when a file called "foobar.rtf" is uploaded to a Zope File
object from Firefox on Linux, the mimetype assigned is (or can be)
'application/rtf'. However, the same file uploaded to the same Zope
File object in the same Zope instance, using IE on Windows 2000 (with MS
Office installed), will get 'application/msword' assigned.
The mimetype assignment for uploaded files is done in OFS/Image.py
(there may be more places, or other Products, that do this; I know
that at least ExtFile does it too). Line 463 of OFS/Image.py, Zope
2.7.2:
def _get_content_type(self, file, body, id, content_type=None):
     headers=getattr(file, 'headers', None)
     if headers and headers.has_key('content-type'):
         content_type=headers['content-type']
     else:
         if type(body) is not type(''): body=body.data
         content_type, enc=guess_content_type(
             getattr(file, 'filename',id), body, content_type)
     return content_type
From this I understand that the headers sent by the client for this file
may contain a content-type entry and that, when present, it takes
precedence over both the mimetypes 'database' and the content_type
passed in as an argument.
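To make that precedence concrete, here is a minimal standalone sketch of the
current logic. It does not import Zope: FakeUpload, current_content_type and
the private TYPES table are hypothetical stand-ins, and the standard library's
mimetypes module takes the place of guess_content_type (the body inspection
is left out).

```python
import mimetypes

# Private type table, so the result does not depend on the host's mime.types files.
TYPES = mimetypes.MimeTypes()
TYPES.add_type('application/rtf', '.rtf')


class FakeUpload:
    """Hypothetical stand-in for an uploaded file object (not Zope's own class)."""

    def __init__(self, filename, headers=None):
        self.filename = filename
        self.headers = headers or {}


def current_content_type(upload, content_type=None):
    """Mimic the current precedence: a client-sent header wins outright."""
    headers = getattr(upload, 'headers', None)
    if headers and 'content-type' in headers:
        return headers['content-type']
    # Only without a header is the filename consulted, then the argument.
    guessed, _encoding = TYPES.guess_type(upload.filename)
    return guessed or content_type or 'application/octet-stream'


# Same filename, two clients that send different headers.
firefox = FakeUpload('foobar.rtf', {'content-type': 'application/rtf'})
msie = FakeUpload('foobar.rtf', {'content-type': 'application/msword'})
print(current_content_type(firefox))
print(current_content_type(msie))
```

The two calls return different types for the same filename, which is the
inconsistency described above.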
We could deal with the inconsistent assignment on the application
level (in this case Silva), but I'd rather consider changing this
behaviour on the Zope level. I could imagine changing the way a
mimetype is 'guessed' from an uploaded File to something like:
def _get_content_type(self, file, body, id, content_type=None):
     """
     Order of precedence:
     1) see if guess_content_type resolves to a mimetype for the
        filename
     2) if not use content_type as sent in the headers if
        available
     3) else use argument passed in
     """
     # 'or {}' also covers a file object whose headers attribute is None
     headers = getattr(file, 'headers', None) or {}
     content_type = headers.get('content-type', content_type)
     if type(body) is not type(''):
         body = body.data
     name = getattr(file, 'filename', id)
     content_type, enc = guess_content_type(name, body, content_type)
     return content_type
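The effect of this ordering can be sketched outside Zope as well. The snippet
below is only an illustration under assumptions: proposed_content_type and the
TYPES table are hypothetical names, the standard library's mimetypes module
stands in for guess_content_type, and the body is not inspected at all.

```python
import mimetypes

# Private type table, so the result does not depend on the host's mime.types files.
TYPES = mimetypes.MimeTypes()
TYPES.add_type('application/rtf', '.rtf')


def proposed_content_type(filename, header_type=None, content_type=None):
    """Filename first, then the client header, then the passed-in argument."""
    guessed, _encoding = TYPES.guess_type(filename)
    if guessed:
        return guessed
    # Unknown extension: fall back to what the client sent, then the argument.
    return header_type or content_type or 'application/octet-stream'


# A known extension now wins regardless of what the browser claims.
print(proposed_content_type('foobar.rtf', 'application/msword'))
# An unknown extension still honours the client header.
print(proposed_content_type('foobar.unknownext', 'application/msword'))
```

With this ordering "foobar.rtf" resolves to the same type whichever client
uploads it, and the header only matters when the extension is not recognised.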
Does anyone have an opinion on this? Is the current behaviour
completely intentional, maybe even according to some specification
(and thus I should deal with it on the application level)? Should I
file a collector issue?
regards
jw
-- 
Jan-Wijbrand Kolman
jw@infrae.com