[Zope] automagic BOM header at start of utf16 content?

Jürgen Herrmann Juergen.Herrmann at XLhost.de
Thu Jan 8 05:28:57 EST 2009


On Thu, January 8, 2009 11:04, Andreas Jung wrote:
> On 08.01.2009 10:33 Uhr, Jürgen Herrmann wrote:
>>  i already sent the request directly to the zope server
>> omitting our apache proxy and monitored traffic with wireshark. the
>> BOM header comes from zope. i did not find anything in zope's code
>> that heuristically finds out this is utf16 content and prepends the
>> BOM header. so i'm a bit confused where zope takes its wisdom from :)
>> anybody?
>
> I cannot remember having seen any code within the Zope core
> setting the BOM. We have code in the pagetemplate implementation
> interpreting a BOM, but I doubt that Zope sends a BOM out by itself
> (especially not for utf-16).
>
> Andreas

i wrote a small python script to check this out, content:
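# body of a zope "Script (Python)"; `container` is bound by zope
request = container.REQUEST
RESPONSE = request.RESPONSE
# made-up content type, served as a download
RESPONSE.setHeader('Content-Type', 'x-bom-test')
RESPONSE.setHeader('Content-Disposition', 'attachment; filename=bom_test.dat')
ustring = u'sgh sdgh\ns\xf6\xe4\xe4gddp\xe4s\n\u8a0a\u4ee5\u53ca\u76f8\u95dc\u7db2\u7d61\u670d\u52d9'
# return the utf16-encoded byte string as the response body
return ustring.encode('utf16')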

here's what wireshark captured:
0000  00 16 17 1e 26 c6 00 1d  09 b8 cf cb 08 00 45 00   ....&... ......E.
0010  01 46 93 51 40 00 40 06  21 f6 c0 a8 01 79 c0 a8   .F.Q@.@. !....y..
0020  01 a1 1f 91 0d 2f a9 b9  5b 33 55 8e 5f 1f 50 18   ...../.. [3U._.P.
0030  1d 50 d1 0b 00 00 48 54  54 50 2f 31 2e 31 20 32   .P....HT TP/1.1 2
0040  30 30 20 4f 4b 0d 0a 53  65 72 76 65 72 3a 20 5a   00 OK..S erver: Z
0050  6f 70 65 2f 28 5a 6f 70  65 20 32 2e 31 30 2e 35   ope/(Zop e 2.10.5
0060  2d 66 69 6e 61 6c 2c 20  70 79 74 68 6f 6e 20 32   -final,  python 2
0070  2e 34 2e 34 2c 20 6c 69  6e 75 78 32 29 20 5a 53   .4.4, li nux2) ZS
0080  65 72 76 65 72 2f 31 2e  31 0d 0a 44 61 74 65 3a   erver/1. 1..Date:
0090  20 54 68 75 2c 20 30 38  20 4a 61 6e 20 32 30 30    Thu, 08  Jan 200
00a0  39 20 31 30 3a 32 30 3a  34 35 20 47 4d 54 0d 0a   9 10:20: 45 GMT..
00b0  43 6f 6e 74 65 6e 74 2d  4c 65 6e 67 74 68 3a 20   Content- Length:
00c0  36 30 0d 0a 43 6f 6e 74  65 6e 74 2d 54 79 70 65   60..Cont ent-Type
00d0  3a 20 78 2d 62 6f 6d 2d  74 65 73 74 0d 0a 43 6f   : x-bom- test..Co
00e0  6e 74 65 6e 74 2d 44 69  73 70 6f 73 69 74 69 6f   ntent-Di spositio
00f0  6e 3a 20 61 74 74 61 63  68 6d 65 6e 74 3b 20 66   n: attac hment; f
0100  69 6c 65 6e 61 6d 65 3d  62 6f 6d 5f 74 65 73 74   ilename= bom_test
0110  2e 64 61 74 0d 0a 0d 0a  ff fe 73 00 67 00 68 00   .dat.... ..s.g.h.
0120  20 00 73 00 64 00 67 00  68 00 0a 00 73 00 f6 00    .s.d.g. h...s...
0130  e4 00 e4 00 67 00 64 00  64 00 70 00 e4 00 73 00   ....g.d. d.p...s.
0140  0a 00 0a 8a e5 4e ca 53  f8 76 dc 95 b2 7d 61 7d   .....N.S .v...}a}
0150  0d 67 d9 52                                        .g.R

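look at offset 0x0118: right after the blank line that ends the headers,
the body starts with ff fe, the utf16 little-endian BOM...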

ok, time to look at repr(ustring.encode('utf16')):
'\xff\xfes\x00g\x00h\x00 \x00s\x00d\x00g\x00h\x00\n\x00s\x00\xf6\x00\xe4\x00'\
'\xe4\x00g\x00d\x00d\x00p\x00\xe4\x00s\x00\n\x00\n\x8a\xe5N\xcaS\xf8v\xdc\x95'\
'\xb2}a}\rg\xd9R'

bam!
i didn't expect that encoding to utf16 would add a BOM header by itself...
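
for anyone who wants to reproduce this outside of zope, here is a minimal
sketch. it assumes only the standard library; the variable names are made up
for illustration, and it is written so that it also runs on python 3. the
generic 'utf-16' codec prepends a BOM in the machine's native byte order,
while the codecs with an explicit byte order ('utf-16-le', 'utf-16-be') do not:

```python
import codecs
import sys

# same test string as in the script above
ustring = (u'sgh sdgh\ns\xf6\xe4\xe4gddp\xe4s\n'
           u'\u8a0a\u4ee5\u53ca\u76f8\u95dc\u7db2\u7d61\u670d\u52d9')

# generic codec: BOM first, then the text in native byte order
with_bom = ustring.encode('utf-16')

# explicit byte order: no BOM is written
without_bom = ustring.encode('utf-16-le')

# the BOM the generic codec emits depends on the machine's byte order
if sys.byteorder == 'little':
    native_bom = codecs.BOM_UTF16_LE   # b'\xff\xfe'
else:
    native_bom = codecs.BOM_UTF16_BE   # b'\xfe\xff'

starts_with_bom = with_bom.startswith(native_bom)
le_starts_with_bom = without_bom.startswith(codecs.BOM_UTF16_LE)

# the receiver has to be told the byte order when there is no BOM
roundtrip = without_bom.decode('utf-16-le')

print(len(ustring), len(with_bom), len(without_bom))
print(starts_with_bom, le_starts_with_bom, roundtrip == ustring)
```

the two extra bytes also explain the Content-Length of 60 in the capture:
29 characters at 2 bytes each, plus the 2-byte BOM. so in the script above,
returning ustring.encode('utf-16-le') instead should give a body without the
BOM, provided the client is told the encoding some other way.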

sorry for the lengthy post, i thought it might be interesting for
people who have to deal with utf16...

best regards,
jürgen herrmann
