[Zope3-dev] i18n, unicode, and string encoding

Guido van Rossum guido@python.org
Mon, 14 Apr 2003 15:49:42 -0400


> Guido van Rossum wrote:
> 
> >>I think the string object should store its encoding in an attribute.
> >>Why? Given a string, I would like to know what its encoding is. How can
> >>I do that now?
> >>    
> >>
> >
> >You could subclass the str type, or you can create a class that
> >contains a string and an encoding name (maybe subclassing UserString).
> >
> >But I challenge you and ask, why do you want to know its encoding?
> >You shouldn't be carrying around encoded strings.  Instead, you should
> >decode strings into Unicode.
> >  
> >
> Hello Guido,
> First of all, I think I should ask myself if I want to take up
> challenges with you :)
> Anyway, I am going to let loose in the hope that I will learn something new.
> I can think of a byte array coming off a socket containing EBCDIC, for
> instance. I want to store this in a string type and hand it off
> to a translation function that can translate from EBCDIC to ASCII.
> Storing the encoding in the string as an attribute will help the
> translation function. At this point I am wondering why I should pass this
> as a parameter to the translation function rather than store it as an
> attribute of the string object itself. The translation function knows it
> has to produce US-ASCII, so it looks at the encoding attribute of the
> input string and figures out what to do from there.

Yes, that's the right solution.  Treat it as a hot potato: decode the
EBCDIC as soon as you can.
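
A minimal sketch of the "hot potato" approach, written for current
Python, where bytes and text are separate types. It assumes the socket
data uses EBCDIC code page 037 (Python's "cp037" codec); a real peer
might use a different code page. The function name receive_text is
hypothetical and only for illustration.

```python
def receive_text(raw_bytes, encoding="cp037"):
    """Decode incoming bytes to text at the boundary.

    The encoding is passed once, here, instead of being carried
    around as an attribute of an encoded string.
    """
    return raw_bytes.decode(encoding)


# Simulate EBCDIC bytes arriving from a socket (assumption: cp037).
raw = "HELLO".encode("cp037")

# Decode immediately; the rest of the program sees only text.
text = receive_text(raw)

# Encode again only at the output boundary, e.g. to US-ASCII.
ascii_bytes = text.encode("ascii")
```

Once the data is decoded, no later code needs to ask what encoding it
is in, which is why the attribute becomes unnecessary.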

--Guido van Rossum (home page: http://www.python.org/~guido/)