[Zope3-dev] i18n, unicode, and the underline

Barry Warsaw barry@python.org
10 Apr 2003 18:10:52 -0400


On Thu, 2003-04-10 at 17:58, Gary Poster wrote:

> If I remember correctly, though, we're supposed to be able to regex 
> through the actual Python files to extract the display text from the 
> files.  The compilation of this text is the file that is supposed to be 
> submitted to a professional translator for an application.  The "_(" 
> pattern is supposed to be the hook that makes the regex reliable-ish.

Actually, pygettext is built on the tokenize module, so it's not really
a regexp match.  That means it doesn't matter whether the string inside
the parens is unicode, raw, or normal: tokenize just reports it as a
STRING token.  The "pattern" being matched is:

- a function call where the function's name is "_"
- a single argument inside the function call
- the argument is a STRING
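
As a rough illustration of that three-part pattern, here is a minimal
sketch built directly on the standard tokenize module.  It is not
pygettext's actual code; the names extract_messages and SAMPLE are made
up for this example, and it ignores things a real extractor has to
handle (for instance, adjacent string literals).

```python
import ast
import io
import tokenize

# Tokens that carry no meaning for the pattern: comments and the
# non-logical newlines that appear on blank lines or inside parentheses.
SKIPPED = (tokenize.NL, tokenize.COMMENT)


def extract_messages(source):
    """Return (line_number, text) pairs for each _("...") call in source.

    Hypothetical helper for illustration only.
    """
    tokens = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in SKIPPED
    ]
    found = []
    for i in range(len(tokens) - 3):
        name, lparen, arg, rparen = tokens[i:i + 4]
        if (
            name.type == tokenize.NAME and name.string == "_"
            and lparen.string == "("
            and arg.type == tokenize.STRING   # any string prefix qualifies
            and rparen.string == ")"          # exactly one argument
        ):
            # literal_eval turns the token text into the string's value
            found.append((arg.start[0], ast.literal_eval(arg.string)))
    return found


SAMPLE = '''
title = _(u"Hello")
note = _(r"Raw text")
# _("only a comment")
label = "_('inside another string')"
dynamic = _(name)
'''

for lineno, text in extract_messages(SAMPLE):
    print(lineno, text)
```

The comment, the look-alike inside another string, and the call with a
non-string argument are all skipped, which is where a plain regex would
tend to go wrong.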

> Could we instead have some zcml that makes the ZopeMessageIdFactory base 
> class keep track of the text that it is given as the code loads, 
> instead, to be spit out as a file or somesuch?  Or is that impractical 
> for another reason?  If we could do this, we could have some helpers, 
> like the Field code.
> 
> Or is this too much DWIM ("Do what I mean")?

I don't think you want to import the source code to extract the
translatable strings.  Importing runs each module's top-level code, so
side effects and import-time dependencies can cause you headaches.  A
separate textual scanning process is probably going to be more robust
in the long term.
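
To make the difference concrete, here is a small sketch under made-up
assumptions: SOURCE stands in for a module that depends on a package
that isn't installed (zope_example_missing_dependency is a hypothetical
name), and string_tokens and import_fails are helpers invented for this
example.  Executing the source fails, while scanning its text still
finds the string.

```python
import io
import tokenize

# Stand-in for a module on disk; the imported name is hypothetical and
# assumed not to be installed.
SOURCE = '''
import zope_example_missing_dependency
greeting = _("Welcome")
'''


def string_tokens(source):
    """Return the raw text of every STRING token, without running the code."""
    reader = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(reader)
        if tok.type == tokenize.STRING
    ]


def import_fails(source):
    """Execute the source roughly as an import would; report ImportError."""
    try:
        exec(compile(source, "<example>", "exec"), {})
    except ImportError:
        return True
    return False


print(import_fails(SOURCE))    # executing the module hits the missing import
print(string_tokens(SOURCE))   # the textual scan never runs it
```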

-Barry