[Zope3-checkins] CVS: Zope3/src/zope/i18n - format.py:1.1 locales.py:1.1 locales.pyc:1.1 interfaces.py:1.2

Stephan Richter srichter@cbu.edu
Sun, 5 Jan 2003 15:20:41 -0500


Update of /cvs-repository/Zope3/src/zope/i18n
In directory cvs.zope.org:/tmp/cvs-serv28404/i18n

Modified Files:
	interfaces.py 
Added Files:
	format.py locales.py locales.pyc 
Log Message:
I am proud to check in this code, which implements a modified version of
the I18nFormatLocaleSupport proposal.

This code adds real hardcore Locale support to Zope, including number 
and date/time parsing/formatting using patterns that are compliant with the 
emerging ICU standard. 
Note: The formatting code can be used independent of the Locale.
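
To illustrate what a date/time pattern looks like in practice, here is a
minimal sketch of splitting such a pattern into fields and formatting a
datetime with it. It is not the parseDateTimePattern() code from this
checkin: tokenize() and format_date() are hypothetical helpers, only the
numeric fields y, M, d, H, m and s are handled, and no Locale is involved.
The set of pattern letters is the one listed in _DATETIMECHARS.

```python
import datetime

# Pattern letters as listed in DateTimeFormat._DATETIMECHARS.
DATETIME_CHARS = "aGyMdEDFwWhHmsSkKz"


def tokenize(pattern):
    """Split a pattern into (letter, count) fields and literal strings.

    Hypothetical helper. Text in single quotes is literal and '' stands
    for one quote character; quotes nested inside quoted text are not
    handled in this sketch.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        char = pattern[i]
        if char == "'":
            if pattern[i + 1:i + 2] == "'":
                tokens.append("'")
                i += 2
                continue
            end = pattern.index("'", i + 1)  # ValueError if unterminated
            tokens.append(pattern[i + 1:end])
            i = end + 1
        elif char in DATETIME_CHARS:
            j = i
            while j < len(pattern) and pattern[j] == char:
                j += 1
            tokens.append((char, j - i))  # e.g. ("y", 4) for "yyyy"
            i = j
        else:
            tokens.append(char)
            i += 1
    return tokens


def format_date(value, pattern):
    """Format a datetime; the field count is the minimum digit count."""
    numbers = {"y": value.year, "M": value.month, "d": value.day,
               "H": value.hour, "m": value.minute, "s": value.second}
    out = []
    for token in tokenize(pattern):
        if isinstance(token, str):
            out.append(token)
            continue
        letter, count = token
        if letter not in numbers or (letter == "M" and count >= 3):
            raise ValueError("field %r needs locale data" % (token,))
        text = "%0*d" % (count, numbers[letter])
        if letter == "y" and count == 2:
            text = text[-2:]  # "yy" truncates the year to two digits
        out.append(text)
    return "".join(out)


when = datetime.datetime(2003, 1, 5, 15, 20, 41)
print(format_date(when, "yyyy-MM-dd 'at' HH:mm"))  # 2003-01-05 at 15:20
```

Text fields such as "MMM" or "EEEE" are rejected here because they need
month and weekday names, which is what the calendar data is for.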

This standard compliance allowed me to use the ICU Locale XML Files as 
data source, so that we basically have locale support now for virtually
any locale that ICU supports (therefore all the XML files in the checkin).

I am surprised by the speed at which the minidom Python library parses the
XML (maybe someone can write a cleaner XML file parser). Also, I think the
two pattern parsers, parsing from text to objects, and formatting objects
are all very fast, so the code is definitely suitable for frequent usage.
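
For readers unfamiliar with minidom, the following is a rough sketch of
the kind of extraction the locale loader does for currencies. The XML
fragment and its layout are assumptions made for this example, not a copy
of an ICU locale file; only the tag names symbol, name, decimal and pattern
are taken from the currency code in locales.py. get_text() and
extract_currencies() are hypothetical names.

```python
from xml.dom.minidom import parseString

# Assumed sample data; real ICU locale files may be structured differently.
SAMPLE = """<currencies>
  <currency id="USD">
    <symbol>$</symbol>
    <name>US Dollar</name>
    <decimal>.</decimal>
  </currency>
</currencies>"""


def get_text(nodes):
    """Join the data of all text nodes in a node list."""
    return "".join(node.data for node in nodes
                   if node.nodeType == node.TEXT_NODE)


def extract_currencies(xml_text):
    """Return {currency id: {tag: text}} for the tags that are present."""
    dom = parseString(xml_text)
    result = {}
    for curr_node in dom.getElementsByTagName("currency"):
        info = {}
        for tag in ("symbol", "name", "decimal", "pattern"):
            found = curr_node.getElementsByTagName(tag)
            if found:  # optional nodes are simply skipped
                info[tag] = get_text(found[0].childNodes)
        result[curr_node.getAttribute("id")] = info
    return result


print(extract_currencies(SAMPLE))
```

Checking whether getElementsByTagName() returned anything plays the same
role as catching IndexError for the optional decimal and pattern nodes.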

There are some tasks left however:

- Integrate this new code into the TranslationService mechanism. Well, I
  would actually propose to review all of the old code and adapt it more 
  to the way the new Locales work. I really do not like the gettext 
  standard too much, and ICU also has a lot of commercial support.

- Implement currency parsing and formatting. While this is a smaller task
  it still needs to be done.

- As mentioned above, maybe implement a new XML file parser to replace 
  the minidom code.

Ah yeah, the tests contain a multitude of examples on how it all works.
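
As a small taste of the number patterns, the INumberFormat grammar allows
a pattern to carry a separate negative subpattern after a ';'. The sketch
below only shows that split. split_subpatterns() is a hypothetical helper,
not part of format.py, and it returns None where parseNumberPattern()
would reuse the positive pattern as the negative one.

```python
def split_subpatterns(pattern):
    """Split 'positive;negative' on the first ';' outside of quotes.

    Hypothetical helper: returns (positive, negative), with negative set
    to None when the pattern has no negative subpattern.
    """
    quoted = False
    for pos, char in enumerate(pattern):
        if char == "'":
            quoted = not quoted  # quoted text may contain a literal ';'
        elif char == ";" and not quoted:
            return pattern[:pos], pattern[pos + 1:]
    return pattern, None


print(split_subpatterns("#,##0.00;(#,##0.00)"))  # ('#,##0.00', '(#,##0.00)')
print(split_subpatterns("0.###"))                # ('0.###', None)
```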


=== Added File Zope3/src/zope/i18n/format.py === (646/746 lines abridged)
##############################################################################
#
# Copyright (c) 2002, 2003 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.0 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""Basic Object Formatting

This module implements basic object formatting functionality, such as
date/time, number and money formatting.

$Id: format.py,v 1.1 2003/01/05 20:19:37 srichter Exp $
"""
import re
import math
import datetime
import calendar
from zope.i18n.interfaces import IDateTimeFormat, INumberFormat

class DateTimeParseError(Exception):
    """Error is raised when parsing of datetime failed."""

class DateTimeFormat:
    __doc__ = IDateTimeFormat.__doc__

    __implements__ =  IDateTimeFormat

    _DATETIMECHARS = "aGyMdEDFwWhHmsSkKz"

    def __init__(self, pattern=None, calendar=None):
        if calendar is not None:
            self.calendar = calendar
        self._pattern = pattern
        self._bin_pattern = None
        if self._pattern is not None:
            self._bin_pattern = parseDateTimePattern(self._pattern,
                                                     self._DATETIMECHARS)

    def setPattern(self, pattern):
        "See zope.i18n.interfaces.IFormat"
        self._pattern = pattern
        self._bin_pattern = parseDateTimePattern(self._pattern,
                                                 self._DATETIMECHARS)

[-=- -=- -=- 646 lines omitted -=- -=- -=-]

                suffix += char
                state = READ_SUFFIX

        elif state == READ_PADDING_3:
            padding_3 = char
            state = READ_SUFFIX

        elif state == READ_SUFFIX:
            if char == "*":
                state = READ_PADDING_4
            elif char == "'":
                state = READ_SUFFIX_STRING
            elif char == ";":
                state = READ_NEG_SUBPATTERN
            else:
                suffix += char
                
        elif state == READ_SUFFIX_STRING:
            if char == "'":
                state = READ_SUFFIX
            else:
                suffix += char

        elif state == READ_PADDING_4:
            if char == ';':
                state = READ_NEG_SUBPATTERN
            else:
                padding_4 = char

        elif state == READ_NEG_SUBPATTERN:
            neg_pattern = parseNumberPattern(pattern[pos:])[0]
            break

    # Cleaning up states after end of parsing
    if state == READ_INTEGER:
        integer = helper
    if state == READ_FRACTION:
        fraction = helper
    if state == READ_EXPONENTIAL:
        exponential = helper

    pattern = (padding_1, prefix, padding_2, integer, fraction, exponential,
               padding_3, suffix, padding_4, grouping)

    if neg_pattern is None:
        neg_pattern = pattern
        
    return pattern, neg_pattern




=== Added File Zope3/src/zope/i18n/locales.py === (944/1044 lines abridged)
##############################################################################
#
# Copyright (c) 2002 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.0 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""Locale and LocaleProvider Implementation.

$Id: locales.py,v 1.1 2003/01/05 20:19:37 srichter Exp $
"""
import time, os
import datetime
from xml.dom.minidom import parse as parseXML

from zope.i18n.interfaces import ILocaleProvider, ILocale
from zope.i18n.interfaces import ILocaleVersion, ILocaleIdentity
from zope.i18n.interfaces import ILocaleTimeZone, ILocaleCalendar
from zope.i18n.interfaces import ILocaleNumberFormat, ILocaleCurrency

from zope.i18n.format import NumberFormat, DateTimeFormat

# Setup the locale directory
from zope import i18n
LOCALEDIR = os.path.join(os.path.dirname(i18n.__file__), "locales")

# Define some constants that can be used

JANUARY = 1
FEBRUARY = 2
MARCH = 3
APRIL = 4
MAY = 5
JUNE = 6
JULY = 7
AUGUST = 8
SEPTEMBER = 9
OCTOBER = 10
NOVEMBER = 11
DECEMBER = 12

MONDAY = 1
TUESDAY = 2
WEDNESDAY = 3

[-=- -=- -=- 944 lines omitted -=- -=- -=-]

                default = True
            else:
                default = False
            node = curr_node.getElementsByTagName('symbol')[0]
            currency.setSymbol(self._getText(node.childNodes))
            node = curr_node.getElementsByTagName('name')[0]
            currency.setName(self._getText(node.childNodes))
            try:
                node = curr_node.getElementsByTagName('decimal')[0]
                currency.setDecimal(self._getText(node.childNodes))
            except IndexError:
                pass # No decimal node
            try:
                node = curr_node.getElementsByTagName('pattern')[0]
                currency.setPattern(self._getText(node.childNodes))
            except IndexError:
                pass # No pattern node

            currencies.append((id, currency, default))
            
        return currencies
        

    def __call__(self):
        """Create the Locale."""
        locale = ICULocale(self._extractIdentity())
        # Set Versioning
        for version in self._extractVersions():
            locale.setVersion(version)
        # Set Languages
        for lang in self._extractLanguages().items():
            locale.setLanguageName(*lang)
        # Set Countries
        for country in self._extractCountries().items():
            locale.setCountryName(*country)
        # Set TimeZones
        for tz in self._extractTimeZones():
            locale.setTimeZone(*tz)
        # Set Calendars
        for cal in self._extractCalendars():
            locale.setCalendar(*cal)
        # Set Number Formats
        for format in self._extractNumberFormats():
            locale.setNumberFormat(*format)
        # Set Currencies
        for currency in self._extractCurrencies():
            locale.setCurrency(*currency)

        return locale
        


=== Added File Zope3/src/zope/i18n/locales.pyc ===
  <Binary-ish file>

=== Zope3/src/zope/i18n/interfaces.py 1.1 => 1.2 ===
--- Zope3/src/zope/i18n/interfaces.py:1.1	Mon Dec 30 21:52:13 2002
+++ Zope3/src/zope/i18n/interfaces.py	Sun Jan  5 15:19:37 2003
@@ -15,12 +15,11 @@
 
 $Id$
 """
+from zope.interface import Interface, Attribute
 
-from zope.interface import Interface
 
 class II18nAware(Interface):
-    """Internationalization aware content object.
-    """
+    """Internationalization aware content object."""
 
     def getDefaultLanguage():
         """Return the default language."""
@@ -337,6 +336,7 @@
         See ITranslationService for details.
         """
 
+
 class IMessageExportFilter(Interface):
     """The Export Filter for Translation Service Messages.
 
@@ -352,8 +352,8 @@
            one language. A good example for that is a GettextFile.
         """
 
-class INegotiator(Interface):
 
+class INegotiator(Interface):
     """A language negotiation service.
     """
 
@@ -379,7 +379,7 @@
         # XXX I'd like for there to be a symmetric interface method, one in
         # which an adaptor is gotten for both the first arg and the second
         # arg.  I.e. getLanguage(obj, env)
-        # But this isn't a good match for the iTranslationService.translate()
+        # But this isn't a good match for the ITranslationService.translate()
         # method. :(
 
 
@@ -392,3 +392,445 @@
            should describe the order of preference. Therefore the first
            character set in the list is the most preferred one.
         """
+
+
+class ILocaleProvider(Interface):
+    """This interface is our connection to the Zope 3 service. From it
+    we can request various Locale objects that can perform all sorts of
+    fancy operations.
+
+    This service will be a singleton global service, since it does not make
+    much sense to have many locale facilities, especially since this one will
+    be so complete, as we will use the ICU XML files as data.  """
+
+    def loadLocale(language=None, country=None, variant=None): 
+        """Load the locale with the specs that are given by the arguments of
+        the method. Note that the LocaleProvider must know where to get the
+        locales from."""
+
+    def getLocale(language=None, country=None, variant=None):
+        """Get the Locale object for a particular language, country and
+        variant."""
+
+
+class ILocaleIdentity(Interface):
+    """Identity information class for ILocale objects.
+
+    Three pieces of information are required to identify a locale:
+
+      o language -- Language in which all of the locale text information is
+        returned.
+
+      o country -- Country for which the locale's information is
+        appropriate. None means all countries in which the language is spoken.
+
+      o variant -- Sometimes there are regional or historical differences even
+        in a certain country. For these cases we use the variant field. A good
+        example is the time before the Euro in Germany. Therefore a valid
+        variant would be 'PREEURO'.
+
+    Note that all of these attributes are read-only once they are set (usually
+    done in constructor)!
+
+    This object is also used to uniquely identify a locale."""
+
+    def getLanguage():
+        """Return the language of the id."""
+
+    def getCountry():
+        """Return the country of the id."""
+
+    def getVariant():
+        """Return the variant of the id."""
+
+    def setCorrespondence(vendor, text):
+        """Correspondences can be used to map our Locales to other system's
+        locales. A common use in ICU is to define a correspondence to the
+        Windows Locale System. This method sets a correspondence."""
+
+    def getAllCorrespondences():
+        """Return all defined correspondences."""
+
+    def __repr__(self):
+        """Defines the representation of the id, which should be a compact
+        string that references the language, country and variant."""
+
+
+class ILocaleVersion(Interface):
+    """Allows one to specify a version of a locale."""
+
+    id = Attribute("Version identifier; usually something like 1.0.1")
+
+    date = Attribute("A datetime object specifying the version creation date.")
+
+    comment = Attribute("A text comment about the version.")
+
+    def __cmp__(other):
+        """Compares versions, so the order can be determined."""
+
+
+class ILocaleTimeZone(Interface):
+    """Represents and defines various timezone information. It mainly manages
+    all the various names for a timezone and the cities contained in it.
+
+    Important: ILocaleTimeZone objects are not intended to provide
+    implementations for the standard datetime module timezone support. They
+    are merely used for Locale support. 
+    """
+
+    id = Attribute("Standard name of the timezone for unique referencing.")
+
+    def addCity(city):
+        """Add a city to the timezone."""
+
+    def getCities():
+        """Return all cities that are in this timezone."""
+
+    def setName(type, long, short):
+        """Add a new long and short name for the timezone."""
+
+    def getName(type):
+        """Get the long and short names by type."""
+
+
+class ILocaleCalendar(Interface):
+    """There is a massive amount of information contained in the calendar,
+    which made it attractive to be added """
+
+    def update(other):
+        """Update this calendar using data from other. Assume that other
+        always has the more specific information, unless other's data is
+        not present."""
+    
+    def setMonth(id, name, abbr):
+        """Add a month's locale data."""
+
+    def getMonth(id):
+        """Get a month (name, abbr) by its id."""
+
+    def getMonthNames():
+        """Return a list of month names."""
+
+    def getMonthIdFromName(name):
+        """Return the id of the month with the right name."""
+
+    def getMonthAbbr():
+        """Return a list of month abbreviations."""
+
+    def getMonthIdFromAbbr(abbr):
+        """Return the id of the month with the right abbreviation."""
+
+    def setWeekday(id, name, abbr):
+        """Add a weekday's locale data."""
+
+    def getWeekday(id):
+        """Get a weekday by its id."""
+
+    def getWeekdayNames():
+        """Return a list of weekday names."""
+
+    def getWeekdayIdFromName(name):
+        """Return the id of the weekday with the right name."""
+
+    def getWeekdayAbbr():
+        """Return a list of weekday abbreviations."""
+
+    def getWeekdayIdFromAbbr(abbr):
+        """Return the id of the weekday with the right abbr."""
+
+    def setEra(id, name):
+        """Add an era's locale data."""
+
+    def getEra(id):
+        """Get an era by its id."""
+
+    def setAM(text):
+        """Set AM text representation."""
+
+    def getAM():
+        """Get AM text representation."""
+
+    def setPM(text):
+        """Set PM text representation."""
+
+    def getPM():
+        """Get PM text representation."""
+
+    def setPatternCharacters(chars):
+        """Set allowed pattern characters for calendar patterns."""
+
+    def getPatternCharacters():
+        """Get allowed pattern characters for a particular type (id), which
+        can be calendar, number, or currency, for example."""
+
+    def setTimePattern(type, pattern):
+        """Set the time pattern for a particular time format type. Possible
+        types are full, long, medium, and short.""" 
+
+    def getTimePattern(type):
+        """Get the time pattern for a particular time format type. Possible
+        types are full, long, medium, and short.""" 
+
+    def setDatePattern(name, pattern):
+        """Set the date pattern for a particular date format type. Possible
+        types are full, long, medium, and short.""" 
+
+    def getDatePattern(name):
+        """Get the date pattern for a particular date format type. Possible
+        types are full, long, medium, and short.""" 
+
+    def setDateTimePattern(pattern):
+        """Set the date pattern for the datetime.""" 
+
+    def getDateTimePattern():
+        """Get the date pattern for the datetime.""" 
+
+
+class ILocaleNumberFormat(Interface):
+    """This interface defines all the formatting information for one class of
+    numbers."""
+
+    def setPattern(name, pattern):
+        """Define a new pattern by name."""
+
+    def getPattern(name):
+        """Get a pattern by its name."""
+
+    def getAllPatternIds():
+        """Return a list of all pattern names."""
+
+    def setSymbol(name, symbol):
+        """Define a new symbol by name."""
+
+    def getSymbol(name):
+        """Get a symbol by its name."""
+
+    def getAllSymbolIds():
+        """Return a list of all symbol names."""
+
+    def getSymbolMap():
+        """Return a map of all symbols. This is useful for the INumberFormat."""
+
+    
+class ILocaleCurrency(Interface):
+    """Defines a particular currency."""
+
+    def setSymbol(symbol):
+        """Set currency symbol; e.g. $ for USD."""
+
+    def getSymbol():
+        """Get currency symbol."""
+
+    def setName(name):
+        """Set currency name; e.g. USD for US Dollars."""
+
+    def getName():
+        """Get currency name."""
+
+    def setDecimal(decimal):
+        """Set currency decimal separator. In the US this is usually the
+        period '.', while Germany uses the comma ','."""
+
+    def getDecimal():
+        """Get currency decimal separator."""
+
+    def setPattern(pattern):
+        """Set currency pattern. Often we want different formatting rules for
+        monetary amounts; for example a precision more than 1/100 of the main
+        currency unit is often not desired."""
+
+    def getPattern():
+        """Get currency pattern."""    
+
+        
+class ILocale(Interface):
+    """This class contains all important information about the locale.
+
+    Usually a Locale is identified using a specific language, country and
+    variant. However, the country and variant are optional, so that a lookup
+    hierarchy develops. It is easy to recognize that a locale that is missing
+    the variant is more generally applicable than the one with the
+    variant. Therefore, if a specific Locale does not contain the required
+    information, it should look one level higher. 
+    There will be a root locale that specifies none of the above identifiers.
+    """
+
+    def getLocaleLanguageId():
+        """Return the id of the language that this locale represents. 'None'
+        can be returned."""
+
+    def getLocaleCountryId():
+        """Return the id of the country that this locale represents. 'None'
+        can be returned."""
+
+    def getLocaleVariantId():
+        """Return the id of the variant that this locale represents. 'None'
+        can be returned."""
+
+    def getDisplayLanguage(id):
+        """Return the full name of the language whose id was passed in the
+        language of this locale."""
+
+    def getDisplayCountry(id):
+        """Return the full name of the country of this locale."""
+
+    def getTimeFormatter(name):
+        """Get the TimeFormat object called 'name'. The following names are
+        recognized: full, long, medium, short."""
+
+    def getDateFormat(name):
+        """Get the DateFormat object called 'name'. The following names are
+        recognized: full, long, medium, short."""
+
+    def getDateTimeFormatter(name):
+        """Get the DateTimeFormat object called 'name'. The following names
+        are recognized: full, long, medium, short."""
+
+    def getNumberFormatter(name):
+        """Get the NumberFormat object called 'name'. The following names are
+        recognized: decimal, percent, scientific, currency."""
+
+
+
+class IFormat(Interface):
+    """A generic formatting class. It basically contains the parsing and
+    construction method for the particular object the formatting class
+    handles.
+
+    The constructor will always require a pattern (specific to the object).
+    """
+
+    def setPattern(pattern):
+        """Overwrite the old formatting pattern with the new one."""
+
+    def getPattern():
+        """Get the currently used pattern."""
+
+    def parse(text, pattern=None):
+        """Parse the text and convert it to an object, which is returned."""
+
+    def format(obj, pattern=None):
+        """Format an object to a string using the pattern as a rule."""
+
+
+
+class INumberFormat(IFormat):
+    u"""Specific number formatting interface. Here are the formatting
+    rules (I modified the rules from ICU a bit, since I think they did not
+    agree well with the real world XML formatting strings):
+
+      posNegPattern      := ({subpattern};{subpattern} | {subpattern})  
+      subpattern         := {padding}{prefix}{padding}{integer}{fraction}
+                            {exponential}{padding}{suffix}{padding}  
+      prefix             := '\u0000'..'\uFFFD' - specialCharacters *  
+      suffix             := '\u0000'..'\uFFFD' - specialCharacters *
+      integer            := {digitField}'0'  
+      fraction           := {decimalPoint}{digitField}  
+      exponential        := E integer
+      digitField         := ( {digitField} {groupingSeparator} | 
+                              {digitField} '0'* | 
+                              '0'* | 
+                              {optionalDigitField} )  
+      optionalDigitField := ( {digitField} {groupingSeparator} | 
+                              {digitField} '#'* | 
+                              '#'* )  
+      groupingSeparator  := ,  
+      decimalPoint       := .  
+      padding            := * '\u0000'..'\uFFFD'
+
+
+    Possible pattern symbols:
+
+      0    A digit. Always show this digit even if the value is zero.  
+      #    A digit, suppressed if zero  
+      .    Placeholder for decimal separator  
+      ,    Placeholder for grouping separator  
+      E    Separates mantissa and exponent for exponential formats  
+      ;    Separates formats (that is, a positive number format versus a
+           negative number format)
+      -    Default negative prefix. Note that the locale's minus sign
+           character is used.
+      +    If this symbol is specified the locale's plus sign character is
+           used.
+      %    Multiply by 100, as percentage  
+      ‰    Multiply by 1000, as per mille  
+      ¤    This is the currency sign. It will be replaced by a currency
+           symbol. If it is present in a pattern, the monetary decimal
+           separator is used instead of the decimal separator.
+      ¤¤   This is the international currency sign. It will be replaced 
+           by an international currency symbol.  If it is present in a
+           pattern, the monetary decimal separator is used instead of 
+           the decimal separator. 
+      X    Any other characters can be used in the prefix or suffix  
+      '    Used to quote special characters in a prefix or suffix  
+    """
+
+    symbols = Attribute(
+        """The symbols attribute maps various formatting symbol names to the
+        symbol itself.
+
+        Here are the required names:
+
+          decimal, group, list, percentSign, nativeZeroDigit, patternDigit,
+          plusSign, minusSign, exponential, perMille, infinity, nan
+        
+        """)
+
+
+class ICurrencyFormat(INumberFormat):
+    """Special currency parsing class."""
+
+    currency = Attribute("""This object must implement ILocaleCurrency. See
+                            this interface's documentation for details.""")
+
+
+class IDateTimeFormat(IFormat):
+    """DateTime formatting and parsing interface. Here is a list of
+    possible characters and their meaning:
+
+      Symbol Meaning               Presentation      Example  
+
+      G      era designator        (Text)            AD  
+      y      year                  (Number)          1996  
+      M      month in year         (Text and Number) July and 07
+      d      day in month          (Number)          10
+      h      hour in am/pm (1~12)  (Number)          12  
+      H      hour in day (0~23)    (Number)          0
+      m      minute in hour        (Number)          30  
+      s      second in minute      (Number)          55  
+      S      millisecond           (Number)          978  
+      E      day in week           (Text)            Tuesday  
+      D      day in year           (Number)          189  
+      F      day of week in month  (Number)          2 (2nd Wed in July)
+      w      week in year          (Number)          27  
+      W      week in month         (Number)          2  
+      a      am/pm marker          (Text)            pm  
+      k      hour in day (1~24)    (Number)          24  
+      K      hour in am/pm (0~11)  (Number)          0  
+      z      time zone             (Text)            Pacific Standard Time
+      '      escape for text  
+      ''     single quote                            '
+
+    Meaning of the amount of characters:
+
+      Text
+
+        Four or more, use full form, <4, use short or abbreviated form if it
+        exists. (for example, "EEEE" produces "Monday", "EEE" produces "Mon")
+
+      Number
+
+        The minimum number of digits. Shorter numbers are zero-padded to this
+        amount (for example, if "m" produces "6", "mm" produces "06"). Year is
+        handled specially; that is, if the count of 'y' is 2, the Year will be
+        truncated to 2 digits. (for example, if "yyyy" produces "1997", "yy"
+        produces "97".)
+
+      Text and Number
+
+        Three or over, use text, otherwise use number. (for example, "M"
+        produces "1", "MM" produces "01", "MMM" produces "Jan", and "MMMM"
+        produces "January".)  """
+
+    calendar = Attribute("""This object must implement ILocaleCalendar. See
+                            this interface's documentation for details.""")