[Zope3-dev] Date and time parsing (ISO 8601)

Marius Gedminas mgedmin@codeworks.lt
Fri, 22 Nov 2002 17:55:55 +0200


(This is not related to the US/European date format thread.  In
Lithuania we've been using YYYY-MM-DD long before ISO standardized on
it. ;-)

I was implementing type conversion in the Psycopg database adapter, and I
noticed that Zope.Misc.DateTimeParse does not accept certain variations
of the ISO 8601 input, for example, "2001-06-25 12:14:00-07" (which is
what PostgreSQL produces).

The following part of DateTimeParser.__doc__ is a bit misleading, or at
least unclear:

    The function automatically detects and handles
    ISO8601 compliant dates (YYYY-MM-DDThh:ss:mmTZD).
    See http://www.w3.org/TR/NOTE-datetime for full specs.

The w3c note[1] is not a full specification of ISO 8601.  Quote:

    ISO 8601 describes a large number of date/time formats. To reduce
    the scope for error and the complexity of software, it is useful to
    restrict the supported formats to a small number.

It might be the full specification of what DateTimeParser considers to
be ISO 8601 dates, but I think it is a little too narrow.  "Be strict in
what you produce and liberal in what you accept", etc.

Markus Kuhn's ISO-time page[2] is also often referred to, but it only
contains some examples, and no specifications.  A Google search produced 
[3], which is more extensive, and [4], which expresses a wider subset of
ISO 8601 in ABNF notation.

I propose extending the subset of ISO 8601 formats that DateTimeParser
recognizes to the following:

  Date part

    YYYYMMDD                (ISO 8601 basic format)
    YYYY-MM-DD              (ISO 8601 extended format)

  Time part
    HH
    HHMM
    HH:MM                   (basic and extended again)
    HHMMSS
    HH:MM:SS
    HHMMSS.s
    HH:MM:SS.s
    HHMMSS,s                (ISO 8601 allows, and even prefers, a comma
    HH:MM:SS,s              as the decimal sign)

  Timezone part
    Z                       (literal Z, means UTC)
    +hh
    -hh
    +hhmm
    -hhmm
    +hh:mm
    -hh:mm

  Time with timezone consists of a time part immediately followed by
  a timezone part with no separators.

  Date and time consist of a date part and a time part separated by ' ' or
  'T' (ISO8601 prefers 'T', but ' ' is more human readable and, for
  example, PostgreSQL uses it).  Again, timezone can be appended to the
  time part without additional separators.
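
The subset above can be sketched as a single regular expression.  This is
only an illustration, not the current DateTimeParser code: the name
parse_iso_datetime, the float seconds, and returning None for a missing
timezone are my assumptions, and field ranges (month 13, hour 25) are not
checked.

```python
import re

# Date part: YYYYMMDD (basic) or YYYY-MM-DD (extended).  The backreference
# forces both separators to agree, so "2001-0625" is rejected.
_DATE = r"(?P<year>\d{4})(?P<dsep>-?)(?P<month>\d{2})(?P=dsep)(?P<day>\d{2})"

# Time part: HH, HHMM, HH:MM, HHMMSS, HH:MM:SS, with an optional fraction
# of a second after '.' or ','.
_TIME = (r"(?P<hour>\d{2})"
         r"(?:(?P<tsep>:?)(?P<minute>\d{2})"
         r"(?:(?P=tsep)(?P<second>\d{2}(?:[.,]\d+)?))?)?")

# Timezone part: Z, +hh, -hh, +hhmm, -hhmm, +hh:mm, -hh:mm (optional).
_TZ = r"(?P<tz>Z|[+-]\d{2}(?::?\d{2})?)?"

# Date and time are separated by ' ' or 'T'; the timezone follows the time
# with no separator.
_DATETIME_RE = re.compile(_DATE + r"[ T]" + _TIME + _TZ)


def parse_iso_datetime(text):
    """Return (year, month, day, hour, minute, second, tz).

    Hypothetical helper: missing minutes/seconds default to 0, seconds are
    returned as a float, and tz is the raw timezone string or None.
    """
    m = _DATETIME_RE.fullmatch(text)
    if m is None:
        raise ValueError("not in the proposed ISO 8601 subset: %r" % (text,))
    second = float((m.group("second") or "0").replace(",", "."))
    return (int(m.group("year")), int(m.group("month")), int(m.group("day")),
            int(m.group("hour")), int(m.group("minute") or 0), second,
            m.group("tz"))
```

With this sketch the PostgreSQL form "2001-06-25 12:14:00-07" is accepted,
while a string that mixes basic and extended notation inside one part
(e.g. "12:1400") is rejected.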

The rest of ISO 8601 is either a bit exotic (omitting years/months/hours,
specifying week or day numbers) or too uncertain (omitting centuries).
I am also not entirely sure that we need ISO8601 basic formats (those
without separators).

References:
  [1] http://www.w3.org/TR/NOTE-datetime
  [2] http://www.cl.cam.ac.uk/~mgk25/iso-time.html
  [3] http://www.ietf.org/rfc/rfc3339.txt
  [4] http://www.mcs.vuw.ac.nz/technical/software/SGML/doc/iso8601/ISO8601.html

Another problem with DateTimeParse is that currently the only way to
parse time-only values is to prepend a fake date:

  >>> from Zope.Misc.DateTimeParse import parse
  >>> parse('2001-01-01 12:30:17.5+02:00')[3:]
  (12, 30, 17.5, '+02:00')

It would be nice to be able to do, e.g.,

  >>> from Zope.Misc.DateTimeParse import parse_time
  >>> parse_time('12:30:17.5+02:00')
  (12, 30, 17.5, '+02:00')

(That will be more useful when the datetime module gets time/timetz
classes.)
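
A minimal sketch of what such a parse_time could look like, assuming the
time and timezone formats proposed above.  It is standalone and does not
use Zope.Misc.DateTimeParse; returning None when no timezone is given is
my assumption.

```python
import re

# Time part (HH, HHMM, HH:MM, HHMMSS, HH:MM:SS, optional fraction after
# '.' or ',') immediately followed by an optional timezone part.
_TIME_RE = re.compile(
    r"(?P<hour>\d{2})"
    r"(?:(?P<sep>:?)(?P<minute>\d{2})"
    r"(?:(?P=sep)(?P<second>\d{2}(?:[.,]\d+)?))?)?"
    r"(?P<tz>Z|[+-]\d{2}(?::?\d{2})?)?"
)


def parse_time(text):
    """Return (hour, minute, second, tz) for a time-only string.

    Hypothetical implementation: no range checking is done, seconds are a
    float, and tz is the raw timezone string or None if absent.
    """
    m = _TIME_RE.fullmatch(text)
    if m is None:
        raise ValueError("unrecognized time: %r" % (text,))
    second = float((m.group("second") or "0").replace(",", "."))
    return (int(m.group("hour")), int(m.group("minute") or 0), second,
            m.group("tz"))
```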

So far I've written a set of functions to parse all the date/time
variations (dates, times, times with timezones, datetimes, datetimes
with timezones) into tuples according to the formats outlined above.
After doing that I decided that it might be a better idea to
integrate the functionality into DateTimeParser, as it might be useful to
other components.  The test suite could be (more or less) easily adapted
to check the acceptance of various ISO 8601 formats by other parsers.
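
To illustrate the last point, an acceptance check can be table-driven and
take the parser as an argument.  This is a sketch and not my actual test
suite: the sample list and the name rejected_samples are made up here,
and I assume a parser signals rejection by raising an exception.

```python
# Sample inputs drawn from the formats proposed in this message.
SAMPLES = [
    "2001-06-25 12:14:00-07",       # what PostgreSQL produces
    "2001-06-25T12:14:00Z",         # 'T' separator, UTC
    "20010625T121400+0200",         # basic format throughout
    "2001-06-25 12:14:00,5+02:00",  # comma as the decimal sign
    "2001-06-25 12:14",             # seconds omitted
]


def rejected_samples(parser, samples=SAMPLES):
    """Return the samples that `parser` refuses (i.e. raises on).

    `parser` is any callable taking a string.  Catching Exception is an
    assumption; a real suite would name the parser's own error type.
    """
    rejected = []
    for text in samples:
        try:
            parser(text)
        except Exception:
            rejected.append(text)
    return rejected
```

Calling rejected_samples(parse) with any parser then lists the formats
that parser does not yet accept.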

Comments?

Marius Gedminas
-- 
UNIX is user friendly. It's just selective about who its friends are.