msgpack: support tzindex in datetime
Support non-zero tzindex in the datetime extended type. If both tzoffset
and tzindex are specified, tzindex takes precedence (same as in
Tarantool [1]).
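
The precedence rule can be sketched as follows. This is a minimal
illustration, not the connector's code: `resolve_timezone` and the
one-entry `sample_map` are hypothetical names, and the index value in
the map is illustrative rather than copied from timezones.h.

```python
def resolve_timezone(tzoffset, tzindex, index_to_timezone):
    """Pick the timezone source for a decoded datetime.

    Returns ('name', str) when tzindex is set, ('offset', minutes) when
    only tzoffset is set, and ('naive', None) otherwise.
    """
    if tzindex != 0:
        # tzindex wins even if tzoffset is also non-zero.
        if tzindex not in index_to_timezone:
            raise ValueError(f'unknown tzindex "{tzindex}"')
        return ('name', index_to_timezone[tzindex])
    if tzoffset != 0:
        return ('offset', tzoffset)
    return ('naive', None)


# Hypothetical index-to-name excerpt, for illustration only.
sample_map = {947: 'Europe/Moscow'}

print(resolve_timezone(180, 947, sample_map))
print(resolve_timezone(180, 0, sample_map))
print(resolve_timezone(0, 0, sample_map))
```

An unknown non-zero index is reported as an error instead of silently
falling back to tzoffset.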

Use the `tz` parameter to set the timezone name:

```
dt = tarantool.Datetime(year=2022, month=8, day=31,
                        hour=18, minute=7, sec=54,
                        nsec=308543321, tz='Europe/Moscow')
```

Use the `tz` property to get the timezone name of a datetime object.

pytz is used to build timezone info. The Tarantool index to Olson name
map and its inverse are built with the gen-timezones.sh script, based on
the tarantool/go-tarantool script [2]. All Tarantool unique and alias
timezones are present in the pytz.all_timezones list. Only the following
abbreviated timezones from Tarantool are present in pytz.all_timezones
(version 2022.2.1):
- CET
- EET
- EST
- GMT
- HST
- MST
- UTC
- WET

pytz does not natively support abbreviated timezones because they may be
ambiguous [3-5]. Tarantool itself does not support ambiguous abbreviated
timezones either:

```
Tarantool 2.10.1-0-g482d91c66

tarantool> datetime.new({tz = 'BST'})
---
- error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
...
```

If an ambiguous timezone is specified, an exception is raised.

The Tarantool header timezones.h [6] provides a map of all abbreviated
timezones with category info (all ambiguous timezones are marked with
the TZ_AMBIGUOUS flag) and offset info. We parse this info to build a
pytz.FixedOffset() timezone for each Tarantool abbreviated timezone not
supported natively by pytz.
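
The abbreviation handling can be sketched roughly as below. This is an
illustration under stated assumptions: it uses the standard library's
`datetime.timezone` in place of `pytz.FixedOffset`, the helper name
`fixed_offset_for` is hypothetical, and the two table entries are
illustrative values rather than data copied from timezones.h. Only the
`TZ_AMBIGUOUS = 0x08` flag value comes from the generated module.

```python
from datetime import timedelta, timezone

TZ_AMBIGUOUS = 0x08  # flag value as written by gen-timezones.sh

# Illustrative entries only; offsets are in minutes.
ABBREV_INFO = {
    'MSK': {'offset': 180, 'category': 0x00},
    'BST': {'offset': 60, 'category': TZ_AMBIGUOUS},
}


def fixed_offset_for(abbrev, info=ABBREV_INFO):
    """Build a fixed-offset tzinfo for an unambiguous abbreviation."""
    entry = info[abbrev]
    if entry['category'] & TZ_AMBIGUOUS:
        # Same policy as Tarantool: refuse ambiguous abbreviations.
        raise ValueError(f'ambiguous timezone "{abbrev}"')
    return timezone(timedelta(minutes=entry['offset']), abbrev)


print(fixed_offset_for('MSK').utcoffset(None))
```

Calling `fixed_offset_for('BST')` with this table raises `ValueError`,
which mirrors the Tarantool behaviour shown above.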

1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
2. https://github.com/tarantool/go-tarantool/blob/5801dc6f5ce69db7c8bc0c0d0fe4fb6042d5ecbc/datetime/gen-timezones.sh
3. https://stackoverflow.com/questions/37109945/how-to-use-abbreviated-timezone-namepst-ist-in-pytz
4. https://stackoverflow.com/questions/27531718/datetime-timezone-conversion-using-pytz
5. https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
6. https://github.com/tarantool/tarantool/blob/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h

Closes #204
DifferentialOrange committed Sep 26, 2022
1 parent 7343acf commit 976a990
Showing 7 changed files with 2,021 additions and 11 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -53,6 +53,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
You may use `tzoffset` property to get timezone offset of a datetime
object.

- Timezone in datetime type support (#204).

Use `tz` parameter to set up timezone name:

```python
dt = tarantool.Datetime(year=2022, month=8, day=31,
                        hour=18, minute=7, sec=54,
                        nsec=308543321, tz='Europe/Moscow')
```

If both `tz` and `tzoffset` are specified, `tz` is used.

You may use `tz` property to get timezone name of a datetime object.

### Changed
- Bump msgpack requirement to 1.0.4 (PR #223).
The only reason of this bump is various vulnerability fixes,
60 changes: 49 additions & 11 deletions tarantool/msgpack_ext/types/datetime.py
@@ -3,6 +3,9 @@
 import pandas
 import pytz

+import tarantool.msgpack_ext.types.timezones as tt_timezones
+from tarantool.error import MsgpackError
+
 # https://www.tarantool.io/en/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
 #
 # The datetime MessagePack representation looks like this:
@@ -63,6 +66,17 @@ def compute_offset(timestamp):
     # There is no precision loss since offset is in minutes
     return int(utc_offset.total_seconds()) // SEC_IN_MIN

+def get_python_tzinfo(tz, error_class):
+    if tz in pytz.all_timezones:
+        return pytz.timezone(tz)
+
+    # Checked with timezones/validate_timezones.py
+    tt_tzinfo = tt_timezones.timezoneAbbrevInfo[tz]
+    if (tt_tzinfo['category'] & tt_timezones.TZ_AMBIGUOUS) != 0:
+        raise error_class(f'Failed to create datetime with ambiguous timezone "{tz}"')
+
+    return pytz.FixedOffset(tt_tzinfo['offset'])
+
 def msgpack_decode(data):
     cursor = 0
     seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)
@@ -84,23 +98,29 @@ def msgpack_decode(data):
     datetime = pandas.to_datetime(total_nsec, unit='ns')

     if tzindex != 0:
-        raise NotImplementedError
+        if tzindex not in tt_timezones.indexToTimezone:
+            raise MsgpackError(f'Failed to decode datetime with unknown tzindex "{tzindex}"')
+        tz = tt_timezones.indexToTimezone[tzindex]
+        tzinfo = get_python_tzinfo(tz, MsgpackError)
+        return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo), tz
     elif tzoffset != 0:
         tzinfo = pytz.FixedOffset(tzoffset)
-        return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo)
+        return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo), ''
     else:
-        return datetime
+        return datetime, ''

 class Datetime():
     def __init__(self, data=None, *, timestamp=None, year=None, month=None,
                  day=None, hour=None, minute=None, sec=None, nsec=None,
-                 tzoffset=0):
+                 tzoffset=0, tz=''):
         if data is not None:
             if not isinstance(data, bytes):
                 raise ValueError('data argument (first positional argument) ' +
                                  'expected to be a "bytes" instance')

-            self._datetime = msgpack_decode(data)
+            datetime, tz = msgpack_decode(data)
+            self._datetime = datetime
+            self._tz = tz
             return

         # The logic is same as in Tarantool, refer to datetime API.
@@ -133,11 +153,20 @@ def __init__(self, data=None, *, timestamp=None, year=None, month=None,
                                         microsecond=microsecond,
                                         nanosecond=nanosecond)

-        if tzoffset != 0:
-            tzinfo = pytz.FixedOffset(tzoffset)
-            datetime = datetime.replace(tzinfo=tzinfo)
+        if tz != '':
+            if tz not in tt_timezones.timezoneToIndex:
+                raise ValueError(f'Unknown Tarantool timezone "{tz}"')

-        self._datetime = datetime
+            tzinfo = get_python_tzinfo(tz, ValueError)
+            self._datetime = datetime.replace(tzinfo=tzinfo)
+            self._tz = tz
+        elif tzoffset != 0:
+            tzinfo = pytz.FixedOffset(tzoffset)
+            self._datetime = datetime.replace(tzinfo=tzinfo)
+            self._tz = ''
+        else:
+            self._datetime = datetime
+            self._tz = ''

     def __eq__(self, other):
         if isinstance(other, Datetime):
@@ -151,7 +180,7 @@ def __str__(self):
         return self._datetime.__str__()

     def __repr__(self):
-        return f'datetime: {self._datetime.__repr__()}'
+        return f'datetime: {self._datetime.__repr__()}, tz: "{self.tz}"'

     def __copy__(self):
         cls = self.__class__
@@ -206,6 +235,10 @@ def tzoffset(self):
             return compute_offset(self._datetime)
         return 0

+    @property
+    def tz(self):
+        return self._tz
+
     @property
     def value(self):
         return self._datetime.value
@@ -214,7 +247,12 @@ def msgpack_encode(self):
         seconds = self.value // NSEC_IN_SEC
         nsec = self.nsec
         tzoffset = self.tzoffset
-        tzindex = 0
+
+        tz = self.tz
+        if tz != '':
+            tzindex = tt_timezones.timezoneToIndex[tz]
+        else:
+            tzindex = 0

         buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)
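
For orientation, the payload that `msgpack_encode` and `msgpack_decode`
work with can be sketched with the standard `struct` module. This is a
minimal sketch, not the connector's implementation (which uses its own
`get_int_as_bytes` and `get_bytes_as_int` helpers). It assumes the layout
from the Tarantool documentation linked in the file's header comment:
little-endian 8-byte seconds, followed by an optional tail of 4-byte
nsec, 2-byte tzoffset and 2-byte tzindex. The names `pack_datetime` and
`unpack_datetime` are hypothetical.

```python
import struct

HEAD_FMT = '<q'     # seconds only (8 bytes)
FULL_FMT = '<qihh'  # seconds, nsec, tzoffset, tzindex (16 bytes)


def pack_datetime(seconds, nsec=0, tzoffset=0, tzindex=0):
    """Encode the datetime payload; the tail is omitted when all zero."""
    if nsec == 0 and tzoffset == 0 and tzindex == 0:
        return struct.pack(HEAD_FMT, seconds)
    return struct.pack(FULL_FMT, seconds, nsec, tzoffset, tzindex)


def unpack_datetime(data):
    """Decode the payload into (seconds, nsec, tzoffset, tzindex)."""
    if len(data) == struct.calcsize(HEAD_FMT):
        return struct.unpack(HEAD_FMT, data) + (0, 0, 0)
    if len(data) == struct.calcsize(FULL_FMT):
        return struct.unpack(FULL_FMT, data)
    raise ValueError(f'unexpected datetime payload size {len(data)}')


# Illustrative values; the tzindex is not taken from timezones.h.
payload = pack_datetime(1661958474, 308543321, 180, 947)
print(len(payload), unpack_datetime(payload))
```

Both tzoffset and tzindex travel in the same payload, which is why the
decoder needs a precedence rule when both are non-zero.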

9 changes: 9 additions & 0 deletions tarantool/msgpack_ext/types/timezones/__init__.py
@@ -0,0 +1,9 @@
from tarantool.msgpack_ext.types.timezones.timezones import (
    TZ_AMBIGUOUS,
    indexToTimezone,
    timezoneToIndex,
    timezoneAbbrevInfo,
)

__all__ = ['TZ_AMBIGUOUS', 'indexToTimezone', 'timezoneToIndex',
           'timezoneAbbrevInfo']
69 changes: 69 additions & 0 deletions tarantool/msgpack_ext/types/timezones/gen-timezones.sh
@@ -0,0 +1,69 @@
#!/usr/bin/env bash
set -xeuo pipefail

SRC_COMMIT="9ee45289e01232b8df1413efea11db170ae3b3b4"
SRC_FILE=timezones.h
DST_FILE=timezones.py

[ -e ${SRC_FILE} ] && rm ${SRC_FILE}
wget -O ${SRC_FILE} \
https://raw.githubusercontent.com/tarantool/tarantool/${SRC_COMMIT}/src/lib/tzcode/timezones.h

# We don't need aliases in indexToTimezone because Tarantool always replace it:
#
# tarantool> T = date.parse '2022-01-01T00:00 Pacific/Enderbury'
# ---
# ...
# tarantool> T
# ---
# - 2022-01-01T00:00:00 Pacific/Kanton
# ...
#
# So we can do the same and don't worry, be happy.

cat <<EOF > ${DST_FILE}
# Automatically generated by gen-timezones.sh
TZ_UTC = 0x01
TZ_RFC = 0x02
TZ_MILITARY = 0x04
TZ_AMBIGUOUS = 0x08
TZ_NYI = 0x10
TZ_OLSON = 0x20
TZ_ALIAS = 0x40
TZ_DST = 0x80
indexToTimezone = {
EOF

grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
| awk '{printf("\t%s : %s,\n", $1, $3)}' >> ${DST_FILE}
grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
| awk '{printf("\t%s : %s,\n", $1, $2)}' >> ${DST_FILE}

cat <<EOF >> ${DST_FILE}
}
timezoneToIndex = {
EOF

grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
| awk '{printf("\t%s : %s,\n", $3, $1)}' >> ${DST_FILE}
grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
| awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
grep ZONE_ALIAS ${SRC_FILE} | sed "s/ZONE_ALIAS( *//g" | sed "s/[),]//g" \
| awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}

cat <<EOF >> ${DST_FILE}
}
timezoneAbbrevInfo = {
EOF

grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
| awk '{printf("\t%s : {\"offset\" : %d, \"category\" : %s},\n", $3, $2, $4)}' >> ${DST_FILE}
echo "}" >> ${DST_FILE}

rm timezones.h

python validate_timezones.py
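
The map-building step of the script can be approximated in Python as
follows. This is a sketch under assumptions: the macro argument order is
inferred from the awk field numbers above (index first; name in the
third field for ZONE_ABBREV and in the second for ZONE_UNIQUE and
ZONE_ALIAS), the header excerpt is made up for illustration with
hypothetical index values, and `build_maps` is a hypothetical name.

```python
import re

# Made-up excerpt in the assumed macro layout; indexes are illustrative.
SAMPLE_HEADER = '''
ZONE_ABBREV(  1, 0, "UTC", TZ_UTC)
ZONE_UNIQUE( 10, "Pacific/Kanton")
ZONE_ALIAS( 10, "Pacific/Enderbury")
'''

MACRO_RE = re.compile(r'^(ZONE_ABBREV|ZONE_UNIQUE|ZONE_ALIAS)\(\s*(.*?)\)\s*$')


def build_maps(header_text):
    """Return (index_to_timezone, timezone_to_index) from macro lines."""
    index_to_timezone, timezone_to_index = {}, {}
    for line in header_text.splitlines():
        match = MACRO_RE.match(line.strip())
        if match is None:
            continue
        kind = match.group(1)
        fields = [f.strip().strip('"') for f in match.group(2).split(',')]
        index = int(fields[0])
        name = fields[2] if kind == 'ZONE_ABBREV' else fields[1]
        timezone_to_index[name] = index
        # Aliases are kept out of the index-to-name map, so an alias
        # resolves back to the canonical zone name.
        if kind != 'ZONE_ALIAS':
            index_to_timezone[index] = name
    return index_to_timezone, timezone_to_index


i2t, t2i = build_maps(SAMPLE_HEADER)
print(i2t[t2i['Pacific/Enderbury']])
```

With this excerpt, looking up the alias index yields the canonical name,
matching the Pacific/Enderbury to Pacific/Kanton replacement described in
the script's comment.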