msgpack: support datetime extended type
Tarantool has supported the datetime type since version 2.10.0 [1]. This
patch introduces support for the Tarantool datetime type in the msgpack
decoders and encoders.

Tarantool datetime objects are decoded to `tarantool.Datetime` type.
`tarantool.Datetime` may be encoded to Tarantool datetime objects.
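
The payload layout documented in `tarantool/msgpack_ext/types/datetime.py`
(8 bytes of signed little-endian seconds, optionally followed by a 4-byte
`nsec`, a 2-byte `tzoffset` and a 2-byte `tzindex`) can be sketched with the
standard `struct` module. This is only an illustration of the wire format,
not the code used by the patch; the helper names `pack_datetime_payload` and
`unpack_datetime_payload` are hypothetical.

```python
import struct

# Layout as described in the comments of types/datetime.py:
# seconds is a signed 64-bit little-endian integer; the optional tail holds
# nsec (int32), tzoffset (int16) and tzindex (int16), also little-endian.
SECONDS_FMT = '<q'
FULL_FMT = '<qihh'

def pack_datetime_payload(seconds, nsec=0, tzoffset=0, tzindex=0):
    """Hypothetical helper: build the datetime extension payload."""
    # The optional fields are written only if at least one is non-zero.
    if nsec == 0 and tzoffset == 0 and tzindex == 0:
        return struct.pack(SECONDS_FMT, seconds)
    return struct.pack(FULL_FMT, seconds, nsec, tzoffset, tzindex)

def unpack_datetime_payload(data):
    """Hypothetical helper: return (seconds, nsec, tzoffset, tzindex)."""
    if len(data) == struct.calcsize(SECONDS_FMT):
        return struct.unpack(SECONDS_FMT, data) + (0, 0, 0)
    if len(data) == struct.calcsize(FULL_FMT):
        return struct.unpack(FULL_FMT, data)
    raise ValueError(f'Unexpected datetime payload length {len(data)}')

short = pack_datetime_payload(1661969274)
full = pack_datetime_payload(1661969274, nsec=308543321)
```

With these helpers, `short` is 8 bytes long and `full` is 16 bytes long, which
matches the two payload lengths the decoder in this patch accepts.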

`tarantool.Datetime` stores data in a `pandas.Timestamp` object [2]. You
can create `tarantool.Datetime` objects either from msgpack data or by
using the same API as in Tarantool:

```
dt1 = tarantool.Datetime(year=2022, month=8, day=31,
                         hour=18, minute=7, sec=54,
                         nsec=308543321)

dt2 = tarantool.Datetime(timestamp=1661969274)

dt3 = tarantool.Datetime(timestamp=1661969274, nsec=308543321)
```
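
The first and third calls above appear to describe the same instant. The
sketch below checks the arithmetic with the standard library only. It treats
the calendar fields as UTC, which is an assumption made for illustration
(this patch does not handle timezones), and combines seconds and nanoseconds
the same way the constructor does (`timestamp * NSEC_IN_SEC + nsec`).

```python
from datetime import datetime, timezone

NSEC_IN_SEC = 1000000000

# Calendar fields used for dt1, interpreted as UTC (assumption).
fields = datetime(2022, 8, 31, 18, 7, 54, tzinfo=timezone.utc)
nsec = 308543321

# Whole seconds since the Unix epoch; integer arithmetic avoids float rounding.
delta = fields - datetime(1970, 1, 1, tzinfo=timezone.utc)
timestamp = delta.days * 86400 + delta.seconds

# Integer epoch time with nanosecond precision, as in the `value` property.
total_nsec = timestamp * NSEC_IN_SEC + nsec
```

Here `timestamp` matches the value passed to `dt2` and `dt3`, and
`total_nsec` is the single integer that carries both parts.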

`tarantool.Datetime` exposes the `year`, `month`, `day`, `hour`, `minute`,
`sec`, `nsec`, `timestamp` and `value` (integer epoch time with
nanosecond precision) properties, which you can use to convert a
`tarantool.Datetime` object `dt` to any other kind of datetime object:

```
pdt = pandas.Timestamp(year=dt.year, month=dt.month, day=dt.day,
                       hour=dt.hour, minute=dt.minute, second=dt.sec,
                       microsecond=(dt.nsec // 1000),
                       nanosecond=(dt.nsec % 1000))
```
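
If pandas is not wanted on the receiving side, the integer `value` can also
be turned into a standard-library `datetime`. The following is a minimal
sketch, assuming `value` counts nanoseconds from the Unix epoch in UTC; the
helper name `epoch_ns_to_datetime` is hypothetical, and the sub-microsecond
remainder is returned separately because `datetime.datetime` cannot hold it.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_ns_to_datetime(value):
    """Hypothetical helper: epoch nanoseconds -> (aware datetime, leftover ns)."""
    # Split into whole microseconds and the nanoseconds that do not fit.
    microseconds, leftover_ns = divmod(value, 1000)
    return EPOCH + timedelta(microseconds=microseconds), leftover_ns

# In real code the argument would be dt.value of a tarantool.Datetime object.
py_dt, leftover_ns = epoch_ns_to_datetime(1661969274308543321)
```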

`pandas.Timestamp` was chosen to store the data because it can hold both
nanoseconds and timezone information. The built-in Python
`datetime.datetime` supports microseconds at most, and `numpy.datetime64`
does not support timezones.
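
The microsecond limit of the built-in type is easy to observe with the
standard library alone. The sketch below reuses the `nsec` value from the
earlier examples and is only an illustration of that limit.

```python
from datetime import datetime, timedelta

# The smallest difference two datetime objects can represent.
resolution = datetime.resolution

nsec = 308543321
# Only whole microseconds fit; the remaining nanoseconds have no place to go.
microsecond, lost_nsec = divmod(nsec, 1000)
py_dt = datetime(2022, 8, 31, 18, 7, 54, microsecond)

# Passing the full nanosecond count as the microsecond field is rejected,
# because that field must be in the range 0..999999.
try:
    datetime(2022, 8, 31, 18, 7, 54, nsec)
    accepted = True
except ValueError:
    accepted = False
```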

The Tarantool datetime interval type is planned to be stored in a custom
type, `tarantool.Interval`, and we will need a way to support arithmetic
between datetimes and intervals. This is the main reason we use a custom
class instead of plain `pandas.Timestamp`. It is also hard to implement
Tarantool-compatible timezones with full conversion support without
custom classes.

This patch does not yet introduce timezone support for datetime.

1. tarantool/tarantool#5941
2. https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html

Part of #204
DifferentialOrange committed Sep 26, 2022
1 parent c70dfa6 commit 26b6a59
Showing 10 changed files with 483 additions and 6 deletions.
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,36 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Decimal type support (#203).
- UUID type support (#202).
- Datetime type support and tarantool.Datetime type (#204).

Tarantool datetime objects are decoded to `tarantool.Datetime`
type. `tarantool.Datetime` may be encoded to Tarantool datetime
objects.

You can create `tarantool.Datetime` objects either from msgpack
data or by using the same API as in Tarantool:

```python
dt1 = tarantool.Datetime(year=2022, month=8, day=31,
                         hour=18, minute=7, sec=54,
                         nsec=308543321)

dt2 = tarantool.Datetime(timestamp=1661969274)

dt3 = tarantool.Datetime(timestamp=1661969274, nsec=308543321)
```

`tarantool.Datetime` exposes `year`, `month`, `day`, `hour`,
`minute`, `sec`, `nsec`, `timestamp` and `value` (integer epoch time
with nanoseconds precision) properties if you need to convert
`tarantool.Datetime` to any other kind of datetime object:

```python
pdt = pandas.Timestamp(year=dt.year, month=dt.month, day=dt.day,
                       hour=dt.hour, minute=dt.minute, second=dt.sec,
                       microsecond=(dt.nsec // 1000),
                       nanosecond=(dt.nsec % 1000))
```

### Changed
- Bump msgpack requirement to 1.0.4 (PR #223).
1 change: 1 addition & 0 deletions requirements.txt
@@ -1 +1,2 @@
msgpack>=1.0.4
pandas
6 changes: 5 additions & 1 deletion tarantool/__init__.py
@@ -32,6 +32,10 @@
ENCODING_DEFAULT,
)

from tarantool.msgpack_ext.types.datetime import (
    Datetime,
)

__version__ = "0.9.0"


@@ -91,7 +95,7 @@ def connectmesh(addrs=({'host': 'localhost', 'port': 3301},), user=None,

__all__ = ['connect', 'Connection', 'connectmesh', 'MeshConnection', 'Schema',
           'Error', 'DatabaseError', 'NetworkError', 'NetworkWarning',
-          'SchemaError', 'dbapi']
+          'SchemaError', 'dbapi', 'Datetime']

# ConnectionPool is supported only for Python 3.7 or newer.
if sys.version_info.major >= 3 and sys.version_info.minor >= 7:
9 changes: 9 additions & 0 deletions tarantool/msgpack_ext/datetime.py
@@ -0,0 +1,9 @@
from tarantool.msgpack_ext.types.datetime import Datetime

EXT_ID = 4

def encode(obj):
    return obj.msgpack_encode()

def decode(data):
    return Datetime(data)
8 changes: 6 additions & 2 deletions tarantool/msgpack_ext/packer.py
@@ -2,12 +2,16 @@
from uuid import UUID
from msgpack import ExtType

from tarantool.msgpack_ext.types.datetime import Datetime

import tarantool.msgpack_ext.decimal as ext_decimal
import tarantool.msgpack_ext.uuid as ext_uuid
import tarantool.msgpack_ext.datetime as ext_datetime

encoders = [
-    {'type': Decimal, 'ext': ext_decimal},
-    {'type': UUID,    'ext': ext_uuid   },
+    {'type': Decimal,  'ext': ext_decimal },
+    {'type': UUID,     'ext': ext_uuid    },
+    {'type': Datetime, 'ext': ext_datetime},
]

def default(obj):
196 changes: 196 additions & 0 deletions tarantool/msgpack_ext/types/datetime.py
@@ -0,0 +1,196 @@
from copy import deepcopy

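import pandas

# MsgpackError is raised in msgpack_decode() below and must be imported.
from tarantool.error import MsgpackError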

# https://www.tarantool.io/en/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
#
# The datetime MessagePack representation looks like this:
# +---------+----------------+==========+-----------------+
# | MP_EXT | MP_DATETIME | seconds | nsec; tzoffset; |
# | = d7/d8 | = 4 | | tzindex; |
# +---------+----------------+==========+-----------------+
# MessagePack data contains:
#
# * Seconds (8 bytes) as an unencoded 64-bit signed integer stored in the
# little-endian order.
# * The optional fields (8 bytes), if any of them have a non-zero value.
# The fields include nsec (4 bytes), tzoffset (2 bytes), and
# tzindex (2 bytes) packed in the little-endian order.
#
# seconds is seconds since Epoch, where the epoch is the point where the time
# starts, and is platform dependent. For Unix, the epoch is January 1,
# 1970, 00:00:00 (UTC). Tarantool uses a double type, see a structure
# definition in src/lib/core/datetime.h and reasons in
# https://github.com/tarantool/tarantool/wiki/Datetime-internals#intervals-in-c
#
# nsec is nanoseconds, fractional part of seconds. Tarantool uses int32_t, see
# a definition in src/lib/core/datetime.h.
#
# tzoffset is timezone offset in minutes from UTC. Tarantool uses an int16_t
# type, see a structure definition in src/lib/core/datetime.h.
#
# tzindex is Olson timezone id. Tarantool uses an int16_t type, see a structure
# definition in src/lib/core/datetime.h. If both tzoffset and tzindex are
# specified, tzindex takes precedence and the tzoffset value is ignored.

SECONDS_SIZE_BYTES = 8
NSEC_SIZE_BYTES = 4
TZOFFSET_SIZE_BYTES = 2
TZINDEX_SIZE_BYTES = 2

BYTEORDER = 'little'

NSEC_IN_SEC = 1000000000
NSEC_IN_MKSEC = 1000

def get_bytes_as_int(data, cursor, size):
    part = data[cursor:cursor + size]
    return int.from_bytes(part, BYTEORDER, signed=True), cursor + size

def get_int_as_bytes(data, size):
    return data.to_bytes(size, byteorder=BYTEORDER, signed=True)

def msgpack_decode(data):
    cursor = 0
    seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)

    data_len = len(data)
    if data_len == (SECONDS_SIZE_BYTES + NSEC_SIZE_BYTES + \
                    TZOFFSET_SIZE_BYTES + TZINDEX_SIZE_BYTES):
        nsec, cursor = get_bytes_as_int(data, cursor, NSEC_SIZE_BYTES)
        tzoffset, cursor = get_bytes_as_int(data, cursor, TZOFFSET_SIZE_BYTES)
        tzindex, cursor = get_bytes_as_int(data, cursor, TZINDEX_SIZE_BYTES)
    elif data_len == SECONDS_SIZE_BYTES:
        nsec = 0
        tzoffset = 0
        tzindex = 0
    else:
        raise MsgpackError(f'Unexpected datetime payload length {data_len}')

    if (tzoffset != 0) or (tzindex != 0):
        raise NotImplementedError

    total_nsec = seconds * NSEC_IN_SEC + nsec

    return pandas.to_datetime(total_nsec, unit='ns')

class Datetime():
    def __init__(self, data=None, *, timestamp=None, year=None, month=None,
                 day=None, hour=None, minute=None, sec=None, nsec=None):
        if data is not None:
            if not isinstance(data, bytes):
                raise ValueError('data argument (first positional argument) ' +
                                 'expected to be a "bytes" instance')

            self._datetime = msgpack_decode(data)
            return

        # The logic is same as in Tarantool, refer to datetime API.
        # https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
        if timestamp is not None:
            if ((year is not None) or (month is not None) or \
                    (day is not None) or (hour is not None) or \
                    (minute is not None) or (sec is not None)):
                raise ValueError('Cannot provide both timestamp and year, month, ' +
                                 'day, hour, minute, sec')

            if nsec is not None:
                if not isinstance(timestamp, int):
                    raise ValueError('timestamp must be int if nsec provided')

                total_nsec = timestamp * NSEC_IN_SEC + nsec
                self._datetime = pandas.to_datetime(total_nsec, unit='ns')
            else:
                self._datetime = pandas.to_datetime(timestamp, unit='s')
        else:
            if nsec is not None:
                microsecond = nsec // NSEC_IN_MKSEC
                nanosecond = nsec % NSEC_IN_MKSEC
            else:
                microsecond = 0
                nanosecond = 0

            self._datetime = pandas.Timestamp(year=year, month=month, day=day,
                                              hour=hour, minute=minute, second=sec,
                                              microsecond=microsecond,
                                              nanosecond=nanosecond)

    def __eq__(self, other):
        if isinstance(other, Datetime):
            return self._datetime == other._datetime
        elif isinstance(other, pandas.Timestamp):
            return self._datetime == other
        else:
            return False

    def __str__(self):
        return self._datetime.__str__()

    def __repr__(self):
        return f'datetime: {self._datetime.__repr__()}'

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

    @property
    def year(self):
        return self._datetime.year

    @property
    def month(self):
        return self._datetime.month

    @property
    def day(self):
        return self._datetime.day

    @property
    def hour(self):
        return self._datetime.hour

    @property
    def minute(self):
        return self._datetime.minute

    @property
    def sec(self):
        return self._datetime.second

    @property
    def nsec(self):
        # microseconds + nanoseconds
        return self._datetime.value % NSEC_IN_SEC

    @property
    def timestamp(self):
        return self._datetime.timestamp()

    @property
    def value(self):
        return self._datetime.value

    def msgpack_encode(self):
        seconds = self.value // NSEC_IN_SEC
        nsec = self.nsec
        tzoffset = 0
        tzindex = 0

        buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)

        if (nsec != 0) or (tzoffset != 0) or (tzindex != 0):
            buf = buf + get_int_as_bytes(nsec, NSEC_SIZE_BYTES)
            buf = buf + get_int_as_bytes(tzoffset, TZOFFSET_SIZE_BYTES)
            buf = buf + get_int_as_bytes(tzindex, TZINDEX_SIZE_BYTES)

        return buf
6 changes: 4 additions & 2 deletions tarantool/msgpack_ext/unpacker.py
@@ -1,9 +1,11 @@
import tarantool.msgpack_ext.decimal as ext_decimal
import tarantool.msgpack_ext.uuid as ext_uuid
import tarantool.msgpack_ext.datetime as ext_datetime

decoders = {
-    ext_decimal.EXT_ID: ext_decimal.decode,
-    ext_uuid.EXT_ID   : ext_uuid.decode   ,
+    ext_decimal.EXT_ID : ext_decimal.decode ,
+    ext_uuid.EXT_ID    : ext_uuid.decode    ,
+    ext_datetime.EXT_ID: ext_datetime.decode,
}

def ext_hook(code, data):
3 changes: 2 additions & 1 deletion test/suites/__init__.py
@@ -17,13 +17,14 @@
from .test_ssl import TestSuite_Ssl
from .test_decimal import TestSuite_Decimal
from .test_uuid import TestSuite_UUID
from .test_datetime import TestSuite_Datetime

test_cases = (TestSuite_Schema_UnicodeConnection,
              TestSuite_Schema_BinaryConnection,
              TestSuite_Request, TestSuite_Protocol, TestSuite_Reconnect,
              TestSuite_Mesh, TestSuite_Execute, TestSuite_DBAPI,
              TestSuite_Encoding, TestSuite_Pool, TestSuite_Ssl,
-             TestSuite_Decimal, TestSuite_UUID)
+             TestSuite_Decimal, TestSuite_UUID, TestSuite_Datetime)

def load_tests(loader, tests, pattern):
    suite = unittest.TestSuite()
11 changes: 11 additions & 0 deletions test/suites/lib/skip.py
@@ -154,3 +154,14 @@ def skip_or_run_UUID_test(func):

    return skip_or_run_test_tarantool(func, '2.4.1',
                                      'does not support UUID type')

def skip_or_run_datetime_test(func):
    """Decorator to skip or run datetime-related tests depending on
    the tarantool version.
    Tarantool supports datetime type only since 2.10.0 version.
    See https://github.com/tarantool/tarantool/issues/5941
    """

    return skip_or_run_test_pcall_require(func, 'datetime',
                                          'does not support datetime type')