-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support datetime extended type #228
Support datetime extended type #228
Conversation
1e57e7b
to
747995c
Compare
df7f4f9
to
3ac6206
Compare
c1cf8b4
to
0bee71e
Compare
3ac6206
to
989af6c
Compare
cf4ed18
to
57491fc
Compare
7181eb0
to
51ece90
Compare
57491fc
to
e7aab2b
Compare
76b526b
to
7d847c8
Compare
b0d1b7e
to
8b25d9d
Compare
Class inheritance should be improved before review.
|
8234495
to
d9b6748
Compare
@oleg-jukovec , thank you for pointing out about custom timezones. Now we encode abbreviated timezones to If user wants to create a import tarantool.msgpack_ext.types.timezones as tt_timezones
tzinfo = datetime.timezone(
datetime.timedelta(minutes=tt_timezones.timezoneAbbrevInfo['MSK']['offset']),
name='MSK'
)
dt = tarantool.Datetime(
year=2022, month=8, day=31, hour=18, minute=7, second=54,
microsecond=308543, nanosecond=321, tzinfo=tzinfo
) I'm not sure we should expose |
2f7f662
to
fc77b4c
Compare
3f02566
to
c8ae392
Compare
Timezones in tarantool-python RFCTarantool datetime may provide timezone info in two forms: It corresponds to Lua Based on Lua API, user can set up only What should we use to store timezone info? The solution should satisfy several criteria:
Let's discuss both of them. We should unambiguously be able to encode it to the same msgpack, i.e. the same What does it mean that it should be useful? It should be not just a reference info, but a real timezone. If user would want to do something with datetime info, it should behave appropriately. Again, for example, if user gets There is a It is rather inconvenient to store to decode tarantool timezones to custom tarantool.Timezone type. Since we already use This type should be useful, so it should implement standard
If we won't be able to accept any other timezones, it would be an another burden on user's shoulders. To impove his experience, we may provide some migration advices. On the other hand, converting may be provided not as expected by user. Since
Let's describe converting rules. If If If timezone has a name that is unknown to Tarantool, we raise an error. If timezone has We would not implement tzindex/tzoffset correspondence checkup. We will wait for tarantool/tarantool#7680 updates. |
Well, actually, you simply can't do it. pandas cannot work with any pandas-dev/pandas#15986 (comment) pandas source code is full of workaround to detect if it is pytz timezone and mess with its internals. |
2dc6c75
to
f95056a
Compare
After discussion with @oleg-jukovec , we decided to implement |
@oleg-jukovec , @LeonidVas , new revision had been uploaded, humbly requesting one more review iteration. |
d4e0aed
to
47d9864
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes. I think this is a clearer solution than previous.
def msgpack_encode(self): | ||
seconds = self._datetime.value // NSEC_IN_SEC | ||
nsec = self.nsec | ||
tzoffset = self.tzoffset | ||
|
||
tz = self.tz | ||
if tz != '': | ||
tzindex = tt_timezones.timezoneToIndex[tz] | ||
else: | ||
tzindex = 0 | ||
|
||
buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES) | ||
|
||
if (nsec != 0) or (tzoffset != 0) or (tzindex != 0): | ||
buf = buf + get_int_as_bytes(nsec, NSEC_SIZE_BYTES) | ||
buf = buf + get_int_as_bytes(tzoffset, TZOFFSET_SIZE_BYTES) | ||
buf = buf + get_int_as_bytes(tzindex, TZINDEX_SIZE_BYTES) | ||
|
||
return buf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to do decoding as a separate function, rather than a class method.
Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, since it works with class internals (like self._datetime.value
), it better be a part of the class. And I'm not so sure value
should be exposed as a property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not true self._datetime.value // NSEC_IN_SEC
== Int(self.timestamp)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, until there is no precision loss on conversion from float
to int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think we may say that the main responsibility of Datetime is to be encoded and decoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And decoding is already a class method (as an __init__ constructor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think we may say that the main responsibility of Datetime is to be encoded and decoded.
At this stage the class looks like DTO
and serialization/deserialization methods is ok. But I guess (I did not look at an inteval pull request) that AddInterval
/SubInterval
/SubDatetime
methods will be added in the future. After that, this class will no longer look like a DTO and serialization/deserialization methos will look strange.
In fact, the class does not require the decode method in the current implementation. There is no need for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AddInterval
/SubInterval
/SubDatetime
They are implemented as operators overload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the class does not require the decode method in the current implementation.
Well, I wouldn't be too sure about this. For example, if decode would be external, it would be bad for performance because we would need to create excessive intermediate pandas.Timestamp
to transform epoch to year/month/day/etc. Moreover, as far as I remember, pandas and Tarantool treat epoch + tz with different approaches (and that's the reason decode
do .replace(tzinfo=pytz.UTC).tz_convert(tzinfo)
and __init__
do .replace(tzinfo=tzinfo)
), so may need to add some workarounds.
47d9864
to
e71a4ac
Compare
Yeah, it definitely is. Thank you for your advises, the last version of my implementation was dissatisfying for myself too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
e71a4ac
to
8c92b86
Compare
Tarantool supports datetime type since version 2.10.0 [1]. This patch introduced the support of Tarantool datetime type in msgpack decoders and encoders. Tarantool datetime objects are decoded to `tarantool.Datetime` type. `tarantool.Datetime` may be encoded to Tarantool datetime objects. `tarantool.Datetime` stores data in a `pandas.Timestamp` object. You can create `tarantool.Datetime` objects either from msgpack data or by using the same API as in Tarantool: ``` dt1 = tarantool.Datetime(year=2022, month=8, day=31, hour=18, minute=7, sec=54, nsec=308543321) dt2 = tarantool.Datetime(timestamp=1661969274) dt3 = tarantool.Datetime(timestamp=1661969274, nsec=308543321) ``` `tarantool.Datetime` exposes `year`, `month`, `day`, `hour`, `minute`, `sec`, `nsec`, `timestamp` and `value` (integer epoch time with nanoseconds precision) properties if you need to convert `tarantool.Datetime` to any other kind of datetime object: ``` pdt = pandas.Timestamp(year=dt.year, month=dt.month, day=dt.day, hour=dt.hour, minute=dt.minute, second=dt.sec, microsecond=(dt.nsec // 1000), nanosecond=(dt.nsec % 1000)) ``` `pandas.Timestamp` was chosen to store data because it could be used to store both nanoseconds and timezone information. In-build Python `datetime.datetime` supports microseconds at most, `numpy.datetime64` do not support timezones. Tarantool datetime interval type is planned to be stored in custom type `tarantool.Interval` and we'll need a way to support arithmetic between datetime and interval. This is the main reason we use custom class instead of plain `pandas.Timestamp`. It is also hard to implement Tarantool-compatible timezones with full conversion support without custom classes. This patch does not yet introduce the support of timezones in datetime. 1. tarantool/tarantool#5941 2. https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html Part of #204
Support non-zero tzoffset in datetime extended type. Use `tzoffset` parameter to set up offset timezone: ``` dt = tarantool.Datetime(year=2022, month=8, day=31, hour=18, minute=7, sec=54, nsec=308543321, tzoffset=180) ``` You may use `tzoffset` property to get timezone offset of a datetime object. Offset timezone is built with pytz.FixedOffset(). pytz module is already a dependency of pandas, but this patch adds it as a requirement just in case something will change in the future. This patch doesn't yet introduce the support of named timezones (tzindex). Part of #204
Support non-zero tzindex in datetime extended type. If both tzoffset and tzindex are specified, tzindex is prior (same as in Tarantool [1]). Use `tz` parameter to set up timezone name: ``` dt = tarantool.Datetime(year=2022, month=8, day=31, hour=18, minute=7, sec=54, nsec=308543321, tz='Europe/Moscow') ``` You may use `tz` property to get timezone name of a datetime object. pytz is used to build timezone info. Tarantool index to Olson name map and inverted one are built with gen_timezones.sh script based on tarantool/go-tarantool script [2]. All Tarantool unique and alias timezones present in pytz.all_timezones list. Only the following abbreviated timezones from Tarantool presents in pytz.all_timezones (version 2022.2.1): - CET - EET - EST - GMT - HST - MST - UTC - WET pytz does not natively support work with abbreviated timezones due to its possibly ambiguous nature [3-5]. Tarantool itself do not support work with ambiguous abbreviated timezones: ``` Tarantool 2.10.1-0-g482d91c66 tarantool> datetime.new({tz = 'BST'}) --- - error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone' ... ``` If ambiguous timezone is specified, the exception is raised. Tarantool header timezones.h [6] provides a map for all abbreviated timezones with category info (all ambiguous timezones are marked with TZ_AMBIGUOUS flag) and offset info. We parse this info to build pytz.FixedOffset() timezone for each Tarantool abbreviated timezone not supported natively by pytz. 1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/ 2. https://github.com/tarantool/go-tarantool/blob/5801dc6f5ce69db7c8bc0c0d0fe4fb6042d5ecbc/datetime/gen-timezones.sh 3. https://stackoverflow.com/questions/37109945/how-to-use-abbreviated-timezone-namepst-ist-in-pytz 4. https://stackoverflow.com/questions/27531718/datetime-timezone-conversion-using-pytz 5. https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset 6. https://github.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h Closes #204
8c92b86
to
aa7302a
Compare
msgpack: support datetime extended type
Tarantool supports datetime type since version 2.10.0 [1]. This patch introduced the support of Tarantool datetime type in msgpack decoders and encoders.
Tarantool datetime objects are decoded to
tarantool.Datetime
type.tarantool.Datetime
may be encoded to Tarantool datetime objects.tarantool.Datetime
stores data in apandas.Timestamp
object. You can createtarantool.Datetime
objects either from msgpack data or by using the same API as in Tarantool:tarantool.Datetime
exposesyear
,month
,day
,hour
,minute
,sec
,nsec
andtimestamp
properties if you need to converttarantool.Datetime
to any other kind of datetime object:pandas.Timestamp
was chosen to store data because it could be used to store both nanoseconds and timezone information. In-build Pythondatetime.datetime
supports microseconds at most,numpy.datetime64
do not support timezones.Tarantool datetime interval type is planned to be stored in custom type
tarantool.Interval
and we'll need a way to support arithmetic between datetime and interval. This is the main reason we use custom class instead of plainpandas.Timestamp
. It is also hard to implement Tarantool-compatible timezones with full conversion support without custom classes.This patch does not yet introduce the support of timezones in datetime.
Part of #204
msgpack: support tzoffset in datetime
Support non-zero tzoffset in datetime extended type.
Use
tzoffset
parameter to set up offset timezone:You may use
tzoffset
property to get timezone offset of a datetime object.Offset timezone is built with pytz.FixedOffset(). pytz module is already a dependency of pandas, but this patch adds it as a requirement just in case something will change in the future.
This patch doesn't yet introduce the support of named timezones (tzindex).
Part of #204
msgpack: support tzindex in datetime
Support non-zero tzindex in datetime extended type. If both tzoffset and tzindex are specified, tzindex is prior (same as in Tarantool [1]).
Use
tz
parameter to set up timezone name:You may use
tz
property to get timezone name of a datetime object.pytz is used to build timezone info. Tarantool index to Olson name map and inverted one are built with gen_timezones.sh script based on tarantool/go-tarantool script [2]. All Tarantool unique and alias timezones presents in pytz.all_timezones list. Only the following abbreviated timezones from Tarantool presents in pytz.all_timezones (version 2022.2.1):
pytz does not natively support work with abbreviated timezones due to its possibly ambiguous nature [3-5]. Tarantool itself do not support work with ambiguous abbreviated timezones:
If ambiguous timezone is specified, the exception is raised.
Tarantool header timezones.h [6] provides a map for all abbreviated timezones with category info (all ambiguous timezones are marked with TZ_AMBIGUOUS flag) and offset info. We parse this info to build pytz.FixedOffset() timezone for each Tarantool abbreviated timezone not supported natively by pytz.
Closes #204