[EPIC] Proposal for Date/Time enhancement #3100
Comments
Thanks @waitingkuo -- I plan to review this proposal more carefully later today or tomorrow. cc @avantgardnerio and @andygrove and @ovr who I think have been thinking about / working on timestamp related things as well
(I will respond individually to the parts of the proposal)
Timestamp Proposal
I think this sounds like a great idea 👍 |
Date Proposal
I agree |
Time Proposal
I agree -- until there is a compelling reason for time (rather than timestamp) it seems reasonable to postpone additional work on this |
Interval Proposal
Here is the spec in case that is interesting: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L354-L372 I think this is a good idea -- Interval(MonthDayNano) is a relatively new addition to the arrow standard, so I think DataFusion might default to
Agree
I don't quite understand the question about "really need timestamp with time zone - timestamp" -- as I thought your proposal for timezones was to fill out the feature set for timestamp with timezone 🤔 |
i originally thought that comparing |
@waitingkuo what do you think about making a list on this ticket of all the various timestamp / time related open items? I think there are a non trivial number of them, such as #194, #3103 and several others? Alternately, we can make another ticket that collects the subtasks |
There is no reason, it just isn't implemented. I implemented add for Date32 & Date64, but not Timestamp yet. |
I believe a Date has the resolution of 24 hours, so adding 1 or 23 hours should have no effect? |
I'm wary of this because it changes |
I personally lean towards only supporting one, especially if Chrono only supports one. https://xkcd.com/1179/ https://xkcd.com/927/ |
@waitingkuo I think 95% of what you propose is very sensible. The other 5% I don't think is incorrect, but does raise cause for consideration. Some random thoughts:
|
@alamb |
this issue is fixed, it works now |
agree, the current one is the most widely used |
I think this is best practice for all new OLTP and OLAP systems (always store data as UTC and then render into the user's locale as desired). What makes it tougher in an OLAP system is that the data being processed is often created outside of DataFusion (e.g. in Parquet or CSV files from some other system), which can and unfortunately does write dates / times / timestamps in zones other than UTC. In general, I would be a big fan of having DataFusion normalize all data to UTC on input prior to processing, if possible, but I worry this might not be a good idea for performance reasons.
PostgreSQL deals with this in a similar way: https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES
e.g.
    willy=# create table test02 (c1 timestamptz);
    CREATE TABLE
    willy=# insert into test02 (c1) values ('2000-01-01T00:00:00+08:00');
    INSERT 0 1
    willy=# insert into test02 (c1) values ('2000-01-01T00:00:00+07:00');
    INSERT 0 1
    willy=# insert into test02 (c1) values ('2000-01-01T23:00:00+07:00');
    INSERT 0 1
    willy=# select * from test02;
               c1
    ------------------------
     2000-01-01 00:00:00+08
     2000-01-01 01:00:00+08
     2000-01-02 00:00:00+08
    (3 rows)
But it does use local time when we try to extract the day.
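The psql session above can be imitated with Python's standard library. This is only a sketch of the "store as UTC, render in the session zone" idea: the `+08:00` session zone is assumed from the `+08` in the output, and `to_utc` and `render` are hypothetical helper names, not PostgreSQL or DataFusion APIs.

```python
from datetime import datetime, timedelta, timezone

# Assumed session time zone, taken from the "+08" shown in the psql output.
LOCAL_TZ = timezone(timedelta(hours=8))


def to_utc(text: str) -> datetime:
    """Parse an ISO 8601 string that carries an offset and normalize it to UTC."""
    return datetime.fromisoformat(text).astimezone(timezone.utc)


def render(ts_utc: datetime, tz: timezone = LOCAL_TZ) -> str:
    """Render a stored UTC instant in the session time zone."""
    return ts_utc.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S%z")


inputs = [
    "2000-01-01T00:00:00+08:00",
    "2000-01-01T00:00:00+07:00",
    "2000-01-01T23:00:00+07:00",
]
stored = [to_utc(s) for s in inputs]      # what would be kept internally
rendered = [render(ts) for ts in stored]  # what the session displays

# Extracting the day in the session zone and in UTC gives different answers.
local_days = [ts.astimezone(LOCAL_TZ).day for ts in stored]
utc_days = [ts.day for ts in stored]
```

The third row is the interesting one: its day is 2 in the assumed session zone but 1 in UTC, which is the local-time behaviour noted above.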
When we are done with the discussion, perhaps we can close this issue and use #3148 as the coordination point for development |
!!! Please correct me if i'm wrong !!!
Intro
Design Principle
Let's Begin with Postgresql's Date/Time
Let's Start to Compare
Timestamp
Postgresql
- timestamp and timestamp with time zone. (note that time zone is included in these 8 bytes)
- 1970-01-01T00:00:00
- timestamp(0) to timestamp(6)
  - timestamp(0) rounds to seconds
  - timestamp(3) rounds to milliseconds
  - timestamp(6) rounds to microseconds
- timestamp 'xxx' outputs timestamp
  - if xxx doesn't contain time zone info, it just works as you would expect

        willy=# select timestamp '2000-01-01T00:00:00';
              timestamp
        ---------------------
         2000-01-01 00:00:00
        (1 row)

  - if xxx contains time zone info, the time zone is just ignored (I believe this is a surprise for some people), e.g.

        willy=# select timestamp '2000-01-01T00:00:00+08:00';
              timestamp
        ---------------------
         2000-01-01 00:00:00
        (1 row)

- timestamp with time zone 'xxx' outputs timestamp with time zone
  1. if xxx contains no time zone, it assumes it's local time

        willy=# select timestamp with time zone '2000-01-01T00:00:00';
              timestamptz
        ------------------------
         2000-01-01 00:00:00+08
        (1 row)

  2. if xxx contains a time zone, it'll be converted to your local time zone

        willy=# select timestamp with time zone '2000-01-01T00:00:00+02:00';
              timestamptz
        ------------------------
         2000-01-01 06:00:00+08
        (1 row)
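The two literal forms above can be mimicked in a few lines of Python. This is a rough sketch, not PostgreSQL's implementation: `parse_timestamp`, `parse_timestamptz` and `SESSION_TZ` are hypothetical names, and the session zone is assumed to be `+08:00` as in the psql output.

```python
from datetime import datetime, timedelta, timezone

# Assumed session time zone ("+08" in the psql output above).
SESSION_TZ = timezone(timedelta(hours=8))


def parse_timestamp(text: str) -> datetime:
    """Mimic `timestamp 'xxx'`: any offset in the input is silently ignored."""
    return datetime.fromisoformat(text).replace(tzinfo=None)


def parse_timestamptz(text: str, session_tz: timezone = SESSION_TZ) -> datetime:
    """Mimic `timestamp with time zone 'xxx'`.

    Input without an offset is taken as session-local time; input with an
    offset is converted to the session time zone for display.
    """
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=session_tz)
    return parsed.astimezone(session_tz)


ignored_offset = parse_timestamp("2000-01-01T00:00:00+08:00")
assumed_local = parse_timestamptz("2000-01-01T00:00:00")
converted = parse_timestamptz("2000-01-01T00:00:00+02:00")
```

Here `converted` renders as 06:00 at `+08:00`, matching the last psql example, while `ignored_offset` keeps the wall-clock value and drops the `+08:00`.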
Datafusion
- Timestamp(TimeUnit, Option<String>)
  - TimeUnit::Second
  - TimeUnit::Millisecond
  - TimeUnit::Microsecond
  - TimeUnit::Nanosecond
- 1970-01-01T00:00:00
- Timestamp(TimeUnit::Nanosecond, None)
- timestamp literal but no timestamp with time zone
- timestamp xxx outputs Timestamp(TimeUnit::Nanosecond, None)
  - if xxx contains no time zone, it automatically applies the local time zone, parses it, converts it to UTC, and then drops the time zone. See: Should Cast(UTF-8 AS Timestamp) apply local time zone? #3080
  - if xxx contains a time zone, it's parsed correctly, then converted to UTC, and then the time zone is dropped
Proposal
- timestamp xxx works like postgresql does
- timestamp with time zone: I believe there is a lot of work and discussion to do:
  - Cast Kernel Ignores Timezone arrow-rs#1936
  - Handle the timezone in extract week in temporal.rs arrow-rs#1380
  - Can not create a TimestampNanosecondArray that has a specified timezone arrow-rs#597
- set time zone to xxx to change the local time zone
Date
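The timestamp parsing behaviour described above for DataFusion (apply the local zone when none is given, convert to UTC, drop the zone, keep an integer count since 1970-01-01T00:00:00) can be sketched as follows. This is an illustration only and not DataFusion or arrow-rs code: the `TimeUnit` enum here is a hypothetical Python stand-in, and `to_epoch_value` with its explicit `local_tz` parameter is an assumed helper.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class TimeUnit(Enum):
    """Hypothetical stand-in: the value is the number of ticks per second."""
    SECOND = 1
    MILLISECOND = 10**3
    MICROSECOND = 10**6
    NANOSECOND = 10**9


EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_value(text: str, unit: TimeUnit, local_tz: timezone) -> int:
    """Parse `text`, normalize it to UTC, and return ticks since the epoch."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # No offset in the input: interpret it in the local time zone.
        parsed = parsed.replace(tzinfo=local_tz)
    delta = parsed - EPOCH  # aware datetime arithmetic compares instants
    micros = (delta.days * 86_400 + delta.seconds) * 10**6 + delta.microseconds
    # Integer arithmetic only, so large nanosecond values stay exact.
    return micros * unit.value // 10**6


utc = timezone.utc
plus8 = timezone(timedelta(hours=8))
one_second_ns = to_epoch_value("1970-01-01T00:00:01", TimeUnit.NANOSECOND, utc)
with_offset_s = to_epoch_value("1970-01-01T08:00:00+08:00", TimeUnit.SECOND, utc)
local_ms = to_epoch_value("1970-01-01T00:00:00", TimeUnit.MILLISECOND, plus8)
```

Once the integer is produced, the original offset is gone, which is why a time-zone-aware type needs the extra `Option<String>` field mentioned above.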
Postgresql
Datafusion
- 1999-01-08
Proposal
- Date (19990108 and 990108). Chrono strictly follows ISO 8601. I think supporting all 8601 date formats makes sense.
Time
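To make the three date spellings mentioned above concrete, here is a small Python sketch that accepts `1999-01-08`, `19990108` and `990108`. It is not how Chrono or DataFusion parse dates: `parse_date` and `FORMATS_BY_LENGTH` are hypothetical names, and dispatching on string length is just one simple way to keep the formats unambiguous.

```python
from datetime import date, datetime

# One strptime format per accepted spelling, keyed by input length.
FORMATS_BY_LENGTH = {
    10: "%Y-%m-%d",  # 1999-01-08
    8: "%Y%m%d",     # 19990108
    6: "%y%m%d",     # 990108 (two-digit year)
}


def parse_date(text: str) -> date:
    """Parse one of the three spellings, rejecting anything else."""
    fmt = FORMATS_BY_LENGTH.get(len(text))
    if fmt is None:
        raise ValueError(f"unsupported date literal: {text!r}")
    return datetime.strptime(text, fmt).date()


parsed = [parse_date(s) for s in ("1999-01-08", "19990108", "990108")]
```

The two-digit form depends on a pivot rule (Python maps 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068), which is one reason to be cautious about supporting it.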
Postgresql
- time xxx outputs time that requires 8 bytes
- time xxx with time zone that requires 12 bytes. I have no idea why we need 4 more bytes here, since timestamp with time zone only requires 8 bytes
Datafusion
- no time literal for now, let's wait for feat: Add support for TIME literal values #3010
Proposal
- time with time zone. I have no clue when we need it. arrow-rs's time datatype contains no timezone. Perhaps we need not implement this.
Interval
Postgresql
Datafusion
reference: https://github.com/apache/arrow-rs/blob/master/arrow/src/datatypes/datatype.rs#L237
- Interval(IntervalUnit)
  - IntervalUnit::YearMonth
  - IntervalUnit::DayTime
  - IntervalUnit::MonthDayNano
- interval xxx outputs Interval(DayTime)
- interval xxx supports floating-point seconds
- adding interval(DayTime) to Timestamp(Nanosecond, None) is not supported; perhaps the reason is the difference in resolution
- interval(DayTime) to Date: it breaks when we have an hour (or other smaller unit) interval #3093 (this is solved)
- Time: for now, let's wait for feat: Add support for TIME literal values #3010
Proposal
- INTERVAL xxx outputs Interval(MonthDayNano) instead of Interval(DayTime), as it's easier to align with our Timestamp(Nanosecond, None)
- we could think about whether we really need timestamp with time zone - timestamp ...
- While comparing Timestamp(TimeUnit, TimeZone) and Timestamp(TimeUnit, None), the one with a time zone will be converted to UTC and its time zone dropped (it simply drops the time zone internally). This is what postgresql has.
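As a rough illustration of why a months/days/nanoseconds interval lines up with timestamp arithmetic, the sketch below adds such an interval to a Python `datetime`. It is not the arrow-rs or DataFusion implementation: the `MonthDayNano` dataclass and `add_interval` are hypothetical, and clamping the day to the end of the month (Jan 31 plus one month gives the last day of February) is an assumed policy.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MonthDayNano:
    """Hypothetical mirror of an interval with three independent fields."""
    months: int
    days: int
    nanos: int


def add_interval(ts: datetime, iv: MonthDayNano) -> datetime:
    """Add the interval field by field: months first, then days and nanos."""
    # Months need calendar arithmetic because month lengths vary.
    total_months = ts.year * 12 + (ts.month - 1) + iv.months
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    # Assumed policy: clamp to the last valid day of the target month.
    day = min(ts.day, calendar.monthrange(year, month)[1])
    shifted = ts.replace(year=year, month=month, day=day)
    # Python datetimes stop at microseconds, so nanos are rounded down here.
    return shifted + timedelta(days=iv.days, microseconds=iv.nanos // 1000)


end_of_feb = add_interval(datetime(2000, 1, 31), MonthDayNano(1, 0, 0))
next_day_1am = add_interval(
    datetime(2000, 1, 1), MonthDayNano(0, 1, 3_600_000_000_000)
)
```

Keeping the three fields separate avoids having to decide up front how many days a month has or how long a day is, which a single DayTime value cannot express for month-sized steps.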