Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support timestamp and interval arithmetic #5764

Merged
merged 41 commits into from
Mar 30, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
1869363
first implementation and tests of timestamp subtraction
berkaysynnada Mar 7, 2023
2f01278
improvement after review
berkaysynnada Mar 7, 2023
806b4d3
postgre interval format option
berkaysynnada Mar 8, 2023
708d717
random tests extended
berkaysynnada Mar 8, 2023
c5bacbe
corrections after review
berkaysynnada Mar 8, 2023
011933f
operator check
berkaysynnada Mar 8, 2023
e475f58
flag is removed
berkaysynnada Mar 9, 2023
423fb65
clippy fix
berkaysynnada Mar 9, 2023
1291758
toml conflict
berkaysynnada Mar 9, 2023
055ed81
Merge branch 'main' into feature/time-interval-support
berkaysynnada Mar 9, 2023
d7f3696
minor changes
berkaysynnada Mar 9, 2023
8d5c8e3
deterministic matches
berkaysynnada Mar 11, 2023
31577d9
simplifications (clippy error)
berkaysynnada Mar 12, 2023
c274aef
test format changed
berkaysynnada Mar 13, 2023
968a682
minor test fix
berkaysynnada Mar 13, 2023
49506ed
Merge branch 'main' into feature/time-interval-support
berkaysynnada Mar 13, 2023
ed63779
Update scalar.rs
berkaysynnada Mar 13, 2023
68ea647
Refactoring and simplifications
ozankabak Mar 13, 2023
ed04466
Make ScalarValue support interval comparison
ozankabak Mar 14, 2023
3bf8fd6
naming tests
berkaysynnada Mar 14, 2023
0f8a7a7
macro renaming
berkaysynnada Mar 14, 2023
cf892fe
renaming macro
berkaysynnada Mar 14, 2023
6b5484e
Merge branch 'apache:main' into feature/timestamp-interval-arith-query
berkaysynnada Mar 15, 2023
a078dbb
ok till arrow kernel ops
berkaysynnada Mar 20, 2023
1c8fd69
Merge branch 'main' into feature/timestamp-interval-arith-query
berkaysynnada Mar 20, 2023
f27bdb7
Merge branch 'apache:main' into feature/timestamp-interval-arith-query
berkaysynnada Mar 20, 2023
49727e1
Merge branch 'apache:main' into feature/timestamp-interval-arith-query
berkaysynnada Mar 22, 2023
bbfd9b1
macro will replace matches inside evaluate
berkaysynnada Mar 22, 2023
e14a16f
Code refactor
metesynnada Mar 24, 2023
9f82bbb
retract changes in scalar and datetime
mustafasrepo Mar 24, 2023
25d76f3
ts op interval with chrono functions
berkaysynnada Mar 24, 2023
9de7875
bug fix and refactor
berkaysynnada Mar 26, 2023
d637efe
test refactor
berkaysynnada Mar 27, 2023
e2ee0ed
Enhance commenting
metesynnada Mar 27, 2023
3e03a54
new binary operation logic, handling the inside errors
metesynnada Mar 28, 2023
03d3aed
slt and minor changes
berkaysynnada Mar 28, 2023
20b276a
tz parsing excluded
berkaysynnada Mar 28, 2023
ef1c194
replace try_binary and as_datetime, and keep timezone for ts+interval op
berkaysynnada Mar 30, 2023
f1e78f2
Merge branch 'main' into feature/timestamp-interval-arith-query
berkaysynnada Mar 30, 2023
21e1df8
fix after merge
berkaysynnada Mar 30, 2023
b20eb77
delete unused functions
berkaysynnada Mar 30, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
128 changes: 102 additions & 26 deletions datafusion/common/src/scalar.rs
Original file line number Diff line number Diff line change
Expand Up @@ -975,20 +975,20 @@ pub const NANOSECOND_MODE: bool = true;
/// interval will have the type [`IntervalDayTimeType`].
/// - When subtracting timestamps at microseconds/nanoseconds precision, the
/// output interval will have the type [`IntervalMonthDayNanoType`].
fn ts_sub_to_interval<const TIME_MOD: bool>(
fn ts_sub_to_interval<const TIME_MODE: bool>(
lhs_ts: i64,
rhs_ts: i64,
lhs_tz: &Option<String>,
rhs_tz: &Option<String>,
) -> Result<ScalarValue> {
let (parsed_lhs_tz, parsed_rhs_tz) =
(parse_timezones(lhs_tz), parse_timezones(rhs_tz));
let parsed_lhs_tz = parse_timezones(lhs_tz)?;
let parsed_rhs_tz = parse_timezones(rhs_tz)?;

let lhs_dt = with_timezone_to_naive_datetime::<TIME_MOD>(lhs_ts, &parsed_lhs_tz)?;
let rhs_dt = with_timezone_to_naive_datetime::<TIME_MOD>(rhs_ts, &parsed_rhs_tz)?;
let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
let (naive_lhs, naive_rhs) =
calculate_naives::<TIME_MODE>(lhs_ts, parsed_lhs_tz, rhs_ts, parsed_rhs_tz)?;
let delta_secs = naive_lhs.signed_duration_since(naive_rhs);

match TIME_MOD {
match TIME_MODE {
MILLISECOND_MODE => {
let as_millisecs = delta_secs.num_milliseconds();
Ok(ScalarValue::new_interval_dt(
Expand All @@ -1011,18 +1011,82 @@ fn ts_sub_to_interval<const TIME_MOD: bool>(
}
}

// This function parses the timezone from string to Tz.
// If it cannot parse or timezone field is [`None`], it returns [`None`].
pub fn parse_timezones(tz: &Option<String>) -> Option<Tz> {
/// This function parses the timezone from string to Tz.
/// If it cannot parse or timezone field is [`None`], it returns [`None`].
pub fn parse_timezones(tz: &Option<String>) -> Result<Option<Tz>> {
if let Some(tz) = tz {
let parsed_tz: Option<Tz> = FromStr::from_str(tz)
.map_err(|_| {
DataFusionError::Execution("cannot parse given timezone".to_string())
})
.ok();
parsed_tz
let parsed_tz: Tz = FromStr::from_str(tz).map_err(|_| {
DataFusionError::Execution("cannot parse given timezone".to_string())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if the error contained the problematic timezone. Something like

        let parsed_tz: Tz = FromStr::from_str(tz).map_err(|e| {
            DataFusionError::Execution(format!("cannot parse '{tz}' as timezone: {e}".to_string())

})?;
Ok(Some(parsed_tz))
} else {
None
Ok(None)
}
}

/// This function takes two timestamps with an optional timezone,
/// and returns the duration between them. If one of the timestamps
/// has a [`None`] timezone, the other one is also treated as having [`None`].
pub fn calculate_naives<const TIME_MODE: bool>(
lhs_ts: i64,
parsed_lhs_tz: Option<Tz>,
rhs_ts: i64,
parsed_rhs_tz: Option<Tz>,
) -> Result<(NaiveDateTime, NaiveDateTime)> {
let err = || {
DataFusionError::Execution(String::from(
"error while converting Int64 to DateTime in timestamp subtraction",
))
};
match (parsed_lhs_tz, parsed_rhs_tz, TIME_MODE) {
(Some(lhs_tz), Some(rhs_tz), MILLISECOND_MODE) => {
let lhs = arrow_array::temporal_conversions::as_datetime_with_timezone::<
arrow_array::types::TimestampMillisecondType,
>(lhs_ts, rhs_tz)
.ok_or_else(err)?
.naive_local();
let rhs = arrow_array::temporal_conversions::as_datetime_with_timezone::<
arrow_array::types::TimestampMillisecondType,
>(rhs_ts, lhs_tz)
.ok_or_else(err)?
.naive_local();
Ok((lhs, rhs))
}
(Some(lhs_tz), Some(rhs_tz), NANOSECOND_MODE) => {
let lhs = arrow_array::temporal_conversions::as_datetime_with_timezone::<
arrow_array::types::TimestampNanosecondType,
>(lhs_ts, rhs_tz)
.ok_or_else(err)?
.naive_local();
let rhs = arrow_array::temporal_conversions::as_datetime_with_timezone::<
arrow_array::types::TimestampNanosecondType,
>(rhs_ts, lhs_tz)
.ok_or_else(err)?
.naive_local();
Ok((lhs, rhs))
}
(_, _, MILLISECOND_MODE) => {
let lhs = arrow_array::temporal_conversions::as_datetime::<
arrow_array::types::TimestampMillisecondType,
>(lhs_ts)
.ok_or_else(err)?;
let rhs = arrow_array::temporal_conversions::as_datetime::<
arrow_array::types::TimestampMillisecondType,
>(rhs_ts)
.ok_or_else(err)?;
Ok((lhs, rhs))
}
(_, _, NANOSECOND_MODE) => {
let lhs = arrow_array::temporal_conversions::as_datetime::<
arrow_array::types::TimestampNanosecondType,
>(lhs_ts)
.ok_or_else(err)?;
let rhs = arrow_array::temporal_conversions::as_datetime::<
arrow_array::types::TimestampNanosecondType,
>(rhs_ts)
.ok_or_else(err)?;
Ok((lhs, rhs))
}
}
}

Expand Down Expand Up @@ -1112,9 +1176,13 @@ pub fn milliseconds_add_array<const INTERVAL_MODE: i8>(
interval: i128,
sign: i32,
) -> Result<i64> {
let secs = ts_ms / 1000;
let nsecs = ((ts_ms % 1000) * 1_000_000) as u32;
do_date_time_math_array::<INTERVAL_MODE>(secs, nsecs, interval, sign)
let mut secs = ts_ms / 1000;
let mut nsecs = ((ts_ms % 1000) * 1_000_000) as i32;
if nsecs < 0 {
secs -= 1;
nsecs += 1_000_000_000;
}
do_date_time_math_array::<INTERVAL_MODE>(secs, nsecs as u32, interval, sign)
.map(|dt| dt.timestamp_millis())
}

Expand All @@ -1131,9 +1199,13 @@ pub fn microseconds_add_array<const INTERVAL_MODE: i8>(
interval: i128,
sign: i32,
) -> Result<i64> {
let secs = ts_us / 1_000_000;
let nsecs = ((ts_us % 1_000_000) * 1000) as u32;
do_date_time_math_array::<INTERVAL_MODE>(secs, nsecs, interval, sign)
let mut secs = ts_us / 1_000_000;
let mut nsecs = ((ts_us % 1_000_000) * 1000) as i32;
if nsecs < 0 {
secs -= 1;
nsecs += 1_000_000_000;
}
do_date_time_math_array::<INTERVAL_MODE>(secs, nsecs as u32, interval, sign)
.map(|dt| dt.timestamp_nanos() / 1000)
}

Expand All @@ -1150,9 +1222,13 @@ pub fn nanoseconds_add_array<const INTERVAL_MODE: i8>(
interval: i128,
sign: i32,
) -> Result<i64> {
let secs = ts_ns / 1_000_000_000;
let nsecs = (ts_ns % 1_000_000_000) as u32;
do_date_time_math_array::<INTERVAL_MODE>(secs, nsecs, interval, sign)
let mut secs = ts_ns / 1_000_000_000;
let mut nsecs = (ts_ns % 1_000_000_000) as i32;
if nsecs < 0 {
secs -= 1;
nsecs += 1_000_000_000;
}
do_date_time_math_array::<INTERVAL_MODE>(secs, nsecs as u32, interval, sign)
.map(|dt| dt.timestamp_nanos())
}

Expand Down
21 changes: 9 additions & 12 deletions datafusion/core/tests/sql/set_variable.rs
Original file line number Diff line number Diff line change
Expand Up @@ -427,7 +427,10 @@ async fn set_time_zone_bad_time_zone_format() {
.unwrap();

let err = pretty_format_batches(&result).err().unwrap().to_string();
assert_eq!(err, "Parser error: Invalid timezone \"08:00\": only offset based timezones supported without chrono-tz feature");
assert_eq!(
err,
"Parser error: Invalid timezone \"08:00\": '08:00' is not a valid timezone"
);

plan_and_collect(&ctx, "SET TIME ZONE = '08'")
.await
Expand All @@ -440,22 +443,16 @@ async fn set_time_zone_bad_time_zone_format() {
.unwrap();

let err = pretty_format_batches(&result).err().unwrap().to_string();
assert_eq!(err, "Parser error: Invalid timezone \"08\": only offset based timezones supported without chrono-tz feature");
assert_eq!(
err,
"Parser error: Invalid timezone \"08\": '08' is not a valid timezone"
);

// we dont support named time zone yet
plan_and_collect(&ctx, "SET TIME ZONE = 'Asia/Taipei'")
.await
.unwrap();

// casting UTF-8 to TimestampTZ isn't supported yet, add Timestamp as the middle layer for now
let result =
plan_and_collect(&ctx, "SELECT '2000-01-01T00:00:00'::TIMESTAMP::TIMESTAMPTZ")
.await
.unwrap();

let err = pretty_format_batches(&result).err().unwrap().to_string();
assert_eq!(err, "Parser error: Invalid timezone \"Asia/Taipei\": only offset based timezones supported without chrono-tz feature");

// this is invalid even after we support named time zone
plan_and_collect(&ctx, "SET TIME ZONE = 'Asia/Taipei2'")
.await
Expand All @@ -467,5 +464,5 @@ async fn set_time_zone_bad_time_zone_format() {
.await
.unwrap();
let err = pretty_format_batches(&result).err().unwrap().to_string();
assert_eq!(err, "Parser error: Invalid timezone \"Asia/Taipei2\": only offset based timezones supported without chrono-tz feature");
assert_eq!(err, "Parser error: Invalid timezone \"Asia/Taipei2\": 'Asia/Taipei2' is not a valid timezone");
}
Original file line number Diff line number Diff line change
Expand Up @@ -279,7 +279,7 @@ query error Cannot automatically convert Interval\(DayTime\) to Interval\(MonthD
---
select arrow_cast(interval '30 minutes', 'Interval(MonthDayNano)');

query error DataFusion error: Error during planning: Cannot automatically convert Utf8 to Interval\(MonthDayNano\)
query error DataFusion error: Arrow error: Cast error: Casting from Utf8 to Interval\(MonthDayNano\) not supported
select arrow_cast('30 minutes', 'Interval(MonthDayNano)');


Expand Down
1 change: 1 addition & 0 deletions datafusion/physical-expr/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ unicode_expressions = ["unicode-segmentation"]
[dependencies]
ahash = { version = "0.8", default-features = false, features = ["runtime-rng"] }
arrow = { workspace = true }
arrow-array = { version = "34.0.0", default-features = false, features = ["chrono-tz"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of datafusion now uses arrow 36, but this uses arrow 34

Suggested change
arrow-array = { version = "34.0.0", default-features = false, features = ["chrono-tz"] }
arrow-array = { workspace = true }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #5794

arrow-buffer = { workspace = true }
arrow-schema = { workspace = true }
blake2 = { version = "^0.10.2", optional = true }
Expand Down
Loading