Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse nanoseconds for intervals #4186

Merged
merged 3 commits into from
Nov 15, 2022

Conversation

Jefffrey
Copy link
Contributor

Which issue does this PR close?

Closes #3204.

Rationale for this change

Change parsing granularity of intervals from milliseconds to nanoseconds, to support passing fractions of milliseconds/seconds (down to nanoseconds) as IntervalMonthDayNano supports granularity down to nanoseconds.

What changes are included in this PR?

Accumulating with nanoseconds instead of milliseconds, changing to f64/i128 where necessary to account for shift in range of values.

Are these changes tested?

Added some unit tests, and fixed others as there is now more precision for seconds fractions.

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Nov 12, 2022
@Jefffrey Jefffrey force-pushed the 3204_micro_nano_second_parsing branch from 6acff67 to a4c0d4f Compare November 12, 2022 20:28
Copy link
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @Jefffrey

datafusion/common/src/parsers.rs Outdated Show resolved Hide resolved
);
test_expression!(
"interval '0.49999 day'",
"0 years 0 mons 0 days 11 hours 59 mins 59.136 secs"
);
test_expression!(
"interval '0.49999999999 day'",
"0 years 0 mons 0 days 12 hours 0 mins 0.00 secs"
"0 years 0 mons 0 days 11 hours 59 mins 59.999999136 secs"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement! 👍

Jefffrey and others added 2 commits November 13, 2022 16:52
Copy link
Contributor

@waitingkuo waitingkuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @Jefffrey

there's a bug while displaying IntervalMonthDayNano

select interval '0.0001 second';
+----------------------------------------------------+
| IntervalMonthDayNano("100000")                     |
+----------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.100000 secs |
+----------------------------------------------------+
1 row in set. Query took 0.002 seconds.

i checked the underline value it's correct (month: 0, day: 0, nanos: 100000)
i think this is the display issue

Comment on lines +32 to +42
|month_part: f64, mut day_part: f64, mut nanos_part: f64| -> (i64, i64, f64) {
// Convert fractional month to days, It's not supported by Arrow types, but anyway
day_part += (month_part - (month_part as i32) as f32) * 30_f32;
day_part += (month_part - (month_part as i64) as f64) * 30_f64;

// Convert fractional days to hours
milles_part += (day_part - ((day_part as i32) as f32))
* 24_f32
nanos_part += (day_part - ((day_part as i64) as f64))
* 24_f64
* SECONDS_PER_HOUR
* MILLIS_PER_SECOND;
* NANOS_PER_SECOND;

(month_part as i32, day_part as i32, milles_part)
(month_part as i64, day_part as i64, nanos_part)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Jefffrey
Copy link
Contributor Author

thank you @Jefffrey

there's a bug while displaying IntervalMonthDayNano

select interval '0.0001 second';
+----------------------------------------------------+
| IntervalMonthDayNano("100000")                     |
+----------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.100000 secs |
+----------------------------------------------------+
1 row in set. Query took 0.002 seconds.

i checked the underline value it's correct (month: 0, day: 0, nanos: 100000) i think this is the display issue

Issue with arrow-rs, has already been addressed: apache/arrow-rs#3093

@waitingkuo
Copy link
Contributor

while we have leading zeros after decimal, it displays incorrectly

select interval '0.01 second';
+------------------------------------------------+
| IntervalDayTime("10")                          |
+------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.10 secs |
+------------------------------------------------+
1 row in set. Query took 0.000 seconds.

❯ select interval '0.001 second';
+------------------------------------------------+
| IntervalDayTime("1")                           |
+------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.01 secs |
+------------------------------------------------+
1 row in set. Query took 0.000 seconds.

);
test_expression!(
"interval '0.49999 day'",
"0 years 0 mons 0 days 11 hours 59 mins 59.136 secs"
);
test_expression!(
"interval '0.49999999999 day'",
"0 years 0 mons 0 days 12 hours 0 mins 0.00 secs"
"0 years 0 mons 0 days 11 hours 59 mins 59.999999136 secs"
);
test_expression!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can add a test case for this (which will fail in the current version) and wait for the update to arrow 27 #4199

Suggested change
test_expression!(
test_expression!(
"interval '0.00000000001 day'",
"0 years 0 mons 0 days 0 hours 0 mins 0.000000864 secs"
);
test_expression!(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i just found that the arrow-rs fix isn't included in 27.0.0
@tustvold do you have any suggestion for this? should we just merge this first and add some test cases after arrow 28.0.0 merged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also add test cases that are marked as #[should_fail]

@alamb
Copy link
Contributor

alamb commented Nov 15, 2022

Thanks @Jefffrey for the fix and @waitingkuo and @waynexia for the reviews!

@alamb alamb merged commit 8bfc0ea into apache:master Nov 15, 2022
@ursabot
Copy link

ursabot commented Nov 15, 2022

Benchmark runs are scheduled for baseline = d91ce06 and contender = 8bfc0ea. 8bfc0ea is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@Jefffrey Jefffrey deleted the 3204_micro_nano_second_parsing branch November 15, 2022 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interval Literal doesn't work for timeunit less than millisecond
5 participants