Support coercing utf8 to interval and timestamp (including arguments to date_bin) #5117

Merged
merged 3 commits into from
Feb 2, 2023

Conversation

alamb

@alamb alamb commented Jan 30, 2023

Draft as it

Which issue does this PR close?

Closes #4853

Rationale for this change

date_bin with a string value does not work and gives a hard to understand message:

select date_bin('1 hour', column1, TIMESTAMP '2001-01-01 00:00:00Z')
from (values
  (timestamp '2022-01-01 00:00:00'),
  (timestamp '2022-01-01 01:00:00'),
  (timestamp '2022-01-02 00:00:00')
) as sq;
Plan("Coercion from [Utf8, Timestamp(Nanosecond, None), Timestamp(Nanosecond, None)] to the signature Exact([Interval(DayTime), Timestamp(Nanosecond, None), Timestamp(Nanosecond, None)]) failed.")

What changes are included in this PR?

  1. Add coercion so that Utf8 constants can be automatically coerced into Intervals (same as today)
  2. Add coercion so that Utf8 can be automatically coerced into Timestamps
  3. Tests
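
As a rough illustration of items 1 and 2, the sketch below checks argument types against an exact signature and allows Utf8 to stand in for an interval or a timestamp. The `DataType` enum, `can_coerce`, and `coerce_to_exact` are hypothetical names made up for this example; they are simplified stand-ins, not DataFusion's or Arrow's actual types or functions.

```rust
// Hypothetical, simplified stand-in for Arrow's DataType; not the real enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Utf8,
    Int64,
    IntervalDayTime,
    TimestampNanosecond,
}

/// Returns true if a value of type `from` may be implicitly coerced to `to`.
/// The two Utf8 rules correspond to items 1 and 2 in the list above.
fn can_coerce(from: DataType, to: DataType) -> bool {
    use DataType::*;
    match (from, to) {
        // Identical types never need coercion.
        (a, b) if a == b => true,
        // Strings may be interpreted as intervals or timestamps.
        (Utf8, IntervalDayTime) => true,
        (Utf8, TimestampNanosecond) => true,
        _ => false,
    }
}

/// Tries to coerce each argument type to the matching slot of an exact signature.
/// Returns the coerced types, or None if the arity or any single type does not fit.
fn coerce_to_exact(args: &[DataType], signature: &[DataType]) -> Option<Vec<DataType>> {
    if args.len() != signature.len() {
        return None;
    }
    args.iter()
        .zip(signature)
        .map(|(&from, &to)| can_coerce(from, to).then_some(to))
        .collect()
}
```

With these rules, the argument types from the failing query, `[Utf8, TimestampNanosecond, TimestampNanosecond]`, match the `date_bin` signature `[IntervalDayTime, TimestampNanosecond, TimestampNanosecond]`, while an `Int64` in the first position still returns `None`.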

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, strings are now automatically coerced into Timestamps and Intervals

Note this behavior is consistent with Postgres:

postgres=# SELECT DATE_BIN(INTERVAL '15 minutes', TIMESTAMP '2022-08-03 14:38:50Z', TIMESTAMP '1970-01-01T00:00:00Z') AS res;
         res
---------------------
 2022-08-03 14:30:00
(1 row)

postgres=# SELECT DATE_BIN('15 minutes', TIMESTAMP '2022-08-03 14:38:50Z', TIMESTAMP '1970-01-01T00:00:00Z') AS res;
         res
---------------------
 2022-08-03 14:30:00
(1 row)

postgres=# SELECT DATE_BIN('15 minutes', '2022-08-03 14:38:50Z', '1970-01-01T00:00:00Z') AS res;
          res
------------------------
 2022-08-03 14:30:00+00
(1 row)
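
For readers unfamiliar with `date_bin`, the arithmetic behind the results above can be sketched as follows. This is a minimal model that assumes all values are whole seconds since the Unix epoch and that timestamps before the origin are floored to the earlier bin; the function below is an illustration only, not the DataFusion or Postgres implementation.

```rust
/// Floors `ts` to the start of its bin, where bins are `stride` wide and
/// aligned to `origin`. All values are whole seconds since the Unix epoch
/// (an assumption of this sketch; real engines use finer units).
fn date_bin(stride: i64, ts: i64, origin: i64) -> i64 {
    assert!(stride > 0, "stride must be positive");
    // rem_euclid keeps the offset non-negative even when ts is before origin,
    // so the result is always the bin start at or before ts.
    ts - (ts - origin).rem_euclid(stride)
}
```

Here `2022-08-03 14:38:50` UTC is 1659537530 seconds after the epoch and 15 minutes is 900 seconds, so `date_bin(900, 1_659_537_530, 0)` returns 1659537000, which is `2022-08-03 14:30:00`.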

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jan 30, 2023
fn cast_expr(expr: &Expr, to_type: &DataType, schema: &DFSchema) -> Result<Expr> {
// Special case until Interval coercion is handled in arrow-rs
// https://github.com/apache/arrow-rs/issues/3643
match (expr, to_type) {

This only supports literals at the moment -- tracked in apache/arrow-rs#3643 for the more general case of coercing columns
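
The literal-only special case described here could look roughly like the sketch below. Everything in it is hypothetical: the `Expr` and `DataType` enums are toy stand-ins, `parse_interval_millis` is a deliberately tiny parser invented for the example, and the fallback to a generic `Cast` node for non-literals is an assumption, since the excerpt above does not show what the real `cast_expr` does in that branch.

```rust
// Toy stand-ins for illustration; not DataFusion's real Expr or Arrow's DataType.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    IntervalDayTime,
    TimestampNanosecond,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Utf8Literal(String),
    IntervalLiteral(i64), // interval length in milliseconds
    Column(String),
    Cast { expr: Box<Expr>, to: DataType },
}

/// Hypothetical parser for strings such as "15 minutes" or "1 hour".
/// Returns the interval in milliseconds, or None if the string is not understood.
fn parse_interval_millis(s: &str) -> Option<i64> {
    let mut parts = s.split_whitespace();
    let count: i64 = parts.next()?.parse().ok()?;
    let unit_millis = match parts.next()? {
        "second" | "seconds" => 1_000,
        "minute" | "minutes" => 60_000,
        "hour" | "hours" => 3_600_000,
        "day" | "days" => 86_400_000,
        _ => return None,
    };
    // Reject trailing tokens such as "1 hour ago".
    if parts.next().is_some() {
        return None;
    }
    count.checked_mul(unit_millis)
}

/// Literal strings cast to an interval are parsed at plan time; every other
/// combination is wrapped in a generic Cast node (an assumption of this sketch).
fn cast_expr(expr: &Expr, to: DataType) -> Result<Expr, String> {
    match (expr, to) {
        (Expr::Utf8Literal(s), DataType::IntervalDayTime) => parse_interval_millis(s)
            .map(Expr::IntervalLiteral)
            .ok_or_else(|| format!("cannot parse '{s}' as an interval")),
        _ => Ok(Expr::Cast { expr: Box::new(expr.clone()), to }),
    }
}
```

In this model a literal `'15 minutes'` becomes `IntervalLiteral(900_000)` during planning, whereas a string column is left as a `Cast` whose evaluation would depend on the cast kernel supporting Utf8 to interval, which is the gap the linked arrow-rs issue tracks.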

@alamb alamb marked this pull request as ready for review February 1, 2023 15:01
@alamb alamb changed the title coerce utf8 to interval and timestamp (including arguments to date_bin) coerce utf8 to interval and timestamp (including arguments to date_bin) Feb 1, 2023
alamb commented Feb 1, 2023

cc @waitingkuo and @comphead

@@ -76,6 +76,18 @@ SELECT DATE_BIN(INTERVAL '15 minutes', TIMESTAMP '2022-08-03 14:38:50Z', TIMESTA
----
2022-08-03T14:30:00

# Can coerce string interval arguments
query T
SELECT DATE_BIN('15 minutes', TIMESTAMP '2022-08-03 14:38:50Z', TIMESTAMP '1970-01-01T00:00:00Z') AS res
small nit: is AS res needed?

I think you are right -- res is not needed. I will remove them

In 7a7fd4b

comphead commented Feb 1, 2023

LGTM, not permitted to approve the PR :)

@alamb alamb changed the title coerce utf8 to interval and timestamp (including arguments to date_bin) Support coercing utf8 to interval and timestamp (including arguments to date_bin) Feb 2, 2023
@alamb alamb merged commit 224c682 into apache:master Feb 2, 2023
@alamb alamb deleted the alamb/coerce_date_bin_args branch February 2, 2023 18:23
ursabot commented Feb 2, 2023

Benchmark runs are scheduled for baseline = 7d2d51b and contender = 224c682. 224c682 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Development

Successfully merging this pull request may close these issues.

Interval coercion:date_bin('1 hour',...) does not work but date_bin(interval '1 hour', ... does
3 participants