Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support of timestamps and steps of less than a day for generate_series #11822

Closed
Abdullahsab3 opened this issue Aug 5, 2024 · 10 comments · Fixed by #12400
Closed

Support of timestamps and steps of less than a day for generate_series #11822

Abdullahsab3 opened this issue Aug 5, 2024 · 10 comments · Fixed by #12400
Assignees
Labels
enhancement New feature or request

Comments

@Abdullahsab3
Copy link
Contributor

Abdullahsab3 commented Aug 5, 2024

Is your feature request related to a problem or challenge?

I would like to have support of time generation using time ranges and with steps that are less than 1 day (e.g. an hour, custom range of x seconds, etc). The current generate_series only supports the following:

generate_series(Int64)
generate_series(Int64, Int64)
generate_series(Int64, Int64, Int64)
generate_series(Date32, Date32, Interval(MonthDayNano))

This is built in postgresDB, so it might be a good inspiration source for how it can be implemented in Datafusion.

Note: There is probably a bug in the current generate_series implementation. When specifying a step less than an interval of 1 day, Datafusion hangs indefinitely. Link to bug report #11823

Describe the solution you'd like

Example:

select generate_series('2023-01-01 10:00:00'::timestamp, '2023-01-31 23:59:00'::timestamp, '1 hour'::interval) as date_time;

Describe alternatives you've considered

No response

Additional context

This feature is really crucial for time series handling. influxDB is built on top of Datafusion and needs to have this as well :)

@Abdullahsab3 Abdullahsab3 added the enhancement New feature or request label Aug 5, 2024
@Abdullahsab3
Copy link
Contributor Author

I opened #11823 for the bug as well

@alamb
Copy link
Contributor

alamb commented Aug 5, 2024

Thank you @Abdullahsab3

@Omega359
Copy link
Contributor

Omega359 commented Sep 2, 2024

I'm curious what the smallest interval that these udf's should support? Seconds? Millis? Nanos? Should calls like the following be supported

select generate_series('2021-01-01'::date, '2021-01-02'::date INTERVAL '1' HOUR);

or should intervals less than a day still be disallowed if arg[0] is not of a timestamp type?

@Abdullahsab3
Copy link
Contributor Author

Sky is the limit :-) I think postgres supports all kinds of intervals (except nanos since those are not supported by postgresql), but they handle generate_series differently I think. Instead of using date, they use timestamp , so '2021-01-01'::date gets casted to 2021-01-01 00:00:00.000000 +00:00. Personally I think that makes more sense, since timestamp is more general than date. If you want date back from generate_series, the timestamp can be casted to a date

@Omega359
Copy link
Contributor

Omega359 commented Sep 3, 2024

Sky is the limit :-) I think postgres supports all kinds of intervals (except nanos since those are not supported by postgresql), but they handle generate_series differently I think. Instead of using date, they use timestamp , so '2021-01-01'::date gets casted to 2021-01-01 00:00:00.000000 +00:00. Personally I think that makes more sense, since timestamp is more general than date. If you want date back from generate_series, the timestamp can be casted to a date

I thought about that as well though my main concern with that would that it would be an api change for these UDF's since the output type would change unexpectedly between releases. It would make the implementation easier though

@Omega359
Copy link
Contributor

Omega359 commented Sep 8, 2024

take

@Omega359
Copy link
Contributor

Omega359 commented Sep 9, 2024

So I've implemented this in a local branch with all dates being converted to timestamps which may/will break any existing query that expects a date32 out but gets a timestamp instead. I'm not terribly comfortable with that so I think I may change the proposed implementation as follows:

  1. Number range/generate_series stays identical to what exists today.
  2. If the start is anything but a timestamp it follows the existing behaviour today. That includes limiting the interval to no less than a day.
  3. If the start is a timestamp intervals down to a nanosecond are supported, and timestamps with time zones are also supported. What I don't think will be supported will checks on ranges to verify that a range covering a discontinuity in the times (say when crossing a daylight savings change) is handled in a manner that is 100% correct. That behaviour would be no different than what exists in the code today.

@alamb @Abdullahsab3

@alamb
Copy link
Contributor

alamb commented Sep 9, 2024

So I've implemented this in a local branch with all dates being converted to timestamps which may/will break any existing query that expects a date32 out but gets a timestamp instead. I'm not terribly comfortable with that so I think I may change the proposed implementation as follows:

  1. Number range/generate_series stays identical to what exists today.
  2. If the start is anything but a timestamp it follows the existing behaviour today. That includes limiting the interval to no less than a day.
  3. If the start is a timestamp intervals down to a nanosecond are supported, and timestamps with time zones are also supported. What I don't think will be supported will checks on ranges to verify that a range covering a discontinuity in the times (say when crossing a daylight savings change) is handled in a manner that is 100% correct.

This makes sense to me @Omega359 -- thank you .

That behaviour would be no different than what exists in the code today.

I think it makes sense to start with getting the basics down, and maybe documenting that there are potential inconsistencies in certain cases. Then that behavior can be more easily improved by anyone who needs it

@alamb
Copy link
Contributor

alamb commented Sep 10, 2024

#12400 -- thre is a PR up for this feature

@Abdullahsab3
Copy link
Contributor Author

Thank you @Omega359 and @alamb!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants