[Datafusion] NOW() function support #288
…re all now() return same value
Codecov Report
@@ Coverage Diff @@
## master #288 +/- ##
==========================================
- Coverage 76.80% 76.19% -0.62%
==========================================
Files 133 142 +9
Lines 23284 23927 +643
==========================================
+ Hits 17884 18231 +347
- Misses 5400 5696 +296
This PR is already big. To reduce the review burden, I will add CURRENT_TIMESTAMP and CURRENT_TIME in a separate PR.
Thanks for this PR! I believe we should do it a bit differently and not calculate the timestamp inside the optimization rule. One could disable the optimization rule or execute an unoptimized plan, and then the timestamp would never be computed. My suggestion would be to set field to the For the optimizer you could use
Thanks @Dandandan, that makes sense. I'll modify the PR as per your suggestion 👍
We'll probably run into similar issues with current_date, current_time, current_timestamp, etc., in case that helps formulate what said node would look like. Interestingly, this is a case where, for certain use cases, implementing it as a UDF would be much easier (if slightly less performant), because you could just close over some external state when building your execution ctx.
AFAIK, the use case here is that we want to preserve state across nodes, so that their execution depends on said state. IMO a natural construct here is something like For Ballista, we would probably need IMO this would enable us to implement The IMO a
@jorgecarleitao that all sounds reasonable! In Postgres, this sort of corresponds to the function volatility categories (https://www.postgresql.org/docs/13/xfunc-volatility.html) which might be a useful basis for any future definition of different function types.
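The PostgreSQL volatility categories map naturally onto a question a planner has to answer: when may a function call be replaced by a precomputed value? A minimal sketch, assuming a hypothetical `Volatility` enum and `fold_policy` helper (neither is an existing DataFusion API in this PR):

```rust
/// Hypothetical volatility classes, modeled on the PostgreSQL categories.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Volatility {
    /// Result depends only on the arguments, e.g. abs(x).
    Immutable,
    /// Result is constant within a single query, e.g. now().
    Stable,
    /// Result may change on every call, e.g. random().
    Volatile,
}

/// When a planner may replace a call with a literal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FoldPolicy {
    /// Fold at plan time; the value never changes for these arguments.
    Always,
    /// Fold at most once per query execution.
    OncePerQuery,
    /// Must be evaluated at execution time.
    Never,
}

/// A zero-argument call such as now() counts as having all-literal arguments.
pub fn fold_policy(v: Volatility, args_are_literals: bool) -> FoldPolicy {
    match (v, args_are_literals) {
        (Volatility::Immutable, true) => FoldPolicy::Always,
        (Volatility::Stable, true) => FoldPolicy::OncePerQuery,
        // Non-literal arguments or volatile functions cannot be folded.
        _ => FoldPolicy::Never,
    }
}
```

Under this breakdown, now() lands in the `OncePerQuery` bucket, which is exactly the semantics the PR is after.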
Stateful functions seem to be needed at some point. For now, though, adding a node for timestamps (usable by current_date, NOW(), etc.) that only has access to the query start time seems like a more natural route to me. WDYT?
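The "query start time" idea can be sketched as a small per-query context that every time-related function reads from, so NOW() and CURRENT_DATE agree within one query. `QueryContext`, `now` and `current_date` are hypothetical names, and timestamps are assumed to be nanoseconds since the Unix epoch (UTC):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

pub const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Hypothetical per-query state: the only thing timestamp functions can see.
#[derive(Debug, Clone, Copy)]
pub struct QueryContext {
    /// Nanoseconds since the Unix epoch, captured once when the query starts.
    pub query_start_nanos: i64,
}

impl QueryContext {
    /// Read the system clock exactly once for this query.
    pub fn start_now() -> Self {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock is before 1970")
            .as_nanos() as i64;
        QueryContext { query_start_nanos: nanos }
    }
}

/// NOW(): the captured query start time itself.
pub fn now(ctx: &QueryContext) -> i64 {
    ctx.query_start_nanos
}

/// CURRENT_DATE: whole days since the epoch (UTC), derived from the same instant.
/// `div_euclid` rounds toward negative infinity, so pre-1970 instants map correctly.
pub fn current_date(ctx: &QueryContext) -> i32 {
    ctx.query_start_nanos.div_euclid(NANOS_PER_DAY) as i32
}
```

Because both functions derive from the same captured instant, a query that uses now() and current_date together cannot see them straddle midnight.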
I agree with @Dandandan -- I would like to get a basic implementation of now(). I feel @msathis had this PR really close but then the addition of
Just to be clear, are you suggesting adding a new Expr variant, something like I guess I was hoping that part of the translation from Expr --> PhysicalExpr could somehow capture state that was available at plan time (e.g.
I think we should create another category of functions called
I am not opposed to the idea of Here is an alternate proposal (in #335): 261e769 -- basically, use a closure to capture the value for now() at plan time. It doesn't require changing any other function signatures, and I think it implements the semantics of What do you think, @msathis, @jorgecarleitao, @Dandandan and @returnString? If we still want to do
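The closure approach boils down to reading the clock once while planning and letting the physical function replay that value. This is not the code from 261e769; it is a sketch that assumes a hypothetical `NullaryFn` type (batch size in, one `i64` nanosecond timestamp per row out) to show why every now() call in one query returns the same value:

```rust
use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical signature for a zero-argument scalar function:
/// given a batch size, produce one value per row.
pub type NullaryFn = Arc<dyn Fn(usize) -> Vec<i64> + Send + Sync>;

/// Called at plan time: read the clock once and return a closure that
/// replays that single instant for every row of every batch.
pub fn plan_now() -> NullaryFn {
    let planned_at = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_nanos() as i64;
    Arc::new(move |num_rows: usize| vec![planned_at; num_rows])
}
```

Planning a new query calls `plan_now()` again and reads the clock afresh; without transactions, that is the closest analogue of Postgres returning the transaction start time.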
Yeah on the basis that
Especially at the logical level, I think there are more potential benefits here than just supporting stateful functions outright, like enabling better plan optimisation, and we could probably do with some upfront design work on that first 😄 Edit: also, if/when we land some work for stateful functions, there's nothing that stops us migrating this across later!
Just to understand, @alamb and @returnString, is it fair to say that there is some urgency around NOW()? If so, I agree that the closure is a great stop-gap and we should go for it. Other than that, @msathis, if you want to take a stab at stateful functions, one idea is to create an issue and a design document or draft so that others can comment on it before you commit to a large chunk of work.
@jorgecarleitao I would like
However, that query also requires timestamp arithmetic -- part of #194 -- but I can work around the lack of #194 with casting. I can't think of any way to work around the lack of I also am not sure I would call
@jorgecarleitao Also, I am not convinced about how valuable a general purpose This is due to:
Yeah, in Postgres parlance this kind of function is "stable" (read: consistent over the course of a single txn given the same inputs) as opposed to "immutable" (which can't reference anything outside its args/constants) - still not the best breakdown imo, but a little bit closer maybe. Cache invalidation and naming things, right? 😅
Off the top of my head, I think it'll open up some potential for generalised optimisation passes over function usage in queries according to function class, i.e. the optimiser rule used for the initial implementation of this PR, but applicable to arbitrary functions provided they indicate themselves to be "stable".
Personally, I think so: it's a generally useful function and opens up lots of good time-series use cases, as @alamb mentioned. Edit: also, I'm happy to assist in defining this proposed future work or even drive it outright; not just trying to generate tasks for other people ;)
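A "stable function" optimiser pass stays safe with respect to the earlier concern (a disabled rule or an unoptimized plan) as long as the executor can also evaluate the stable call on its own. A toy sketch, assuming a hypothetical three-variant `Expr` rather than DataFusion's real one:

```rust
/// A toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Literal(i64),
    /// A stable function: constant within one query.
    Now,
    Add(Box<Expr>, Box<Expr>),
}

/// Optional optimiser pass: replace every stable call with its per-query value.
pub fn fold_stable(expr: &Expr, query_start: i64) -> Expr {
    match expr {
        Expr::Now => Expr::Literal(query_start),
        Expr::Add(l, r) => Expr::Add(
            Box::new(fold_stable(l, query_start)),
            Box::new(fold_stable(r, query_start)),
        ),
        Expr::Literal(v) => Expr::Literal(*v),
    }
}

/// Executor: still knows how to evaluate `Now`, so skipping the
/// optimiser (or disabling the rule) yields the same answer.
pub fn evaluate(expr: &Expr, query_start: i64) -> i64 {
    match expr {
        Expr::Literal(v) => *v,
        Expr::Now => query_start,
        Expr::Add(l, r) => evaluate(l, query_start) + evaluate(r, query_start),
    }
}
```

With this split, folding is purely an optimisation: `evaluate` returns the same result whether or not `fold_stable` ran first.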
I have reverted the last commit, added @alamb's approach from #335, and addressed the review comments as well. @alamb @jorgecarleitao, could you please take another look at the PR?
Thanks a lot @msathis for this and for your patience 💯
Thanks @jorgecarleitao! It was a great learning experience for me. 👌
Yes, amazing work @msathis
I filed #340 to track possible future work on stateful functions.
I LIKE IT! Nice work @msathis -- and I echo the sentiment of thanks for bearing with us.
Which issue does this PR close?
Closes #251 .
Rationale for this change
Add a Postgres-compatible NOW() function.
What changes are included in this PR?
Are there any user-facing changes?
N/A