Datafusion First PR
Which issue does this PR close?
This PR implements PostgreSQL-style window function calls and makes progress on #360.
We have only just started contributing to Datafusion. Since this is our first PR for the project, we would like your feedback on how to close this issue completely; for now, this is a partial implementation. The integration tests show which cases we cover.
Rationale for this change
Datafusion does not yet support window function calls, although they are on the roadmap.
What changes are included in this PR?
For now, we implemented `ROWS` and `RANGE` modes supporting `PRECEDING` and `FOLLOWING`.
As a draft, we currently do not support:
- `GROUPS` mode
- `RANGE BETWEEN '1 day' PRECEDING AND '10 days' FOLLOWING`
- `EXCLUDE CURRENT ROW`
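To illustrate the difference between the two supported modes, here is a minimal sketch of how frame bounds could be computed for `ROWS` and `RANGE` with numeric `PRECEDING` and `FOLLOWING` offsets. It is not the code in this PR: the helper names `rows_frame` and `range_frame` are hypothetical, the `ORDER BY` column is assumed to be a sorted ascending slice of `i64`, and overflow of the offset arithmetic is ignored.

```rust
use std::ops::Range;

/// Hypothetical helper: frame for `ROWS BETWEEN p PRECEDING AND f FOLLOWING`.
/// Offsets count physical rows around the current row, clamped to the partition.
fn rows_frame(len: usize, idx: usize, preceding: usize, following: usize) -> Range<usize> {
    let start = idx.saturating_sub(preceding);
    let end = (idx + following + 1).min(len);
    start..end
}

/// Hypothetical helper: frame for `RANGE BETWEEN p PRECEDING AND f FOLLOWING`.
/// Offsets are applied to the ORDER BY value, so rows with equal keys (peers)
/// always enter or leave the frame together. `keys` must be sorted ascending.
fn range_frame(keys: &[i64], idx: usize, preceding: i64, following: i64) -> Range<usize> {
    let low = keys[idx] - preceding;
    let high = keys[idx] + following;
    // First index whose key is >= low.
    let start = keys.partition_point(|&k| k < low);
    // First index whose key is > high (exclusive end of the frame).
    let end = keys.partition_point(|&k| k <= high);
    start..end
}

fn main() {
    let keys = [1_i64, 2, 2, 5, 6]; // ORDER BY column, already sorted
    let values = [10_i64, 20, 30, 40, 50]; // column being summed

    for i in 0..keys.len() {
        let rows_sum: i64 = values[rows_frame(keys.len(), i, 1, 1)].iter().sum();
        let range_sum: i64 = values[range_frame(&keys, i, 1, 1)].iter().sum();
        println!("key={} rows_sum={} range_sum={}", keys[i], rows_sum, range_sum);
    }
}
```

In this sketch the two rows with key `2` get the same `RANGE` frame but different `ROWS` frames, which is why `ROWS` results depend on how ties in the `ORDER BY` column are ordered.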
Next steps
- `GROUPS` mode implementation by extending the `calculate_current_window` method.
- `calculate_current_window` method.
Observations
Since some aggregation functions only use `f64`, there are numerical problems with statistical aggregation functions like `CORR(x,y)`. They can be enhanced to support other DataTypes, similar to the `SUM(x)` aggregation.
Also, `evaluation()` of the `CovarianceAccumulator` should be changed to match PostgreSQL. However, these issues are separate from this PR. We did not use `CORR(x,y)` because of these problems.
Since the sorting is unstable, some queries output different results than PostgreSQL. We use only unique columns for `ORDER BY` clauses while testing `ROWS` mode.
An example query:
Outputs in Datafusion as
and in PostgreSQL as
without a problem, however, it produces