Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linear search support for Window Group queries #5286

Merged
merged 22 commits into from
Feb 17, 2023
Merged

Linear search support for Window Group queries #5286

merged 22 commits into from
Feb 17, 2023

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes 5213.
Closes 5275.

Rationale for this change

This is a continuation of #4989, it extends amortized linear search support to window frames in GROUPS mode. As discussed in the preceding PR, this results in significant performance gains, which is the main purpose of this PR.

What changes are included in this PR?

This PR removes the old GROUPS mechanism and replaces it with the amortized linear search. This reduces code size and yields performance gains simultaneously.

Are there any user-facing changes?

No.

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sql SQL Planner labels Feb 15, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR removes the old GROUPS mechanism and replaces it with the amortized linear search. This reduces code size and yields performance gains simultaneously.

The best kind of changes 😍

I went through this PR fairly carefully -- it all looks like great work, as always.

Thanks @mustafasrepo

Ok(window_frame)
}
let window_frame = window_frame.clone().try_into()?;
regularize(window_frame, order_by.len())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why regularize is called in the serialization code. It makes sense to be called as part of the SQL planner (or LogicalPlanBuilder) so that by the time the structures were serialized / deserialized they should already be valid (and if they are not it should be an error).

Copy link
Contributor

@ozankabak ozankabak Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. That logic was already here from before (in addition to being elsewhere too). In this PR we refactored it out into the regularize function as a first step, so at least there is no code duplication anymore. This way, we will be able to remove it easily later on.

datafusion/proto/src/logical_plan/from_proto.rs Outdated Show resolved Hide resolved
@ozankabak
Copy link
Contributor

Argh, using GH to incorporate the suggestions resulted in a formatting error, just sent a manual commit again. Feel free to merge after CI passes.

@alamb alamb merged commit 79ecf94 into apache:main Feb 17, 2023
@ursabot
Copy link

ursabot commented Feb 17, 2023

Benchmark runs are scheduled for baseline = ded897e and contender = 79ecf94. 79ecf94 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

jiangzhx pushed a commit to jiangzhx/arrow-datafusion that referenced this pull request Feb 24, 2023
* add naive linear search

* Add last range to decrease search size

* minor changes

* add low, high arguments

* Go back to old API, improve comments, refactors

* Linear Groups implementation

* Resolve linter errors

* remove old unit tests

* simplifications

* Add unit tests

* Remove sort options from GROUPS calculations, various code simplifications
and comment clarifications

* New TODOs to fix

* Address reviews

* Fix error

* Prehandle range current row and unbounded following case

* Fix error

* Move a check from execution to planning, reduce code duplication

* Incorporate review suggestion (with cargo fmt fix)

---------

Co-authored-by: Mehmet Ozan Kabak <[email protected]>
@mustafasrepo mustafasrepo deleted the feature/range_linear_group branch March 2, 2023 10:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for some of the window frame range queries Add support for linear groups search
4 participants