Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reimplment eliminate_limit to remove global-state. #4324

Merged
merged 1 commit into from
Nov 29, 2022

Conversation

jackwener
Copy link
Member

Which issue does this PR close?

Close #4264.
Part of #4267

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Nov 22, 2022
@jackwener jackwener changed the title reimplment eliminate_limit to remove global-state. reimplment eliminate_limit to remove global-state. Nov 22, 2022
Comment on lines 218 to 225
// After remove global-state, we don't record the parent <skip, fetch>
// So, bottom don't know parent info, so can't eliminate.
let expected = "Limit: skip=3, fetch=1\
\n Sort: test.a, fetch=4\
\n Limit: skip=2, fetch=1\
\n Aggregate: groupBy=[[test.a]], aggr=[[SUM(test.b)]]\
\n TableScan: test";
assert_optimized_plan_eq_with_pushdown(&plan, expected)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some case need to record information cross plannode.
Exist regression, but it's trivial.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the global state does look good in some scenarios, but maybe we should keep it where it's necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically unnecessary.

  • Currently global-state is passed with parameters in a recursive function, can't only some places are reserved
  • In fact, it basically does not affect, and other optimizers basically do not consider these corner case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make it easier to see the changes in this PR if the order of the tests was the same (not sure if you reordered the tests on purpose or if that is something related to how github is displaying them)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jackwener for the PR and @waynexia for the review

Sorry for the delay in review (the backlog is substantial these days!)

The code looks great to me, but I don't understand some of the test changes. Can you point out what tests were changed and why?

datafusion/optimizer/src/eliminate_limit.rs Show resolved Hide resolved
Comment on lines 218 to 225
// After remove global-state, we don't record the parent <skip, fetch>
// So, bottom don't know parent info, so can't eliminate.
let expected = "Limit: skip=3, fetch=1\
\n Sort: test.a, fetch=4\
\n Limit: skip=2, fetch=1\
\n Aggregate: groupBy=[[test.a]], aggr=[[SUM(test.b)]]\
\n TableScan: test";
assert_optimized_plan_eq_with_pushdown(&plan, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make it easier to see the changes in this PR if the order of the tests was the same (not sure if you reordered the tests on purpose or if that is something related to how github is displaying them)

@jackwener
Copy link
Member Author

jackwener commented Nov 27, 2022

Sorry for change UT. It indeed make review hard. I has restored them.

original

  • I delete multi_limit_offset_sort_eliminate, because I think difference between it and limit_fetch_with_ancestor_limit_fetch is very small.
  • reorder is because : I put the more comprehensive tests later

@andygrove andygrove self-requested a review November 28, 2022 15:14
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the plan changes in this PR carefully and they looked good to me

Thank you @jackwener

\n EmptyRelation";
assert_optimized_plan_eq(&plan, expected);
\n Sort: test.a, fetch=3\
\n Limit: skip=0, fetch=2\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to convince myself this test change was correct

The original plan fetches 2 rows after the aggregate, so the output will only by 2 rows and then does a skip 2, fetch 1 after wards -- so actually this plan will always return no rows but the plan here looks fine and the original actually looks wrong

@alamb alamb merged commit 02da32e into apache:master Nov 29, 2022
@ursabot
Copy link

ursabot commented Nov 29, 2022

Benchmark runs are scheduled for baseline = dd3f72a and contender = 02da32e. 02da32e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@jackwener jackwener deleted the limit-tmp branch November 29, 2022 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reimplement eliminate_limit
4 participants