Reimplement `eliminate_limit` to remove `global-state`. #4324

jackwener · 2022-11-22T10:52:15Z

Which issue does this PR close?

Close #4264.
Part of #4267

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

jackwener · 2022-11-22T10:57:02Z

datafusion/optimizer/src/eliminate_limit.rs

+        // After remove global-state, we don't record the parent <skip, fetch>
+        // So, bottom don't know parent info, so can't eliminate.
+        let expected = "Limit: skip=3, fetch=1\
+        \n  Sort: test.a, fetch=4\
+        \n    Limit: skip=2, fetch=1\
+        \n      Aggregate: groupBy=[[test.a]], aggr=[[SUM(test.b)]]\
+        \n        TableScan: test";
+        assert_optimized_plan_eq_with_pushdown(&plan, expected)


Some cases need to record information across plan nodes.
There is a regression, but it's trivial.

waynexia · 2022-11-22

Removing the global state does look good in some scenarios, but maybe we should keep it where it's necessary?

jackwener · 2022-11-23

It's basically unnecessary.

Currently the global state is passed as parameters through a recursive function, so it can't be kept in only some places.

In fact, it has basically no effect, and other optimizer rules generally don't consider these corner cases either.
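To make the trade-off concrete, here is a minimal sketch of a limit-elimination rule that looks only at the node it is visiting. The `Plan` enum, `eliminate_limit_local`, and `nested_limits` are hypothetical names for illustration, not DataFusion's `LogicalPlan` or the code in this PR, and the Aggregate node from the test is left out for brevity.

```rust
// Toy plan type for illustration only; NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Limit { skip: usize, fetch: Option<usize>, input: Box<Plan> },
    Sort { fetch: Option<usize>, input: Box<Plan> },
    Scan,
    Empty,
}

// Local rule (assumed shape): every decision uses only the fields of the
// node being visited. No ancestor <skip, fetch> is threaded through the
// recursion, so there is no shared state between calls.
fn eliminate_limit_local(plan: Plan) -> Plan {
    match plan {
        // `fetch = 0` can never produce rows.
        Plan::Limit { fetch: Some(0), .. } => Plan::Empty,
        // `skip = 0` with no fetch is a no-op, so the node is dropped.
        Plan::Limit { skip: 0, fetch: None, input } => eliminate_limit_local(*input),
        // Any other limit is kept; only its input is rewritten.
        Plan::Limit { skip, fetch, input } => Plan::Limit {
            skip,
            fetch,
            input: Box::new(eliminate_limit_local(*input)),
        },
        Plan::Sort { fetch, input } => Plan::Sort {
            fetch,
            input: Box::new(eliminate_limit_local(*input)),
        },
        // Leaves (`Scan`, `Empty`) are returned as they are.
        leaf => leaf,
    }
}

// Simplified version of the plan in the test above (Aggregate omitted).
fn nested_limits() -> Plan {
    Plan::Limit {
        skip: 3,
        fetch: Some(1),
        input: Box::new(Plan::Sort {
            fetch: Some(4),
            input: Box::new(Plan::Limit {
                skip: 2,
                fetch: Some(1),
                input: Box::new(Plan::Scan),
            }),
        }),
    }
}
```

On `nested_limits()` this rule returns the plan unchanged. The inner `Limit: skip=2, fetch=1` yields at most one row and the outer `skip=3` discards it, but spotting that requires the parent's <skip, fetch>, which a local rule never sees.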

alamb

Thanks @jackwener for the PR and @waynexia for the review

Sorry for the delay in review (the backlog is substantial these days!)

The code looks great to me, but I don't understand some of the test changes. Can you point out what tests were changed and why?

datafusion/optimizer/src/eliminate_limit.rs

alamb · 2022-11-27T11:44:08Z

I think it would make it easier to see the changes in this PR if the order of the tests was the same (not sure if you reordered the tests on purpose or if that is something related to how github is displaying them)

jackwener · 2022-11-27T12:43:49Z

Sorry for changing the unit tests; it did make the review hard. I have restored them.

original

I deleted `multi_limit_offset_sort_eliminate` because I think the difference between it and `limit_fetch_with_ancestor_limit_fetch` is very small.
The reordering is because I put the more comprehensive tests later.

alamb

I reviewed the plan changes in this PR carefully and they looked good to me

Thank you @jackwener

alamb · 2022-11-29T15:22:11Z

datafusion/optimizer/src/eliminate_limit.rs

-            \n    EmptyRelation";
-        assert_optimized_plan_eq(&plan, expected);
+        \n  Sort: test.a, fetch=3\
+        \n    Limit: skip=0, fetch=2\


It took me a while to convince myself this test change was correct

The original plan fetches 2 rows after the aggregate, so the output will only be 2 rows, and then does a skip 2, fetch 1 afterwards. So this plan will always return no rows, but the plan here looks fine and the original actually looks wrong.
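The row arithmetic can be checked with a small sketch. `apply_limit` and `rows_after_both_limits` are hypothetical helpers (not DataFusion APIs) that model `Limit: skip, fetch` over an in-memory vector. The intermediate `Sort: test.a, fetch=3` is omitted because it cannot change the row count when its input has at most 2 rows.

```rust
/// Hypothetical helper: models `Limit: skip=<skip>, fetch=<fetch>` on a
/// vector of rows. Not part of DataFusion.
fn apply_limit<T>(rows: Vec<T>, skip: usize, fetch: Option<usize>) -> Vec<T> {
    let remaining = rows.into_iter().skip(skip);
    match fetch {
        Some(n) => remaining.take(n).collect(),
        None => remaining.collect(),
    }
}

/// Rows surviving `Limit: skip=0, fetch=2` followed by
/// `Limit: skip=2, fetch=1`.
fn rows_after_both_limits(input: Vec<i32>) -> Vec<i32> {
    // The inner limit lets at most 2 rows through.
    let inner = apply_limit(input, 0, Some(2));
    // The outer limit skips 2 rows, which is everything the inner one kept.
    apply_limit(inner, 2, Some(1))
}
```

Calling `rows_after_both_limits((1..=10).collect())` yields an empty vector, and the same holds for any input size, since the outer skip is never smaller than the inner fetch.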

ursabot · 2022-11-29T15:32:27Z

Benchmark runs are scheduled for baseline = dd3f72a and contender = 02da32e. 02da32e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added the optimizer Optimizer rules label Nov 22, 2022

jackwener changed the title ~~reimplment eliminate_limit to remove global-state.~~ reimplment eliminate_limit to remove global-state. Nov 22, 2022

jackwener commented Nov 22, 2022

jackwener mentioned this pull request Nov 26, 2022

fix regression for push_down_filter meet subquery-alias #4384

Closed

alamb reviewed Nov 27, 2022

reimplment eliminate_limit to remove global-state.

38088f6

jackwener force-pushed the limit-tmp branch from cbe233e to 38088f6 Compare November 27, 2022 12:39

andygrove self-requested a review November 28, 2022 15:14

alamb approved these changes Nov 29, 2022

alamb merged commit 02da32e into apache:master Nov 29, 2022

jackwener deleted the limit-tmp branch November 29, 2022 15:45
