-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement logical filter plan node #1
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Yuchen Liang <[email protected]>
skyzh
approved these changes
Oct 23, 2023
yliang412
added a commit
that referenced
this pull request
Mar 11, 2024
Closes #65. This PR produces metadata in optimizer output. Currently, the only metadata kept is the group id. The group id information has the following benefits: - allow the consumer of the optd plan to recover reusable fragments in the DAG-shaped query plan. - implement adaptive physical collector outside `optd-core`. - display cardinality cost: #89 Looking forward to feedbacks! In the example below, we produce the same plan as before when adaptiveness is enabled. ```sql ❯ create table t1(t1v1 int, t1v2 int); 0 rows in set. Query took 0.009 seconds. Execution took 0.000 secs, Planning took 0.000 secs ❯ explain select * from t1 as a, t1 as b; +--------------------------------------------------+-------------------------------------------------------------------------------------------------+ | plan_type | plan | +--------------------------------------------------+-------------------------------------------------------------------------------------------------+ | logical_plan after datafusion | Projection: a.t1v1, a.t1v2, b.t1v1, b.t1v2 | | | CrossJoin: | | | SubqueryAlias: a | | | TableScan: t1 | | | SubqueryAlias: b | | | TableScan: t1 | | logical_plan after optd | LogicalProjection { exprs: [ #0, #1, #2, #3 ] } | | | └── LogicalJoin { join_type: Cross, cond: true } | | | ├── LogicalScan { table: t1 } | | | └── LogicalScan { table: t1 } | | physical_plan after optd | PhysicalProjection { exprs: [ #0, #1, #2, #3 ] } | | | └── PhysicalNestedLoopJoin { join_type: Cross, cond: true } | | | ├── PhysicalScan { table: t1 } | | | └── PhysicalScan { table: t1 } | | physical_plan after optd-join-order | (NLJ t1 t1) | | physical_plan after optd-all-join-orders | SAME TEXT AS ABOVE | | physical_plan after optd-all-logical-join-orders | (Join t1 t1) | | physical_plan | CollectorExec group_id=!17 | | | ProjectionExec: expr=[<expr>@0 as col0, <expr>@1 as col1, <expr>@2 as col2, <expr>@3 as col3] | | | CollectorExec group_id=!5 | | | CrossJoinExec | | | CollectorExec group_id=!1 | | | MemoryExec: partitions=1, partition_sizes=[0] | | | CollectorExec group_id=!1 | | | MemoryExec: partitions=1, partition_sizes=[0] | | | | +--------------------------------------------------+-------------------------------------------------------------------------------------------------+ ``` --------- Signed-off-by: Yuchen Liang <[email protected]>
Gun9niR
added a commit
that referenced
this pull request
Mar 20, 2024
Generate the statistics in perftest and put them into `BaseCostModel` in `DatafusionOptimizer`. Below is the comparison before & after stats are added. You can check `PhysicalScan`, where the cost has changed. The final cardinality remains the same because when stats on a column is missing, we use a very small magic number `INVALID_SELECTIVITY` (0.001) that just sets cardinality to 1. ### Todos in Future PRs - Support generating stats on `Utf8`. - Set a better magic number. - Generate MCV. ### Before ``` plan space size budget used, not applying logical rules any more. current plan space: 1094 explain: PhysicalSort ├── exprs:SortOrder { order: Desc } │ └── #1 ├── cost: weighted=185.17,row_cnt=1.00,compute=179.17,io=6.00 └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=182.12,row_cnt=1.00,compute=176.12,io=6.00 } └── PhysicalAgg ├── aggrs:Agg(Sum) │ └── Mul │ ├── #0 │ └── Sub │ ├── 1 │ └── #1 ├── groups: [ #2 ] ├── cost: weighted=182.02,row_cnt=1.00,compute=176.02,io=6.00 └── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=64.90,row_cnt=1.00,compute=58.90,io=6.00 } └── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=64.76,row_cnt=1.00,compute=58.76,io=6.00 } └── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=64.54,row_cnt=1.00,compute=58.54,io=6.00 } └── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=64.24,row_cnt=1.00,compute=58.24,io=6.00 } └── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=63.82,row_cnt=1.00,compute=57.82,io=6.00 } └── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=63.32,row_cnt=1.00,compute=57.32,io=6.00 } └── PhysicalNestedLoopJoin ├── join_type: Inner ├── cond:And │ ├── Eq │ │ ├── #11 │ │ └── #14 │ └── Eq │ ├── #3 │ └── #15 ├── cost: weighted=62.74,row_cnt=1.00,compute=56.74,io=6.00 ├── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #1 ], cost: weighted=35.70,row_cnt=1.00,compute=32.70,io=3.00 } │ ├── PhysicalScan { table: customer, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } │ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=31.64,row_cnt=1.00,compute=29.64,io=2.00 } │ ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=27.40,row_cnt=1.00,compute=26.40,io=1.00 } │ │ └── PhysicalFilter │ │ ├── cond:And │ │ │ ├── Geq │ │ │ │ ├── #2 │ │ │ │ └── 9131 │ │ │ └── Lt │ │ │ ├── #2 │ │ │ └── 9496 │ │ ├── cost: weighted=27.30,row_cnt=1.00,compute=26.30,io=1.00 │ │ └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 } │ │ └── PhysicalScan { table: orders, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } │ └── PhysicalProjection { exprs: [ #0, #2, #5, #6 ], cost: weighted=1.18,row_cnt=1.00,compute=0.18,io=1.00 } │ └── PhysicalScan { table: lineitem, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } └── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=15.72,row_cnt=1.00,compute=12.72,io=3.00 } └── PhysicalHashJoin { join_type: Inner, left_keys: [ #3 ], right_keys: [ #0 ], cost: weighted=15.46,row_cnt=1.00,compute=12.46,io=3.00 } ├── PhysicalScan { table: supplier, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } └── PhysicalHashJoin { join_type: Inner, left_keys: [ #2 ], right_keys: [ #0 ], cost: weighted=11.40,row_cnt=1.00,compute=9.40,io=2.00 } ├── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 } │ └── PhysicalScan { table: nation, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } └── PhysicalProjection { exprs: [ #0 ], cost: weighted=7.20,row_cnt=1.00,compute=6.20,io=1.00 } └── PhysicalFilter ├── cond:Eq │ ├── #1 │ └── "AMERICA" ├── cost: weighted=7.14,row_cnt=1.00,compute=6.14,io=1.00 └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=1.10,row_cnt=1.00,compute=0.10,io=1.00 } └── PhysicalScan { table: region, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 } plan space size budget used, not applying logical rules any more. current plan space: 1094 qerrors: {"DataFusion": [5.0]} ``` ### After ``` plan space size budget used, not applying logical rules any more. current plan space: 1094 explain: PhysicalSort ├── exprs:SortOrder { order: Desc } │ └── #1 ├── cost: weighted=336032.32,row_cnt=1.00,compute=259227.32,io=76805.00 └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=336029.27,row_cnt=1.00,compute=259224.27,io=76805.00 } └── PhysicalAgg ├── aggrs:Agg(Sum) │ └── Mul │ ├── #0 │ └── Sub │ ├── 1 │ └── #1 ├── groups: [ #2 ] ├── cost: weighted=336029.17,row_cnt=1.00,compute=259224.17,io=76805.00 └── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=335912.05,row_cnt=1.00,compute=259107.05,io=76805.00 } └── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=335911.91,row_cnt=1.00,compute=259106.91,io=76805.00 } └── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=335911.69,row_cnt=1.00,compute=259106.69,io=76805.00 } └── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=335911.39,row_cnt=1.00,compute=259106.39,io=76805.00 } └── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=335910.97,row_cnt=1.00,compute=259105.97,io=76805.00 } └── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=335910.47,row_cnt=1.00,compute=259105.47,io=76805.00 } └── PhysicalNestedLoopJoin ├── join_type: Inner ├── cond:And │ ├── Eq │ │ ├── #11 │ │ └── #14 │ └── Eq │ ├── #3 │ └── #15 ├── cost: weighted=335909.89,row_cnt=1.00,compute=259104.89,io=76805.00 ├── PhysicalProjection { exprs: [ #6, #7, #8, #9, #10, #11, #12, #13, #0, #1, #2, #3, #4, #5 ], cost: weighted=335619.21,row_cnt=1.00,compute=258944.21,io=76675.00 } │ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #0 ], cost: weighted=335618.63,row_cnt=1.00,compute=258943.63,io=76675.00 } │ ├── PhysicalProjection { exprs: [ #4, #5, #0, #1, #2, #3 ], cost: weighted=332616.57,row_cnt=1.00,compute=257441.57,io=75175.00 } │ │ └── PhysicalProjection { exprs: [ #0, #2, #5, #6, #16, #17 ], cost: weighted=332616.31,row_cnt=1.00,compute=257441.31,io=75175.00 } │ │ └── PhysicalProjection { exprs: [ #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #0, #1 ], cost: weighted=332616.05,row_cnt=1.00,compute=257441.05,io=75175.00 } │ │ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=332615.31,row_cnt=1.00,compute=257440.31,io=75175.00 } │ │ ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=212263.25,row_cnt=1.00,compute=197263.25,io=15000.00 } │ │ │ └── PhysicalFilter │ │ │ ├── cond:And │ │ │ │ ├── Geq │ │ │ │ │ ├── #2 │ │ │ │ │ └── 9131 │ │ │ │ └── Lt │ │ │ │ ├── #2 │ │ │ │ └── 9496 │ │ │ ├── cost: weighted=212263.15,row_cnt=1.00,compute=197263.15,io=15000.00 │ │ │ └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=16050.07,row_cnt=15000.00,compute=1050.07,io=15000.00 } │ │ │ └── PhysicalScan { table: orders, cost: weighted=15000.00,row_cnt=15000.00,compute=0.00,io=15000.00 } │ │ └── PhysicalScan { table: lineitem, cost: weighted=60175.00,row_cnt=60175.00,compute=0.00,io=60175.00 } │ └── PhysicalScan { table: customer, cost: weighted=1500.00,row_cnt=1500.00,compute=0.00,io=1500.00 } └── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=279.36,row_cnt=1.00,compute=149.36,io=130.00 } └── PhysicalProjection { exprs: [ #4, #5, #6, #7, #8, #9, #10, #0, #1, #2, #3 ], cost: weighted=279.10,row_cnt=1.00,compute=149.10,io=130.00 } └── PhysicalProjection { exprs: [ #1, #2, #3, #0, #4, #5, #6, #7, #8, #9, #10 ], cost: weighted=278.64,row_cnt=1.00,compute=148.64,io=130.00 } └── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #3 ], cost: weighted=278.18,row_cnt=1.00,compute=148.18,io=130.00 } ├── PhysicalProjection { exprs: [ #3, #0, #1, #2 ], cost: weighted=76.12,row_cnt=1.00,compute=46.12,io=30.00 } │ └── PhysicalProjection { exprs: [ #0, #1, #2, #4 ], cost: weighted=75.94,row_cnt=1.00,compute=45.94,io=30.00 } │ └── PhysicalProjection { exprs: [ #1, #2, #3, #4, #0 ], cost: weighted=75.76,row_cnt=1.00,compute=45.76,io=30.00 } │ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #2 ], cost: weighted=75.54,row_cnt=1.00,compute=45.54,io=30.00 } │ ├── PhysicalProjection { exprs: [ #0 ], cost: weighted=23.48,row_cnt=1.00,compute=18.48,io=5.00 } │ │ └── PhysicalFilter │ │ ├── cond:Eq │ │ │ ├── #1 │ │ │ └── "AMERICA" │ │ ├── cost: weighted=23.42,row_cnt=1.00,compute=18.42,io=5.00 │ │ └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=5.30,row_cnt=5.00,compute=0.30,io=5.00 } │ │ └── PhysicalScan { table: region, cost: weighted=5.00,row_cnt=5.00,compute=0.00,io=5.00 } │ └── PhysicalScan { table: nation, cost: weighted=25.00,row_cnt=25.00,compute=0.00,io=25.00 } └── PhysicalScan { table: supplier, cost: weighted=100.00,row_cnt=100.00,compute=0.00,io=100.00 } plan space size budget used, not applying logical rules any more. current plan space: 1094 qerrors: {"DataFusion": [5.0]} ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Implement logical filter plan node.