feat: produce metadata in optimizer output #83

Merged 11 commits on Mar 11, 2024

@yliang412 (Member) commented Feb 20, 2024

Closes #65. This PR produces metadata in the optimizer output. Currently, the only metadata kept is the group id. Exposing the group id has the following benefits:

  • It allows the consumer of the optd plan to recover reusable fragments in the DAG-shaped query plan.
  • It allows the adaptive physical collector to be implemented outside optd-core.
  • It makes it possible to display cardinality and cost in explain (see "Display cardinality and cost in explain", #89).
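
To illustrate the first benefit, here is a minimal Rust sketch of how a consumer might use the group id to recover shared fragments. All types (`GroupId`, `NodeMeta`, `PlanNode`, `DagNode`) and the `to_dag` function are hypothetical stand-ins rather than optd's actual API, and the sketch assumes that two nodes with the same group id denote the same plan fragment.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for optd's group identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub usize);

/// Hypothetical metadata attached to each node of the optimizer output.
#[derive(Debug, Clone, Copy)]
pub struct NodeMeta {
    pub group_id: GroupId,
}

/// Hypothetical tree-shaped plan node as emitted by the optimizer.
#[derive(Debug)]
pub struct PlanNode {
    pub name: String,
    pub meta: NodeMeta,
    pub children: Vec<PlanNode>,
}

/// DAG-shaped node built by the consumer; a shared fragment is one shared `Rc`.
#[derive(Debug)]
pub struct DagNode {
    pub name: String,
    pub group_id: GroupId,
    pub children: Vec<Rc<DagNode>>,
}

/// Rebuild the plan as a DAG, creating at most one node per group id.
/// Assumption: equal group ids mean equal fragments.
pub fn to_dag(node: &PlanNode, cache: &mut HashMap<GroupId, Rc<DagNode>>) -> Rc<DagNode> {
    // Reuse the fragment if this group was already converted.
    if let Some(shared) = cache.get(&node.meta.group_id) {
        return Rc::clone(shared);
    }
    let children: Vec<Rc<DagNode>> = node
        .children
        .iter()
        .map(|child| to_dag(child, cache))
        .collect();
    let dag = Rc::new(DagNode {
        name: node.name.clone(),
        group_id: node.meta.group_id,
        children,
    });
    cache.insert(node.meta.group_id, Rc::clone(&dag));
    dag
}
```

With a self-join such as the one in the explain output in this description, both scan inputs carry the same group id, so under this sketch they would resolve to a single shared `DagNode`.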

Looking forward to feedback!

In the example below, we produce the same plan as before when adaptiveness is enabled.

❯ create table t1(t1v1 int, t1v2 int);
0 rows in set. Query took 0.009 seconds.

Execution took 0.000 secs, Planning took 0.000 secs
❯ explain select * from t1 as a, t1 as b;
+--------------------------------------------------+-------------------------------------------------------------------------------------------------+
| plan_type                                        | plan                                                                                            |
+--------------------------------------------------+-------------------------------------------------------------------------------------------------+
| logical_plan after datafusion                    | Projection: a.t1v1, a.t1v2, b.t1v1, b.t1v2                                                      |
|                                                  |   CrossJoin:                                                                                    |
|                                                  |     SubqueryAlias: a                                                                            |
|                                                  |       TableScan: t1                                                                             |
|                                                  |     SubqueryAlias: b                                                                            |
|                                                  |       TableScan: t1                                                                             |
| logical_plan after optd                          | LogicalProjection { exprs: [ #0, #1, #2, #3 ] }                                                 |
|                                                  | └── LogicalJoin { join_type: Cross, cond: true }                                                |
|                                                  |     ├── LogicalScan { table: t1 }                                                               |
|                                                  |     └── LogicalScan { table: t1 }                                                               |
| physical_plan after optd                         | PhysicalProjection { exprs: [ #0, #1, #2, #3 ] }                                                |
|                                                  | └── PhysicalNestedLoopJoin { join_type: Cross, cond: true }                                     |
|                                                  |     ├── PhysicalScan { table: t1 }                                                              |
|                                                  |     └── PhysicalScan { table: t1 }                                                              |
| physical_plan after optd-join-order              | (NLJ t1 t1)                                                                                     |
| physical_plan after optd-all-join-orders         | SAME TEXT AS ABOVE                                                                              |
| physical_plan after optd-all-logical-join-orders | (Join t1 t1)                                                                                    |
| physical_plan                                    | CollectorExec group_id=!17                                                                      |
|                                                  |   ProjectionExec: expr=[<expr>@0 as col0, <expr>@1 as col1, <expr>@2 as col2, <expr>@3 as col3] |
|                                                  |     CollectorExec group_id=!5                                                                   |
|                                                  |       CrossJoinExec                                                                             |
|                                                  |         CollectorExec group_id=!1                                                               |
|                                                  |           MemoryExec: partitions=1, partition_sizes=[0]                                         |
|                                                  |         CollectorExec group_id=!1                                                               |
|                                                  |           MemoryExec: partitions=1, partition_sizes=[0]                                         |
|                                                  |                                                                                                 |
+--------------------------------------------------+-------------------------------------------------------------------------------------------------+
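
In the physical plan above, every operator is wrapped in a `CollectorExec` tagged with a group id, and both scans share `group_id=!1`. The PR does not show the collector's internals, so the following is only a rough Rust sketch of the idea, assuming a collector passes rows through unchanged and reports a row count keyed by group id. `RuntimeStats`, `Collector`, and their methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for optd's group identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub usize);

/// Hypothetical runtime statistics store, keyed by group id.
#[derive(Debug, Default)]
pub struct RuntimeStats {
    rows_per_group: HashMap<GroupId, usize>,
}

impl RuntimeStats {
    /// Add `rows` to the running total for `group_id`.
    pub fn record(&mut self, group_id: GroupId, rows: usize) {
        *self.rows_per_group.entry(group_id).or_insert(0) += rows;
    }

    /// Rows observed so far for `group_id`, if any collector reported them.
    pub fn rows(&self, group_id: GroupId) -> Option<usize> {
        self.rows_per_group.get(&group_id).copied()
    }
}

/// Hypothetical collector: forwards its input and counts the rows it sees.
pub struct Collector<I> {
    group_id: GroupId,
    input: I,
    seen: usize,
}

impl<I> Collector<I> {
    pub fn new(group_id: GroupId, input: I) -> Self {
        Collector { group_id, input, seen: 0 }
    }

    /// Report the observed row count under this collector's group id.
    pub fn finish(self, stats: &mut RuntimeStats) {
        stats.record(self.group_id, self.seen);
    }
}

impl<I: Iterator> Iterator for Collector<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.input.next();
        if item.is_some() {
            self.seen += 1;
        }
        item
    }
}
```

Because the statistics are keyed by group id rather than by operator instance, a consumer outside the optimizer core could feed them back for a later planning round without knowing how the memo is laid out.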

}

/// Get the group binding.
pub fn step_get_optimize_rel(&self, group_id: GroupId) -> Result<RelNodeRef<T>> {
Contributor

Do we still need this? Maintaining two versions of step_get_optimize_rel[_with_meta] seems error-prone.

Member Author

Yeah, maybe I will add an Option<Meta> and keep one copy.

Contributor

When will this meta be none though?

Member Author

You are right, I think I was mixing this one up with memo.get_best_group_binding().
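
The point of this exchange can be sketched in Rust: when a binding is always requested by group id, the metadata can be built from that argument, so it is never missing and a single accessor without `Option<Meta>` is enough. `ToyMemo`, `Binding`, `Meta`, and `best_binding` below are hypothetical names for illustration, not the optd code under review.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for optd's group identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub usize);

/// Hypothetical metadata; only the group id, as in this PR's description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Meta {
    pub group_id: GroupId,
}

/// Hypothetical binding: the chosen operator plus its metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Binding {
    pub op: String,
    pub meta: Meta,
}

/// Toy memo table mapping each group to the name of its winning operator.
#[derive(Default)]
pub struct ToyMemo {
    winners: HashMap<GroupId, String>,
}

impl ToyMemo {
    pub fn set_winner(&mut self, group_id: GroupId, op: &str) {
        self.winners.insert(group_id, op.to_string());
    }

    /// Single accessor. The metadata comes from the `group_id` argument,
    /// so it is always present and needs no `Option` wrapper.
    pub fn best_binding(&self, group_id: GroupId) -> Result<Binding, String> {
        let op = self
            .winners
            .get(&group_id)
            .ok_or_else(|| format!("no winner for group {}", group_id.0))?;
        Ok(Binding {
            op: op.clone(),
            meta: Meta { group_id },
        })
    }
}
```

A caller that does not care about the metadata can simply ignore the `meta` field, which avoids keeping two near-identical functions in sync.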

@Gun9niR (Contributor) left a comment

This feature is great! IIUC it also enables displaying logical properties or cost in explain stmt.

Signed-off-by: Yuchen Liang <[email protected]>
@yliang412 changed the title from "(WIP) produce metadata in optimizer output" to "feat: produce metadata in optimizer output" on Mar 11, 2024
@yliang412 requested a review from Gun9niR on March 11, 2024 at 17:44
@Gun9niR (Contributor) left a comment

lgtm

@yliang412 merged commit c4e778d into main on Mar 11, 2024
1 check passed
@yliang412 deleted the yliang/dag-metadata branch on March 11, 2024 at 19:12
Successfully merging this pull request may close these issues.

Produce metadata along with RelNode
2 participants