feat: Add projection to HashJoinExec. #9236

Merged

Conversation

my-vegetable-has-exploded
Contributor

@my-vegetable-has-exploded my-vegetable-has-exploded commented Feb 15, 2024

Which issue does this PR close?

ref #6768

Rationale for this change

Some projections can't be pushed down to the left or right input of a hash join, because the join's filter or on conditions may need columns that are not used later in the plan.
By embedding those projections in the hash join, we can reduce the cost of build_batch_from_indices (which has to call compute::take() for each column) and avoid unnecessary output creation.
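
To illustrate the first point, here is a minimal toy sketch (not DataFusion code; the row layout and the function name `join_project_payload` are made up for this example). The join needs the key and a filter column from the left side, but only the payload column is wanted in the output, so the projection cannot be applied below the join and is applied while building the output instead.

```rust
use std::collections::HashMap;

/// Hypothetical toy rows: left = (key, payload, filter_col), right = (key, bound).
type LeftRow = (i64, i64, i64);
type RightRow = (i64, i64);

/// Toy inner hash join on `key` with the filter `left.filter_col < right.bound`.
/// Only `payload` is emitted: the key and filter columns are needed inside the
/// join, but they are never materialized in the output.
fn join_project_payload(left: &[LeftRow], right: &[RightRow]) -> Vec<i64> {
    // Build side: hash the left rows by join key.
    let mut table: HashMap<i64, Vec<&LeftRow>> = HashMap::new();
    for row in left {
        table.entry(row.0).or_default().push(row);
    }

    // Probe side: look up each right row, apply the filter, emit the payload.
    let mut out = Vec::new();
    for (key, bound) in right {
        if let Some(matches) = table.get(key) {
            for l in matches {
                if l.2 < *bound {
                    out.push(l.1);
                }
            }
        }
    }
    out
}
```

Dropping `key` or `filter_col` before the join would make the join impossible to evaluate, which is why the projection has to live inside (or above) the join.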

What changes are included in this PR?

Add a rule try_embed_to_hash_join in physical_optimizer/projection_pushdown.rs. Further related changes are noted in the comments.
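
As a rough illustration of what such a rule does, the sketch below rewrites a toy plan tree. The `Plan` and `Expr` enums and the function `try_embed` are hypothetical stand-ins, not the types or the logic used in this PR; the assumption made here is that only projections consisting purely of column references are embedded.

```rust
/// Hypothetical toy expression: a column reference or a computed value.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(usize),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical toy physical plan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { columns: usize },
    HashJoin { left: Box<Plan>, right: Box<Plan>, projection: Option<Vec<usize>> },
    Projection { exprs: Vec<Expr>, input: Box<Plan> },
}

/// If `plan` is a column-only projection directly above a hash join that has
/// no projection yet, fold the projection into the join. Otherwise return the
/// plan unchanged.
fn try_embed(plan: Plan) -> Plan {
    match plan {
        Plan::Projection { exprs, input } => {
            // `None` as soon as any expression is not a plain column.
            let columns: Option<Vec<usize>> = exprs
                .iter()
                .map(|e| match e {
                    Expr::Column(i) => Some(*i),
                    _ => None,
                })
                .collect();
            match (columns, *input) {
                (Some(cols), Plan::HashJoin { left, right, projection: None }) => {
                    Plan::HashJoin { left, right, projection: Some(cols) }
                }
                (_, other) => Plan::Projection { exprs, input: Box::new(other) },
            }
        }
        other => other,
    }
}
```

A projection with computed expressions is left in place in this sketch, since the join itself can only select columns.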

Are these changes tested?

Are there any user-facing changes?

None

@github-actions github-actions bot added the core Core DataFusion crate label Feb 15, 2024
@my-vegetable-has-exploded my-vegetable-has-exploded force-pushed the hashjoin-project-pushdown branch 2 times, most recently from 54f4712 to a25e272 Compare February 15, 2024 10:53
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 16, 2024
@my-vegetable-has-exploded
Contributor Author

Comparing main and hashjoin-project-pushdown
--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ hashjoin-project-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 466.19ms │                  467.21ms │     no change │
│ QQuery 2     │  83.28ms │                   78.72ms │ +1.06x faster │
│ QQuery 3     │ 194.39ms │                  189.47ms │     no change │
│ QQuery 4     │ 136.94ms │                  132.81ms │     no change │
│ QQuery 5     │ 342.11ms │                  337.57ms │     no change │
│ QQuery 6     │ 124.39ms │                  122.95ms │     no change │
│ QQuery 7     │ 489.19ms │                  475.26ms │     no change │
│ QQuery 8     │ 307.07ms │                  286.77ms │ +1.07x faster │
│ QQuery 9     │ 455.38ms │                  456.27ms │     no change │
│ QQuery 10    │ 354.06ms │                  370.70ms │     no change │
│ QQuery 11    │  90.61ms │                   84.50ms │ +1.07x faster │
│ QQuery 12    │ 168.44ms │                  164.79ms │     no change │
│ QQuery 13    │ 259.32ms │                  251.40ms │     no change │
│ QQuery 14    │ 166.62ms │                  162.89ms │     no change │
│ QQuery 15    │ 232.14ms │                  226.26ms │     no change │
│ QQuery 16    │  77.71ms │                   77.55ms │     no change │
│ QQuery 17    │ 497.30ms │                  497.10ms │     no change │
│ QQuery 18    │ 769.87ms │                  748.68ms │     no change │
│ QQuery 19    │ 267.55ms │                  274.82ms │     no change │
│ QQuery 20    │ 254.93ms │                  256.73ms │     no change │
│ QQuery 21    │ 552.90ms │                  547.46ms │     no change │
│ QQuery 22    │  75.80ms │                   74.67ms │     no change │
└──────────────┴──────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                        │ 6366.18ms │
│ Total Time (hashjoin-project-pushdown)   │ 6284.59ms │
│ Average Time (main)                      │  289.37ms │
│ Average Time (hashjoin-project-pushdown) │  285.66ms │
│ Queries Faster                           │         3 │
│ Queries Slower                           │         0 │
│ Queries with No Change                   │        19 │
└──────────────────────────────────────────┴───────────┘

The results on my PC are unstable; sometimes it gets slower 😅.
This is the last result I got.

@@ -57,12 +57,11 @@ Limit: skip=0, fetch=5
physical_plan
GlobalLimitExec: skip=0, fetch=5
--SortPreservingMergeExec: [a@0 ASC NULLS LAST], fetch=5
----ProjectionExec: expr=[a@1 as a]
Contributor Author

A typical example is here.

@Dandandan
Contributor

Nice, could you run/post the tpch_mem results?

@Dandandan
Contributor

These are my results for tpch_mem; it seems to be a small but reasonable speedup 🚀:

Comparing main and hashjoin-project-pushdown
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃  hashjoin-project-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  84.89ms │                    79.70ms │ +1.07x faster │
│ QQuery 2     │  19.46ms │                    19.53ms │     no change │
│ QQuery 3     │  30.86ms │                    30.46ms │     no change │
│ QQuery 4     │  27.40ms │                    27.76ms │     no change │
│ QQuery 5     │  46.82ms │                    45.26ms │     no change │
│ QQuery 6     │   6.24ms │                     6.46ms │     no change │
│ QQuery 7     │ 107.98ms │                   101.54ms │ +1.06x faster │
│ QQuery 8     │  36.99ms │                    34.92ms │ +1.06x faster │
│ QQuery 9     │  51.31ms │                    50.74ms │     no change │
│ QQuery 10    │  61.42ms │                    58.83ms │     no change │
│ QQuery 11    │  14.90ms │                    14.45ms │     no change │
│ QQuery 12    │  30.32ms │                    30.39ms │     no change │
│ QQuery 13    │  30.31ms │                    30.53ms │     no change │
│ QQuery 14    │   8.85ms │                     8.84ms │     no change │
│ QQuery 15    │  23.29ms │                    22.15ms │     no change │
│ QQuery 16    │  19.47ms │                    20.17ms │     no change │
│ QQuery 17    │  50.95ms │                    50.01ms │     no change │
│ QQuery 18    │ 130.56ms │                   128.13ms │     no change │
│ QQuery 19    │  27.94ms │                    27.38ms │     no change │
│ QQuery 20    │  42.96ms │                    39.38ms │ +1.09x faster │
│ QQuery 21    │ 121.82ms │                   114.06ms │ +1.07x faster │
│ QQuery 22    │  13.53ms │                    13.47ms │     no change │
└──────────────┴──────────┴────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)                         │ 988.29ms │
│ Total Time (hashjoin-project-pushdown)    │ 954.15ms │
│ Average Time (main)                       │  44.92ms │
│ Average Time (hashjoin-project-pushdown)  │  43.37ms │
│ Queries Faster                            │        5 │
│ Queries Slower                            │        0 │
│ Queries with No Change                    │       17 │
└───────────────────────────────────────────┴──────────┘

@my-vegetable-has-exploded
Contributor Author

Thanks, @Dandandan. Currently, I don't project equivalence_properties and output_ordering, so some optimizer rules don't work after embedding the projection into HashJoinExec. I am trying to handle it.
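
For readers unfamiliar with why the ordering has to be projected too, here is a simplified sketch. It assumes an ordering is just a list of column indices sorted lexicographically, and `project_ordering` is a hypothetical helper; DataFusion's real equivalence-property handling is more involved than this.

```rust
/// Remap a lexicographic ordering (a list of column indices in the
/// unprojected schema) to the projected schema. Only the leading sort keys
/// that survive the projection are kept: once a key is missing, the
/// remaining keys no longer describe a valid ordering on their own.
fn project_ordering(ordering: &[usize], projection: &[usize]) -> Vec<usize> {
    let mut projected = Vec::new();
    for key in ordering {
        match projection.iter().position(|c| c == key) {
            // The key survives: record its position in the projected schema.
            Some(new_idx) => projected.push(new_idx),
            // The key was dropped: stop, later keys only refine this one.
            None => break,
        }
    }
    projected
}
```

If the plan node keeps reporting the unprojected ordering, the indices point at the wrong columns after projection, which is one way later optimizer rules can be misled.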

@my-vegetable-has-exploded
Contributor Author

Done! I will add more docs tomorrow.

@ozankabak
Contributor

@metesynnada PTAL

@@ -217,6 +217,8 @@ fn roundtrip_hash_join() -> Result<()> {
on.clone(),
None,
join_type,
// TODO: add a projectionexec for projection in the join
Contributor Author

Please ignore this comment, I will remove it later.

@@ -274,5 +274,5 @@ query PI
SELECT DATE_TRUNC('minute', to_timestamp_seconds("EventTime")) AS M, COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-14' AND "EventDate"::INT::DATE <= '2013-07-15' AND "IsRefresh" = 0 AND "DontCountHits" = 0 GROUP BY DATE_TRUNC('minute', to_timestamp_seconds("EventTime")) ORDER BY DATE_TRUNC('minute', M) LIMIT 10 OFFSET 1000;
----

query
Contributor Author

This was changed automatically by cargo test --test sqllogictests -- --complete.

@@ -645,14 +751,23 @@ impl ExecutionPlan for HashJoinExec {
// over the right that uses this information to issue new batches.
let right_stream = self.right.execute(partition, context)?;

// update column indices to reflect the projection
let column_indices_after_projection = match &self.projection {
Contributor Author

Project column_indices so that build_batch_from_indices can skip unused columns.
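
A minimal sketch of this step, using toy re-definitions of `ColumnIndex` and `JoinSide` (the names mirror the identifiers in the diff, but the types and the helper `project_column_indices` below are assumptions made for illustration, not the code in this PR):

```rust
/// Toy stand-in: which join input an output column comes from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinSide {
    Left,
    Right,
}

/// Toy stand-in: maps one output column to a column of one join input.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ColumnIndex {
    index: usize,
    side: JoinSide,
}

/// Keep only the output columns selected by `projection`, in projection
/// order. With no projection, all column indices are kept unchanged.
fn project_column_indices(
    column_indices: &[ColumnIndex],
    projection: Option<&[usize]>,
) -> Vec<ColumnIndex> {
    match projection {
        Some(p) => p.iter().map(|&i| column_indices[i]).collect(),
        None => column_indices.to_vec(),
    }
}
```

Whatever builds the output batch then iterates over the projected list only, so columns outside the projection are never gathered.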

@@ -282,7 +284,7 @@ pub struct HashJoinExec {
pub filter: Option<JoinFilter>,
/// How the join is performed (`OUTER`, `INNER`, etc)
pub join_type: JoinType,
/// The output schema for the join
/// The schema after join
Contributor Author

  • Add projection
  • Update equivalence_properties and output_ordering after projection
  • Update column_indices
  • Keep HashJoinExec.schema as the result of build_join_schema; the final schema after projection is available through the schema() function, so we need to be careful about which one we use.
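
The distinction in the last bullet can be sketched with a toy struct. `ToyHashJoin` and its field names are hypothetical, and a schema is reduced to a list of column names here:

```rust
/// Toy stand-in for a join node that stores the full join schema plus an
/// optional embedded projection.
struct ToyHashJoin {
    /// Schema of the join before projection (all left and right columns).
    join_schema: Vec<String>,
    /// Indices into `join_schema` that are actually output, if any.
    projection: Option<Vec<usize>>,
}

impl ToyHashJoin {
    /// Schema seen by parent operators: the join schema with the embedded
    /// projection applied.
    fn schema(&self) -> Vec<String> {
        match &self.projection {
            Some(p) => p.iter().map(|&i| self.join_schema[i].clone()).collect(),
            None => self.join_schema.clone(),
        }
    }
}
```

Code inside the join that resolves filter or column indices needs the unprojected field, while anything describing the output should go through `schema()`.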

@korowa
Contributor

korowa commented Mar 6, 2024

Same here -- planning to take a closer look tomorrow; the idea in general looks good, though.

Thank you @my-vegetable-has-exploded

@@ -97,6 +97,8 @@ impl PhysicalOptimizer {
// Note that one should always run this rule after running the EnforceDistribution rule
// as the latter may break local sorting requirements.
Arc::new(EnforceSorting::new()),
// TODO: `try_embed_to_hash_join` in the ProjectionPushdown rule would be block by the CoalesceBatches, so add it before CoalesceBatches. Maybe optimize it in the future.
Arc::new(ProjectionPushdown::new()),
Member

Do we need to have two ProjectionPushdown? The original can be removed?

Contributor

AFAICT the original will be modified to account for the new built-in projection capability and this one will be removed

Contributor Author

the original will be modified to account for th

Does this refer to #9111?

Member

I guess so.

Member

@viirya viirya left a comment

It looks good overall. I only have a few comments.

@my-vegetable-has-exploded
Contributor Author

Thank you all for the review. @Dandandan @metesynnada @korowa @viirya
I think this PR is ready to go. @alamb

@Dandandan Dandandan merged commit afddb32 into apache:main Mar 10, 2024
23 checks passed
@Dandandan
Contributor

Thank you @my-vegetable-has-exploded !

@my-vegetable-has-exploded my-vegetable-has-exploded deleted the hashjoin-project-pushdown branch March 10, 2024 08:11
RaphaelMarinier added a commit to RaphaelMarinier/datafusion-ballista that referenced this pull request Jun 27, 2024
Since datafusion's apache/datafusion#9236,
HashJoinExec can also project.
andygrove pushed a commit to apache/datafusion-ballista that referenced this pull request Jun 27, 2024
* Upgrading Ballista to datafusion 37.0.0.

* Better test debugging information in planner.rs

* Updated test logic in planner.

Since datafusion's apache/datafusion#9236,
HashJoinExec can also project.

* cargo fmt

* cargo fix

* Removed leftover comment

* Make cargo clippy happy

* lint

* Cargo fmt

* Fix tpch build

* Fix comment spelling

* cargo fmt