-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support using FineGrainedShuffle Info for Join&Agg #6279
Support using FineGrainedShuffle Info for Join&Agg #6279
Conversation
Signed-off-by: yibin <[email protected]>
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
…d_grained_agg Signed-off-by: yibin <[email protected]>
…sh into phase3_join_agg_fine_grain
Signed-off-by: yibin <[email protected]>
Signed-off-by: yibin <[email protected]>
/run-unit-tests |
/run-integration-test |
Signed-off-by: yibin <[email protected]>
Signed-off-by: yibin <[email protected]>
Signed-off-by: yibin <[email protected]>
/run-all-tests |
Signed-off-by: yibin <[email protected]>
@@ -429,7 +429,8 @@ ExchangeReceiverBase<RPCContext>::ExchangeReceiverBase( | |||
{ | |||
if (enableFineGrainedShuffle(fine_grained_shuffle_stream_count_)) | |||
{ | |||
for (size_t i = 0; i < max_streams_; ++i) | |||
size_t channel_count = std::min(max_streams_, fine_grained_shuffle_stream_count_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_streams_
in ExchangeReceiver
is actually useless, I think we can replace max_streams
and fine_grained_shuffle_stream_count
by output_stream_count
and enable_fine_grained_shuffle
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, updated
{ | ||
auto receiver_source_num_info = getReceiverSourceNumInfo(*build_pipeline.firstStream()); | ||
RUNTIME_CHECK(receiver_source_num_info.first == 1); | ||
shuffle_partition_num = receiver_source_num_info.second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why shuffle_partition_num
is set to source_num
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a wrong design, fixed it.
…e reproducing by hash and fine_grained_shuffle_stream_count Signed-off-by: yibin <[email protected]>
Signed-off-by: yibin <[email protected]>
@@ -41,44 +41,94 @@ std::vector<MutableColumns> createDestColumns(const Block & sample_block, size_t | |||
return dest_tbl_cols; | |||
} | |||
|
|||
void fillSelector(size_t rows, | |||
const WeakHash32 & hash, | |||
uint32_t num_bucket, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to rename num_bucket
to num_part
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (enable_fine_grained_shuffle && rows > 0) | ||
{ | ||
/// TODO: consider adding a virtual column in Sender side to avoid computing cost and potential inconsistency by heterogeneous envs | ||
/// Note: 1. Not sure, if inconsistency will do happen in heterogeneous envs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there maybe problems in heterogeneous envs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments to make it clear.
Signed-off-by: yibin <[email protected]>
Signed-off-by: yibin <[email protected]>
/run-all-tests |
Signed-off-by: yibin <[email protected]>
Signed-off-by: yibin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost LGTM
Signed-off-by: yibin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/merge |
@yibin87: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests You only need to trigger If you have any questions about the PR merge process, please refer to pr process. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
This pull request has been accepted and is ready to merge. Commit hash: 3c29db0
|
Signed-off-by: yibin [email protected]
What problem does this PR solve?
Issue Number: ref #6157
Problem Summary:
What is changed and how it works?
FineGrainedShuffle info will be added in tidb side if the plan satisfies certain rules. In TiFlash side, both Join and Agg operators can benefit from these info:
A little refact of HashBaseWriteHelper to make parameters sequence more reasonable and other little changes.
Most changed logic is protected by enable_fine_grained_shuffle flag which is set false now in TiDB. Will add another separate PR to add gtests for FinedGrainedShuffleJoin&Agg then.
Check List
Tests
Side effects
Documentation
Release note