feat: support the new compact RPC request #4885
Conversation
Signed-off-by: Wish <[email protected]>
The kvproto change has been merged into the PR.
auto dm_context = newDMContext(context, context.getSettingsRef(), /*tracing_id*/ "mergeDeltaBySegment");
if (start_key.is_common_handle != dm_context->is_common_handle)
{
    return std::nullopt;
Simply throw an exception. It is an unexpected status.
The start_key comes from the RPC request. From the TiFlash perspective, it may be better to treat all RPC callers as untrusted in order to isolate component errors. Otherwise a mistake in TiDB may cause TiFlash to crash.
Or maybe it is better to leave this check in the ManualCompactManager? We could keep the property that all calls to DeltaMergeStore are validated before they are made.
IMO, it is better to raise something serious to signal the unexpected status, as it usually implies a bug. How about returning an error to TiDB when there are invalid params?
BTW, throwing an exception here should not crash TiFlash.
I have moved this check to the TiFlash service side. Please take a look again, thanks!
Also added a test in the DM to ensure that, when a common handle is given for an int handle table, an exception is thrown.
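For reference, a minimal self-contained sketch of the validation idea settled on above: check the caller-provided start key against the table's handle type at the service boundary and throw, so the request fails with an error instead of being silently ignored. The type and function names here are illustrative stand-ins, not the actual TiFlash code.

```cpp
#include <stdexcept>
#include <string>

// Illustrative stand-in for DM::RowKeyValue; only the field relevant here.
struct RowKeyValueSketch
{
    bool is_common_handle = false;
};

// Throws if the RPC caller sent a key whose handle type does not match the table.
// The exception is expected to be caught by the service layer and converted into
// an RPC-level error, rather than crashing the process.
void checkStartKeyHandleType(const RowKeyValueSketch & start_key, bool table_is_common_handle)
{
    if (start_key.is_common_handle != table_is_common_handle)
        throw std::invalid_argument(
            "start_key handle type mismatch: request is_common_handle="
            + std::to_string(start_key.is_common_handle)
            + ", table is_common_handle=" + std::to_string(table_is_common_handle));
}
```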
@@ -1027,6 +1025,43 @@ void DeltaMergeStore::mergeDeltaAll(const Context & context)
    }
}

std::optional<DM::RowKeyRange> DeltaMergeStore::mergeDeltaBySegment(const Context & context, const RowKeyValue & start_key)
{
    auto dm_context = newDMContext(context, context.getSettingsRef(), /*tracing_id*/ "mergeDeltaBySegment");
I think we should use updateGCSafePoint before creating the DMContext.
And add the value of latest_gc_safepoint to the "tracing_id"
Updated.
Just curious, if we don't update the safepoint here, what will happen in the worst case?
BTW, updating the GC safepoint here means we will send 1 PD request for every segment. Is this fine? (Do we need something like a throttle?)
Another solution is to make updateGCSafePoint public, so that it only gets called once per compact request. I'm not sure whether this would leak the encapsulation.
> Just curious, if we don't update the safepoint here, what will happen in the worst case?

Consider the case where the original segment doesn't have any data in the delta, while the stable contains rows that are older than the GC safepoint. Even after we apply "merge delta" on the segment, we spend resources rewriting the stable data (which is effectively the same as before) without getting any performance improvement.
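To summarize the ordering agreed on in this thread, here is a rough sketch of refreshing the GC safepoint before creating the DMContext and recording the latest value in the tracing id. All names are illustrative; the real PD and DMContext APIs are assumptions here.

```cpp
#include <cstdint>
#include <string>

// Assumed helper: asks PD for the current GC safepoint and caches it locally.
// In the real code this lives behind the storage layer; here it is a stub.
uint64_t updateAndGetGCSafePoint()
{
    return 0; // placeholder value for the sketch
}

std::string makeTracingId(uint64_t latest_gc_safepoint)
{
    // Embedding the safepoint lets logs of this compaction be correlated with
    // the safepoint that was in effect when the merge started.
    return "mergeDeltaBySegment_safepoint=" + std::to_string(latest_gc_safepoint);
}

void mergeDeltaBySegmentSketch()
{
    // 1. Refresh the GC safepoint first, so rows older than it can actually be
    //    cleaned by the merge instead of being rewritten unchanged.
    const uint64_t latest_gc_safepoint = updateAndGetGCSafePoint();

    // 2. Then create the DMContext, recording the safepoint in the tracing id
    //    (the real call would look like newDMContext(context, settings, tracing_id)).
    const std::string tracing_id = makeTracingId(latest_gc_safepoint);
    (void)tracing_id;
}
```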
M(SettingUInt64, manual_compact_max_concurrency, 10, "Max concurrent tasks. It should be larger than pool size.") \
M(SettingUInt64, manual_compact_more_until_ms, 60000, "Continuously compact more segments until reaching specified elapsed time. If 0 is specified, only one segment will be compacted each round.")
Seems these two configs are not shown in the "config-template.toml" file?
Yes, they are advanced configs that normal users should not modify.
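A simplified sketch of how manual_compact_more_until_ms can bound one round of compaction: keep compacting consecutive segments until either nothing is left or the time budget is used up, with a budget of 0 meaning a single segment per round. This is illustrative only and not the real ManualCompactManager; the callback and key types are placeholders.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Assumed callback: compacts the segment whose range starts at `start_key` and
// returns the start key of the next segment, or std::nullopt when the table end
// has been reached (mirroring mergeDeltaBySegment returning the next range).
using CompactOneSegment = std::function<std::optional<std::string>(const std::string & start_key)>;

std::optional<std::string> compactRound(
    const std::string & start_key,
    uint64_t more_until_ms,
    const CompactOneSegment & compact_one)
{
    const auto deadline = std::chrono::steady_clock::now() + std::chrono::milliseconds(more_until_ms);
    std::optional<std::string> next = start_key;
    do
    {
        next = compact_one(*next); // compact exactly one segment
    } while (next.has_value()                                  // more segments remain
             && more_until_ms > 0                              // 0 means one segment per round
             && std::chrono::steady_clock::now() < deadline);  // time budget not exhausted
    // The returned key (if any) is handed back to the caller so the next round
    // or the next RPC request can continue from here.
    return next;
}
```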
LGTM
This pull request has been accepted and is ready to merge. Commit hash: 2d907ec
What problem does this PR solve?
Issue Number: ref #4897
Problem Summary:
This PR adds the implementation of the Compact RPC. The Compact RPC is defined as:
Compact one or more data partials (Segments in DeltaTree) according to the specified Table ID, and return the key for compacting more data partials.
See the detailed RPC interface definition in kvproto#918 (tiflash: Add TiFlash compaction RPC and messages).
The TiDB PR to utilize this RPC endpoint: pingcap/tidb#34741
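To make the request/response contract above concrete, here is an illustrative client-side driving loop with hypothetical message types; the real message definitions live in kvproto#918, and the field names below are assumptions for the sketch only.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-ins for the real kvproto messages (see kvproto#918).
struct CompactRequestSketch
{
    int64_t table_id = 0;
    std::string start_key; // empty on the first request
};

struct CompactResponseSketch
{
    bool has_remaining = false; // true if more data partials are left to compact
    std::string next_start_key; // key to continue from in the next request
};

// The transport is abstracted as a callable so the sketch stays self-contained.
using SendCompact = std::function<CompactResponseSketch(const CompactRequestSketch &)>;

// Drive the RPC until the whole table has been compacted, one data partial
// (segment) at a time, continuing from the key returned by the previous call.
void compactWholeTable(int64_t table_id, const SendCompact & send_compact)
{
    CompactRequestSketch req;
    req.table_id = table_id;
    while (true)
    {
        const CompactResponseSketch resp = send_compact(req);
        if (!resp.has_remaining)
            break;                           // nothing left to compact
        req.start_key = resp.next_start_key; // continue from where the server stopped
    }
}
```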
What is changed and how it works?
New:
- The implementation of the Compact RPC: Flash/Management/ManualCompact.cpp|h.
- Public API to compact one segment in DeltaTree: Storages/DeltaMerge/*, mainly std::optional<DM::RowKeyRange> mergeDeltaBySegment(const Context & context, const DM::RowKeyValue & start_key);
- Unit tests covering the Compact RPC and compacting one segment: see tests.

Change:
- Moved tiflashErrorCodeToGrpcStatusCode from the CoprocessorHandler to a new shared utility called ServiceUtils, so that the Compact RPC can use it as well.

Check List
Tests
Side effects
Documentation
Release note
None, as the user interface is on the TiDB side.