*: individually check the scheduling halt for online unsafe recovery #8147

Merged
merged 3 commits into from
May 8, 2024

Conversation

JmPotato
Member

@JmPotato JmPotato commented May 7, 2024

What problem does this PR solve?

Issue Number: close #8095, ref #6493.

What is changed and how does it work?

Check the scheduling halt for online unsafe recovery individually, so that the halt option is not unexpectedly persisted while the recovery is still in progress.
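
The idea can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `persistedOptions`, `recoveryController`, and `cluster` are hypothetical stand-ins, and the sketch only assumes that the recovery state is held in memory while the halt option is something that gets saved and reloaded.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// persistedOptions stands in for configuration that is saved to storage
// and reloaded on restart (hypothetical type, not PD's PersistOptions).
type persistedOptions struct {
	haltScheduling atomic.Bool
}

// recoveryController stands in for the online unsafe recovery state machine.
// Its running flag lives only in memory (hypothetical type).
type recoveryController struct {
	running atomic.Bool
}

// cluster holds both sources that can halt scheduling.
type cluster struct {
	opts     *persistedOptions
	recovery *recoveryController
}

// IsSchedulingHalted checks each source individually instead of having the
// recovery process write into the persisted option.
func (c *cluster) IsSchedulingHalted() bool {
	return c.opts.haltScheduling.Load() || c.recovery.running.Load()
}

func main() {
	c := &cluster{opts: &persistedOptions{}, recovery: &recoveryController{}}

	// Recovery starts: scheduling is halted, but nothing is persisted.
	c.recovery.running.Store(true)
	fmt.Println(c.IsSchedulingHalted(), c.opts.haltScheduling.Load())

	// Recovery finishes: scheduling resumes without touching the option.
	c.recovery.running.Store(false)
	fmt.Println(c.IsSchedulingHalted(), c.opts.haltScheduling.Load())
}
```

Because the persisted flag is never written during recovery in this sketch, a restart partway through cannot reload a stale `true` value, which is the symptom described in the linked issue.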

Check List

Tests

  • Unit test
  • Integration test

Release note

Fix the issue where the cluster could not recover normally after using online unsafe recovery.

@JmPotato JmPotato added the component/schedule Scheduling logic. label May 7, 2024
Contributor

ti-chi-bot bot commented May 7, 2024

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • HuSharp
  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 7, 2024

codecov bot commented May 7, 2024

Codecov Report

Attention: Patch coverage is 45.45455%, with 12 lines in your changes missing coverage. Please review.

Project coverage is 77.38%. Comparing base (44d57b6) to head (4d44354).

❗ Current head 4d44354 differs from pull request most recent head 9a4d9c8. Consider uploading reports for the commit 9a4d9c8 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8147      +/-   ##
==========================================
- Coverage   77.44%   77.38%   -0.07%     
==========================================
  Files         471      471              
  Lines       61348    61347       -1     
==========================================
- Hits        47510    47472      -38     
- Misses      10278    10310      +32     
- Partials     3560     3565       +5     
Flag Coverage Δ
unittests 77.38% <45.45%> (-0.07%) ⬇️
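
The percentages in the report can be reproduced from the line counts in the table. The sketch below is only a sanity check. The patch figure of 10 covered lines out of 22 is an inference from "45.45455% with 12 lines missing", not a number stated in the report, and `coveragePercent` is a hypothetical helper.

```go
package main

import "fmt"

// coveragePercent returns hits as a percentage of total lines.
func coveragePercent(hits, total int) float64 {
	return 100 * float64(hits) / float64(total)
}

func main() {
	// Project coverage: Hits / Lines, taken from the diff table.
	fmt.Printf("base:  %.2f%%\n", coveragePercent(47510, 61348))
	fmt.Printf("head:  %.2f%%\n", coveragePercent(47472, 61347))

	// Patch coverage: assumes 22 changed lines, 12 of them not covered.
	fmt.Printf("patch: %.5f%%\n", coveragePercent(22-12, 22))
}
```

In both columns, Hits, Misses, and Partials add up to the Lines total (47510 + 10278 + 3560 = 61348 and 47472 + 10310 + 3565 = 61347).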

@@ -1001,6 +998,14 @@ func (o *PersistOptions) SetHaltScheduling(halt bool, source string) {
}
}

// SetHaltScheduling set HaltScheduling.
func (o *PersistOptions) SetHaltScheduling(halt bool, source string) {
Member

@rleungx rleungx May 7, 2024

What will the source be if we use API to set it?

Member

@HuSharp HuSharp May 8, 2024

Do we still need SetHaltScheduling function if we have SetSchedulingAllowanceStatus?

Member Author

Currently, it's not used anymore, but I think it's okay to leave it there.

@JmPotato JmPotato requested a review from rleungx May 7, 2024 08:18
@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label May 7, 2024
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 8, 2024
Member Author

JmPotato commented May 8, 2024

/merge

Contributor

ti-chi-bot bot commented May 8, 2024

@JmPotato: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Contributor

ti-chi-bot bot commented May 8, 2024

This pull request has been accepted and is ready to merge.

Commit hash: 4d44354

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 8, 2024
@ti-chi-bot ti-chi-bot bot merged commit 740f15e into tikv:master May 8, 2024
22 checks passed
@JmPotato JmPotato deleted the update_unsafe_recovery_halt branch May 8, 2024 08:48
@JmPotato JmPotato added the needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. label May 8, 2024
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #8155.

ti-chi-bot bot pushed a commit that referenced this pull request May 9, 2024
…8147) (#8155)

ref #6493, close #8095

Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: JmPotato <[email protected]>
Contributor

lhy1024 commented May 20, 2024

/cherry-pick release-7.1

Contributor

lhy1024 commented May 20, 2024

/cherry-pick release-7.5

@ti-chi-bot
Member

@lhy1024: new pull request created to branch release-7.1: #8193.

In response to this:

/cherry-pick release-7.1

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request May 20, 2024
@ti-chi-bot
Member

@lhy1024: new pull request created to branch release-7.5: #8194.

In response to this:

/cherry-pick release-7.5

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request May 20, 2024
ti-chi-bot bot pushed a commit that referenced this pull request May 22, 2024
…8147) (#8194)

ref #6493, close #8095

Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: JmPotato <[email protected]>
Co-authored-by: lhy1024 <[email protected]>
Labels

  • component/schedule: Scheduling logic.
  • needs-cherry-pick-release-8.1: Should cherry pick this PR to release-8.1 branch.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.

Development

Successfully merging this pull request may close these issues.

After performing an online recovery, "halt-scheduling" has been set to true when reloading pd