Fix PD failover #2570
pkg/manager/member/pd_failover.go
Outdated
for podName, pdMember := range tc.Status.PD.Members {
	if !podNames.Has(podName) {
I think we should update tryToMarkAPeerAsFailure instead of changing this here. Why ignore the non-managed members in the cluster health check?
That's fine with me. Updated.
pkg/manager/member/pd_failover.go
Outdated
for podName, pdMember := range tc.Status.PD.Members {
	if !podNames.Has(podName) {
Should we stay consistent with the TiKV handling here?
Updated.
@@ -125,6 +125,12 @@ func (pf *pdFailover) tryToMarkAPeerAsFailure(tc *v1alpha1.TidbCluster) error {
		if pdMember.LastTransitionTime.IsZero() {
			continue
		}
		if !pf.isPodDesired(tc, podName) {
Should we also delete the non-desired members from the failure members, as done in https://github.com/pingcap/tidb-operator/pull/2560/files#diff-cff8f7143431f3d6302182e300a81909R58?
I'm not sure it's necessary right now, since all failure members will be cleared. Anyway, we can revise it later.
Co-authored-by: Yecheng Fu <[email protected]>
/merge
Your auto merge job has been accepted, waiting for:
/run-all-tests
/run-cherry-picker
Signed-off-by: sre-bot <[email protected]>
cherry pick to release-1.1 in PR #2577
What problem does this PR solve?
Limit the PD failover check to PD members that are managed by the operator. If a PD member is not managed by the operator, the operator will not trigger failover when that unmanaged PD member fails.
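As a rough illustration of the behavior described above, here is a minimal, self-contained sketch. It is not the operator's actual code: the `pdMember` type, the `failoverCandidates` helper, and the standalone `isPodDesired` signature are hypothetical simplifications, and the sketch assumes managed pods are named `<cluster>-pd-<ordinal>` with ordinals `0..replicas-1`.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// pdMember is a simplified, hypothetical stand-in for the PD member status
// tracked by the operator.
type pdMember struct {
	Health bool
}

// isPodDesired reports whether podName is a pod the operator manages.
// Assumption for this sketch: managed pods are named <cluster>-pd-<ordinal>
// with ordinals in the range 0..replicas-1.
func isPodDesired(clusterName string, replicas int, podName string) bool {
	prefix := clusterName + "-pd-"
	if !strings.HasPrefix(podName, prefix) {
		return false
	}
	ordinal, err := strconv.Atoi(strings.TrimPrefix(podName, prefix))
	if err != nil {
		return false
	}
	return ordinal >= 0 && ordinal < replicas
}

// failoverCandidates returns the unhealthy members that the operator manages,
// in sorted order. Unhealthy members outside the desired set are skipped.
func failoverCandidates(clusterName string, replicas int, members map[string]pdMember) []string {
	var out []string
	for podName, m := range members {
		if m.Health {
			continue
		}
		if !isPodDesired(clusterName, replicas, podName) {
			continue // unmanaged member: never a failover candidate
		}
		out = append(out, podName)
	}
	sort.Strings(out)
	return out
}

func main() {
	members := map[string]pdMember{
		"demo-pd-0":   {Health: true},
		"demo-pd-1":   {Health: false},
		"external-pd": {Health: false}, // joined the cluster but is not managed
	}
	fmt.Println(failoverCandidates("demo", 3, members)) // prints [demo-pd-1]
}
```

In this sketch the unhealthy `external-pd` member is ignored, so only the managed `demo-pd-1` would be considered for failover.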
Does this PR introduce a user-facing change?: