Fix statefulset PreSync to work for certain cases of unhealthy etcd clusters upon druid upgrade #823
@shreyas-s-rao Command "/InviteCommand" failed with "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the gardener/etcd-druid repository.". Additional Information
/assign |
…y are updated or ready, similar to gardener#823
/assign |
…which includes #777 (#804) * Add new labels to sts pods, for backward compatibility with v0.23.0 which includes #777 * Fix unit tests for status.members checker * Address review comment by @anveshreddy18; check exact match for sts label selector for sts recreation * Statefulset PreDeploy now only checks pod labels, and not whether they are updated or ready, similar to #823 * Add comments for `PreDeploy` methods
… clusters upon druid upgrade
@seshachalam-yv thanks for your review. Your comments are addressed now. PTAL
/lgtm
If it isn't waiting for pod to be …
We usually never go from TLS to non-TLS. Is there any other case where we want to do this?
Had an offline discussion with @shreyas-s-rao and he informed me that he didn't see the transient quorum loss while testing an upgrade from …
LGTM!!
I've added a |
How to categorize this PR?
/area quality
/kind bug
What this PR does / why we need it:
This PR fixes the statefulset `PreSync` behavior so that it no longer waits for etcd pods to be updated with the latest sts spec, or to be ready. Instead, `PreSync` now simply checks whether the pods have the expected labels and continues with the next steps. This fixes certain cases where an etcd cluster is unhealthy due to a wrong reconfiguration of the etcd configmap that causes one of the three etcd pods to fail. In those cases the `PreSync` step got stuck in the pod updated+ready check and did not allow druid to run `Sync`, which reconciles and potentially fixes the issue by reverting the configmap to the correct state.
Which issue(s) this PR fixes:
Fixes #818
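For reviewers, the label-only check described above can be sketched roughly as follows. This is a minimal illustration and not druid's actual code: the `Pod` type, the helper names `hasLabels` and `podsMissingLabels`, and the label key `example.com/part-of` are all hypothetical, and the real implementation works against Kubernetes API objects.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for a Kubernetes pod.
// Only the fields needed for the label check are modelled.
type Pod struct {
	Name   string
	Labels map[string]string
	Ready  bool // present only to show that the check ignores it
}

// hasLabels reports whether every expected key/value pair is set on the pod.
// Extra labels on the pod are allowed.
func hasLabels(pod Pod, expected map[string]string) bool {
	for key, want := range expected {
		if got, ok := pod.Labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// podsMissingLabels returns the names of pods that lack any expected label.
// Readiness and spec revision are deliberately not consulted, so an unready
// pod with the right labels does not block the next step.
func podsMissingLabels(pods []Pod, expected map[string]string) []string {
	var missing []string
	for _, pod := range pods {
		if !hasLabels(pod, expected) {
			missing = append(missing, pod.Name)
		}
	}
	return missing
}

func main() {
	// Hypothetical label key and value, chosen for illustration only.
	expected := map[string]string{"example.com/part-of": "etcd-main"}
	pods := []Pod{
		{Name: "etcd-main-0", Labels: map[string]string{"example.com/part-of": "etcd-main"}, Ready: true},
		// Unready (for example, crashing on a bad configmap) but correctly labelled.
		{Name: "etcd-main-1", Labels: map[string]string{"example.com/part-of": "etcd-main"}, Ready: false},
		{Name: "etcd-main-2", Labels: map[string]string{"example.com/part-of": "etcd-main"}, Ready: true},
	}
	if missing := podsMissingLabels(pods, expected); len(missing) == 0 {
		fmt.Println("labels present on all pods; continue to Sync")
	} else {
		fmt.Println("pods missing expected labels:", missing)
	}
}
```

The point of the sketch is that an unready pod no longer blocks progress as long as its labels match, which is what lets `Sync` run and potentially repair the configmap.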
Special notes for your reviewer:
While working on this fix, @unmarshall and I found certain cases where PreSync cannot succeed. Consider the following case:
`Operation: Reconcile, State: Failed`. `Reconcile` operation as `Failed`, and an operator will need to manually look into it and fix it by performing a recovery from quorum loss using this guide.
/invite @acumino
Release note: