Improve how do we determine the control plane machine to be remediated #3845

Closed
fabriziopandini opened this issue Oct 22, 2020 · 7 comments
Labels
area/control-plane: Issues or PRs related to control-plane lifecycle management
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@fabriziopandini
Member

fabriziopandini commented Oct 22, 2020

#3830 introduces KCP remediation. According to the proposal, when more than one machine is unhealthy, the oldest one is picked for remediation.

As noted in the comments in reconcileUnhealthyMachines, the current solution is considered acceptable for the most frequent case, where only one machine is unhealthy. In the future it could be improved for the case where more than one machine is unhealthy by considering which machine's remediation has the lowest impact on etcd quorum.

This effort will provide better support for the following use case:

  • two or more CP nodes are marked unhealthy
  • at least one other (healthy) machine has a failing etcd member
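
To make the difference concrete, here is a minimal Go sketch comparing the current "oldest unhealthy machine" rule with one possible quorum-aware rule. It is not the KCP implementation: the `machine` type, its fields, and the functions `oldestUnhealthy`, `quorumAwarePick`, and `exampleMachines` are all hypothetical names made up for illustration, and "prefer machines whose etcd member is already failing" is only one assumed way to lower the impact on quorum.

```go
package main

import (
	"fmt"
	"time"
)

// machine is a hypothetical, simplified stand-in for a control plane Machine.
type machine struct {
	Name              string
	CreatedAt         time.Time
	Unhealthy         bool // machine is marked unhealthy and is a remediation candidate
	EtcdMemberHealthy bool // etcd member hosted on this machine is healthy
}

// oldestUnhealthy mirrors the rule described in the issue:
// among the unhealthy machines, pick the oldest one.
func oldestUnhealthy(machines []machine) (machine, bool) {
	var picked machine
	found := false
	for _, m := range machines {
		if !m.Unhealthy {
			continue
		}
		if !found || m.CreatedAt.Before(picked.CreatedAt) {
			picked, found = m, true
		}
	}
	return picked, found
}

// quorumAwarePick is an assumed alternative: prefer unhealthy machines whose
// etcd member is already failing, because removing them does not reduce the
// number of healthy etcd members. Ties are broken by age (oldest first).
func quorumAwarePick(machines []machine) (machine, bool) {
	var picked machine
	found := false
	for _, m := range machines {
		if !m.Unhealthy {
			continue
		}
		switch {
		case !found:
			picked, found = m, true
		case picked.EtcdMemberHealthy && !m.EtcdMemberHealthy:
			picked = m
		case picked.EtcdMemberHealthy == m.EtcdMemberHealthy && m.CreatedAt.Before(picked.CreatedAt):
			picked = m
		}
	}
	return picked, found
}

// exampleMachines builds the use case from the issue with five machines:
// two are unhealthy, and one healthy machine has a failing etcd member.
func exampleMachines() []machine {
	t0 := time.Date(2020, time.October, 1, 0, 0, 0, 0, time.UTC)
	return []machine{
		{Name: "cp-0", CreatedAt: t0, Unhealthy: true, EtcdMemberHealthy: true},
		{Name: "cp-1", CreatedAt: t0.Add(1 * time.Hour), Unhealthy: true, EtcdMemberHealthy: false},
		{Name: "cp-2", CreatedAt: t0.Add(2 * time.Hour), Unhealthy: false, EtcdMemberHealthy: false},
		{Name: "cp-3", CreatedAt: t0.Add(3 * time.Hour), Unhealthy: false, EtcdMemberHealthy: true},
		{Name: "cp-4", CreatedAt: t0.Add(4 * time.Hour), Unhealthy: false, EtcdMemberHealthy: true},
	}
}

func main() {
	machines := exampleMachines()
	if m, ok := oldestUnhealthy(machines); ok {
		fmt.Println("oldest rule picks:", m.Name)
	}
	if m, ok := quorumAwarePick(machines); ok {
		fmt.Println("quorum-aware rule picks:", m.Name)
	}
}
```

In this made-up scenario, three of five etcd members are healthy. Removing cp-0 (the oldest unhealthy machine, with a healthy member) would leave two healthy members out of four, below the quorum of three. Removing cp-1 (whose member is already failing) would leave three healthy members out of four, so quorum is preserved.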

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 22, 2020
@fabriziopandini changed the title from "Improve how do we s" to "Improve how do we determine the control plane machine to be remediated" Oct 22, 2020
@fabriziopandini
Member Author

/milestone v0.4.0

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Oct 22, 2020
@fabriziopandini
Member Author

/area control-plane
/priority backlog

@k8s-ci-robot k8s-ci-robot added area/control-plane Issues or PRs related to control-plane lifecycle management priority/backlog Higher priority than priority/awaiting-more-evidence. labels Nov 12, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 10, 2021
@vincepri
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 18, 2021
@vincepri
Member

/milestone v1.0

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.4, v1.0 Oct 19, 2021
@vincepri vincepri modified the milestones: v1.0, v1.1 Oct 22, 2021
@fabriziopandini fabriziopandini modified the milestones: v1.1, v1.2 Feb 3, 2022
@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the v1.2 milestone Jul 29, 2022
@fabriziopandini fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini
Member Author

/close
given that we don't have evidence of problems due to the current rule

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close
given that we don't have evidence of problems due to the current rule

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
