Improve how do we determine the control plane machine to be remediated #3845

fabriziopandini · 2020-10-22T08:45:04Z

#3830 introduces KCP remediation. According to the proposal, when more than one machine is unhealthy, the oldest one is picked for remediation.

As commented in reconcileUnhealthyMachines, the current solution is considered acceptable for the most frequent use case (only one unhealthy machine). In the future, however, it could be improved for the scenario where more than one unhealthy machine exists by considering which machine's remediation has the lowest impact on etcd quorum.

This effort will provide better support for the following use case:

- two or more CP nodes are marked unhealthy
- at least one other (healthy) machine has a failing etcd member
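
To make the use case above concrete, here is a minimal sketch that compares the current "oldest unhealthy machine" rule with one possible quorum-aware variant. It is not the KCP implementation: the `machine` type, its fields, and the functions `oldestUnhealthy`, `keepsQuorum`, `leastImpactUnhealthy`, and `sampleMachines` are all hypothetical names invented for illustration. It also assumes that each control plane machine hosts exactly one etcd member and that quorum is a strict majority of the remaining members.

```go
package main

import (
	"sort"
	"time"
)

// machine is a hypothetical, simplified stand-in for a control plane Machine.
type machine struct {
	Name         string
	Created      time.Time
	Unhealthy    bool // marked unhealthy, i.e. a remediation candidate
	EtcdMemberOK bool // whether the etcd member on this machine is healthy
}

// unhealthyOldestFirst returns the unhealthy machines sorted by creation time.
func unhealthyOldestFirst(ms []machine) []machine {
	var out []machine
	for _, m := range ms {
		if m.Unhealthy {
			out = append(out, m)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Created.Before(out[j].Created)
	})
	return out
}

// oldestUnhealthy models the current rule: pick the oldest unhealthy machine.
func oldestUnhealthy(ms []machine) (machine, bool) {
	c := unhealthyOldestFirst(ms)
	if len(c) == 0 {
		return machine{}, false
	}
	return c[0], true
}

// keepsQuorum reports whether etcd would still have a majority of healthy
// members after the member hosted on the target machine is removed.
func keepsQuorum(ms []machine, target string) bool {
	members, healthy := 0, 0
	for _, m := range ms {
		if m.Name == target {
			continue
		}
		members++
		if m.EtcdMemberOK {
			healthy++
		}
	}
	return healthy >= members/2+1
}

// leastImpactUnhealthy is one possible improvement (an assumption, not the
// proposal's wording): still oldest-first, but skip candidates whose removal
// would break quorum. If no candidate is safe, fall back to the oldest one.
func leastImpactUnhealthy(ms []machine) (machine, bool) {
	c := unhealthyOldestFirst(ms)
	if len(c) == 0 {
		return machine{}, false
	}
	for _, m := range c {
		if keepsQuorum(ms, m.Name) {
			return m, true
		}
	}
	return c[0], true
}

// sampleMachines builds the scenario from the issue: two unhealthy machines
// plus one healthy machine whose etcd member is failing.
func sampleMachines() []machine {
	t0 := time.Date(2020, 10, 22, 0, 0, 0, 0, time.UTC)
	return []machine{
		{Name: "cp-0", Created: t0, Unhealthy: true, EtcdMemberOK: true},
		{Name: "cp-1", Created: t0.Add(1 * time.Hour), Unhealthy: true, EtcdMemberOK: false},
		{Name: "cp-2", Created: t0.Add(2 * time.Hour), Unhealthy: false, EtcdMemberOK: false},
		{Name: "cp-3", Created: t0.Add(3 * time.Hour), Unhealthy: false, EtcdMemberOK: true},
		{Name: "cp-4", Created: t0.Add(4 * time.Hour), Unhealthy: false, EtcdMemberOK: true},
	}
}
```

In this sample, the oldest-first rule selects cp-0, whose removal would leave 2 healthy etcd members out of 4 (below the majority of 3). The quorum-aware variant selects cp-1 instead, which leaves 3 healthy members out of 4. Calling `oldestUnhealthy(sampleMachines())` and `leastImpactUnhealthy(sampleMachines())` from a `main` function shows the difference.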

/kind feature


fabriziopandini · 2020-10-22T08:50:38Z

/milestone v0.4.0

fabriziopandini · 2020-11-12T21:43:57Z

/area control-plane
/priority backlog

fejta-bot · 2021-02-10T22:04:54Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

vincepri · 2021-02-18T16:06:54Z

/lifecycle frozen

vincepri · 2021-10-19T14:43:30Z

/milestone v1.0

fabriziopandini · 2022-09-30T19:41:16Z

/close
given that we don't have evidence of problems due to the current rule

k8s-ci-robot · 2022-09-30T19:41:19Z

@fabriziopandini: Closing this issue.

In response to this:

/close
given that we don't have evidence of problems due to the current rule

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the kind/feature label Oct 22, 2020

fabriziopandini changed the title ~~Improve how do we s~~ Improve how do we determine the control plane machine to be remediated Oct 22, 2020

fabriziopandini mentioned this issue Oct 22, 2020

✨ KCP remediates unhealthy machines #3830

Merged

k8s-ci-robot added this to the v0.4.0 milestone Oct 22, 2020

k8s-ci-robot added the area/control-plane and priority/backlog labels Nov 12, 2020

k8s-ci-robot added the lifecycle/stale label Feb 10, 2021

k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label Feb 18, 2021

k8s-ci-robot modified the milestones: v0.4, v1.0 Oct 19, 2021

vincepri modified the milestones: v1.0, v1.1 Oct 22, 2021

fabriziopandini modified the milestones: v1.1, v1.2 Feb 3, 2022

fabriziopandini added the triage/accepted label Jul 29, 2022

fabriziopandini removed this from the v1.2 milestone Jul 29, 2022

fabriziopandini removed the triage/accepted label Jul 29, 2022

k8s-ci-robot closed this as completed Sep 30, 2022
