
Single node cluster (controlplane) in-place upgrade #7415

Open
furkatgofurov7 opened this issue Oct 17, 2022 · 10 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@furkatgofurov7
Member

furkatgofurov7 commented Oct 17, 2022

User Story

As a developer/user/operator, I have a use case where single-server installations are desirable (e.g. edge, base stations, and small regional clouds). To clarify, by single-server we mean a cluster with a single Kubernetes control plane node and no worker nodes, where that node also runs workloads (with the control plane taints adjusted accordingly).
I would then like to perform an in-place upgrade of that single control plane node, i.e. an upgrade strategy that upgrades the node in place without removing the existing node or creating a new one.
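In Cluster API terms, a single-node cluster of this kind could be sketched as a KubeadmControlPlane with a single replica and the control-plane taint cleared so workloads can schedule on it. This is an illustrative fragment only; the resource names and the Metal3 infrastructure template are hypothetical, and the version is an example:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: single-node-cp              # hypothetical name
spec:
  replicas: 1                       # one control plane node, no workers
  version: v1.25.3                  # example version
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: Metal3MachineTemplate   # e.g. a bare metal provider
      name: single-node-cp-template # hypothetical name
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        taints: []                  # clear the control-plane taint so workloads can schedule here
```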

Detailed Description

Ultimately, we would like to understand what problems and challenges arise during an in-place upgrade of such a single-server installation. We can assume that service interruption is acceptable, meaning it is bearable for workloads running on the control plane node to be down. However, we assume that removing the one and only control plane node from the cluster during the upgrade would break the whole process, since subsequent calls to the API server would fail.

Anything else you would like to add:

In-place upgrades have been discussed in the past, and a draft proposal (https://docs.google.com/document/d/1odiy0k_KZngdhidN_ll9Mb8WgGUR9iMFU7NfYRZKCvA/edit?pli=1) is up. However, it seems to cover only the upgrade of worker nodes (honoring maxUnavailable)?
In any case, we would like to know whether our use case can be covered by the same proposal, or whether there are already ways (I have really low hopes for that) to achieve the above use case with the current state of Cluster API.

/kind feature


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 17, 2022

@enxebre
Member

enxebre commented Oct 17, 2022

Thanks for creating this @furkatgofurov7! Let's try to define and separate things into different areas:

Re: Single server, as per your use case let's call this a "single Node cluster". A single server could be a ControlPlane implementation that runs the control plane as pods in a management cluster, for example, but does not necessarily expose a Node of any kind. However, from your description it seems that the infrastructure running the kube-apiserver (kas) operates as a Node itself.

However, it seems to cover only the upgrade of worker nodes (honoring maxUnavailable)?

Starting with workers seems reasonable to get things going. However, we should eventually be able to come up with a proposal where ControlPlane implementations can take advantage of a common in-place upgrade logic.

Could you add your user story to the gdoc so we keep collecting info there?

@furkatgofurov7
Member Author

furkatgofurov7 commented Oct 17, 2022

Thanks for the quick reply @enxebre!

Re: Single server, as per your use case let's call this a "single Node cluster". A single server could be a ControlPlane implementation that runs the control plane as pods in a management cluster, for example, but does not necessarily expose a Node of any kind. However, from your description it seems that the infrastructure running the kube-apiserver (kas) operates as a Node itself.

Yes, it is bare metal under the hood backing the Node itself.

Edit: I have changed the title of the issue to suit this use case better, thanks for the suggestion

Could you add your user story to the gdoc so we keep collecting info there?

Absolutely, I will add it as a separate use case to the user story in the proposal.

Starting with workers seems reasonable to get things going. However, we should eventually be able to come up with a proposal where ControlPlane implementations can take advantage of a common in-place upgrade logic.

Agreed, keeping control plane implementation needs in mind during the worker in-place design makes sense to me.

@furkatgofurov7 furkatgofurov7 changed the title Single server in-place upgrade Single node cluster (controlplane) in-place upgrade Oct 17, 2022
@dlipovetsky
Contributor

Has anyone demonstrated that an in-place upgrade of a single control plane is even possible in a way that is supported by upstream? (For example, node drain is a required step of an upgrade. Is that a problem?)

Before we ask Cluster API to support this use case, I think we either have to demonstrate that it is possible, or understand (and address) the upstream issues that make it impossible.
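For reference, a manual in-place upgrade of a single control plane node with plain kubeadm (outside Cluster API) would roughly follow the steps below. This is a sketch based on the standard kubeadm upgrade procedure, not a supported automation: the version numbers and `<node-name>` are placeholders, the package commands assume a Debian-style node, and the drain step is exactly the questionable part on a single-node cluster, since evicted workloads have nowhere to go.

```shell
# 1. Upgrade kubeadm itself on the node (package manager varies by distro).
apt-get update && apt-get install -y kubeadm=1.26.0-00

# 2. Plan and apply the control plane upgrade; the control plane static pods
#    are restarted in place on the same node.
kubeadm upgrade plan
kubeadm upgrade apply v1.26.0

# 3. Drain is normally recommended before upgrading the kubelet. On a
#    single-node cluster this evicts every workload with nowhere to
#    reschedule; the use case above accepts this service interruption.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# 4. Upgrade and restart the kubelet.
apt-get install -y kubelet=1.26.0-00
systemctl daemon-reload && systemctl restart kubelet

# 5. Make the node schedulable again.
kubectl uncordon <node-name>
```

The open question raised here is whether this sequence, in particular the drain step and the fact that the node being upgraded is also the sole API server, can be automated in an upstream-supported way.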

@fabriziopandini
Member

fabriziopandini commented Dec 28, 2022

/triage accepted
This is a very interesting topic, even if the problem space is complex. I'm not sure whether an issue is the right approach to drive the discussion forward, or whether it is better to move the discussion to a document like https://docs.google.com/document/d/1odiy0k_KZngdhidN_ll9Mb8WgGUR9iMFU7NfYRZKCvA/edit?pli=1 or a separate one.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 28, 2022
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 20, 2024
@fabriziopandini
Member

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Apr 12, 2024
@fabriziopandini
Member

@g-gaston
q: is this issue in the scope of the in place upgrade working group?

@g-gaston
Contributor

@g-gaston

q: is this issue in the scope of the in place upgrade working group?

Yeah, it is!

@fabriziopandini
Member

/triage accepted
/assign @g-gaston

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 2, 2024