
Single node cluster (controlplane) in-place upgrade #7415

Open
furkatgofurov7 opened this issue Oct 17, 2022 · 10 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@furkatgofurov7
Member

furkatgofurov7 commented Oct 17, 2022

User Story

As a developer/user/operator, I have a use case where single-server installations are desirable (e.g. edge, base stations, and small regional clouds). To clarify, by single-server we mean a cluster with a single Kubernetes control plane node and no worker nodes, where that node also runs workloads (with the control plane taints adjusted accordingly).
I would then like to perform an in-place upgrade of that single control plane node, i.e. an upgrade strategy that upgrades the node in place without removing the existing node or creating a new one.
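In Cluster API terms, a single-node cluster of this kind could be sketched as a KubeadmControlPlane with a single replica and the control-plane taint cleared so workloads can schedule on it. This is an illustrative fragment only; the resource names and the Metal3 infrastructure template are hypothetical, and the version is an example:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: single-node-cp              # hypothetical name
spec:
  replicas: 1                       # one control plane node, no workers
  version: v1.25.3                  # example version
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: Metal3MachineTemplate   # e.g. a bare metal provider
      name: single-node-cp-template # hypothetical name
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        taints: []                  # clear the control-plane taint so workloads can schedule here
```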

Detailed Description

Ultimately, we would like to understand what problems and challenges arise during an in-place upgrade of such a single-server installation. We can assume that service interruption is acceptable, meaning it is bearable for workloads running on the control plane node to be down. However, we assume that removing the one and only control plane node from the cluster during the upgrade would break the whole process, since subsequent calls to the API server would fail.

Anything else you would like to add:

In-place upgrades have been discussed in the past, and a draft proposal (https://docs.google.com/document/d/1odiy0k_KZngdhidN_ll9Mb8WgGUR9iMFU7NfYRZKCvA/edit?pli=1) is up. However, it seems to cover only the upgrade of worker nodes (honoring maxUnavailable)?
In any case, we would like to know whether our use case can be covered by the same proposal, or whether there are already ways (I have really low hopes for that) to achieve the above use case with the current state of Cluster API.

/kind feature


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 17, 2022

@enxebre
Member

enxebre commented Oct 17, 2022

Thanks for creating this @furkatgofurov7! Let's try to define and separate things into different areas:

Re: Single server, as per your use case let's call this a "single Node cluster". A single server could be a ControlPlane implementation that runs the control plane as pods in a management cluster, for example, but does not necessarily expose a Node of any kind. However, from your description it seems that the infrastructure running the kube-apiserver (kas) operates as a Node itself.

However, it seems to cover only the upgrade of worker nodes (honoring maxUnavailable)?

Starting with workers seems reasonable to get things going. However, we should eventually be able to come up with a proposal where ControlPlane implementations can take advantage of a common in-place upgrade logic.

Could you add your user story to the gdoc so we keep collecting info there?

@furkatgofurov7
Member Author

furkatgofurov7 commented Oct 17, 2022

Thanks for the quick reply @enxebre!

Re: Single server, as per your use case let's call this a "single Node cluster". A single server could be a ControlPlane implementation that runs the control plane as pods in a management cluster, for example, but does not necessarily expose a Node of any kind. However, from your description it seems that the infrastructure running the kube-apiserver (kas) operates as a Node itself.

Yes, it is bare metal under the hood backing the Node itself.

Edit: I have changed the title of the issue to suit this use case better, thanks for the suggestion

Could you add your user story to the gdoc so we keep collecting info there?

Absolutely, I will add it as a separate use case to the user story in the proposal.

Starting with workers seems reasonable to get things going. However, we should eventually be able to come up with a proposal where ControlPlane implementations can take advantage of a common in-place upgrade logic.

Agreed, keeping control plane implementation needs in mind during the worker in-place design makes sense to me.

@furkatgofurov7 furkatgofurov7 changed the title Single server in-place upgrade Single node cluster (controlplane) in-place upgrade Oct 17, 2022
@dlipovetsky
Contributor

Has anyone demonstrated that an in-place upgrade of a single control plane is even possible in a way that is supported by upstream? (For example, node drain is a required step of an upgrade. Is that a problem?)

Before we ask Cluster API to support this use case, I think we either have to demonstrate that it is possible, or understand (and address) the upstream issues that make it impossible.
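For reference, a manual in-place upgrade of a single control plane node with plain kubeadm (outside Cluster API) would roughly follow the steps below. This is a sketch based on the standard kubeadm upgrade procedure, not a supported automation: the version numbers and `<node-name>` are placeholders, the package commands assume a Debian-style node, and the drain step is exactly the questionable part on a single-node cluster, since evicted workloads have nowhere to go.

```shell
# 1. Upgrade kubeadm itself on the node (package manager varies by distro).
apt-get update && apt-get install -y kubeadm=1.26.0-00

# 2. Plan and apply the control plane upgrade; the control plane static pods
#    are restarted in place on the same node.
kubeadm upgrade plan
kubeadm upgrade apply v1.26.0

# 3. Drain is normally recommended before upgrading the kubelet. On a
#    single-node cluster this evicts every workload with nowhere to
#    reschedule; the use case above accepts this service interruption.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# 4. Upgrade and restart the kubelet.
apt-get install -y kubelet=1.26.0-00
systemctl daemon-reload && systemctl restart kubelet

# 5. Make the node schedulable again.
kubectl uncordon <node-name>
```

The open question raised here is whether this sequence, in particular the drain step and the fact that the node being upgraded is also the sole API server, can be automated in an upstream-supported way.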

@fabriziopandini
Member

fabriziopandini commented Dec 28, 2022

/triage accepted
This is a very interesting topic, even if the problem space is complex. I'm not sure whether an issue is the right approach to drive the discussion forward, or whether it is better to move the discussion to a document like https://docs.google.com/document/d/1odiy0k_KZngdhidN_ll9Mb8WgGUR9iMFU7NfYRZKCvA/edit?pli=1 or a separate one.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 28, 2022
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 20, 2024
@fabriziopandini
Member

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Apr 12, 2024
@fabriziopandini
Member

@g-gaston
q: is this issue in the scope of the in place upgrade working group?

@g-gaston
Contributor

@g-gaston

q: is this issue in the scope of the in place upgrade working group?

Yeah, it is!

@fabriziopandini
Member

/triage accepted
/assign @g-gaston

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 2, 2024