[WIP] 📖 External etcd cluster lifecycle support #7525
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Hi @g-gaston. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
- A [`runcmd` section](https://cloudinit.readthedocs.io/en/latest/topics/modules.html#runcmd) containing the etcdadm commands, along with any user specified commands.
The controller then saves this cloud-init script as a Secret, and sets this Secret's name on the EtcdadmConfig.Status.DataSecretName field.
- Since the EtcdadmConfig.Status.DataSecretName field gets set as per the [bootstrap provider specifications](https://cluster-api.sigs.k8s.io/developer/providers/bootstrap.html), the infrastructure providers will use the data from this Secret to initialize the Machines with the cloud-init script.
> TODO: the current implementation supports [Bottlerocket](https://github.com/bottlerocket-os/bottlerocket) as well; document it.
This might trigger its own conversation, given the significant differences between Bottlerocket and other OSes that support cloud-init.
It doesn't need to be part of this proposal but, since we will follow up with the implementation, it's probably worth noting. I'll just document it here and we'll go from there.
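For readers unfamiliar with the bootstrap contract referenced in the excerpt above, here is a minimal, hypothetical sketch (the helper name, Secret layout, and sample cloud-init are illustrative only, not the actual controller code) of how a rendered cloud-init script ends up behind `EtcdadmConfig.Status.DataSecretName`:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// bootstrapDataSecret wraps a rendered cloud-init script in a Secret so an
// infrastructure provider can consume it via the bootstrap provider contract.
// The "value" key and the naming convention are assumptions for this sketch.
func bootstrapDataSecret(namespace, configName string, cloudInit []byte) *corev1.Secret {
	return &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      configName, // conventionally named after the owning EtcdadmConfig
			Namespace: namespace,
		},
		Data: map[string][]byte{
			"value": cloudInit, // cloud-init script containing the etcdadm runcmd entries
		},
	}
}

func main() {
	cloudInit := []byte("#cloud-config\nruncmd:\n- etcdadm init\n")
	secret := bootstrapDataSecret("default", "etcd-member-0", cloudInit)
	// A controller would create this Secret in the management cluster and then
	// set EtcdadmConfig.Status.DataSecretName to &secret.Name.
	fmt.Println(secret.Name)
}
```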
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
A provider that creates and manages an etcd cluster to be used by a single workload cluster for the [external etcd topology](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/#external-etcd-topology).

### Etcdadm based etcd provider
This provider is an implementation of the Etcd provider that uses [etcdadm](https://github.com/kubernetes-sigs/etcdadm) to create and manage an external etcd cluster.
The last release was in 2021; is this still maintained?
Yeah, it is. I myself got a couple of PRs merged after that 2021 release. I suspect no one has asked for a release and that's why one hasn't been cut, but I can reach out to get one if we move ahead with this.
- An etcd provider should support etcd clusters of any size (odd numbers only), including single member etcd clusters for testing purposes. But the documentation should specify the recommended cluster size for production use cases to range between 3 and 7.
- Provide a first implementation of this pluggable etcd provider using [etcdadm](https://github.com/kubernetes-sigs/etcdadm) that integrates with the Kubeadm Control Plane provider.
- The etcdadm based provider will utilize the existing Machine objects to represent etcd members for convenience instead of adding a new machine type for etcd.
- Support the following etcd cluster management actions: scale up and scale down, etcd member replacement and etcd version upgrades.
Should we add that we need some code reuse between this effort and KCP's current etcd capabilities? Ultimately we should try to maximize the code shared between the external etcd provider and the stacked one.
Ah, good point. TBH I would prefer to leave that out of the design and move that responsibility to the implementation phase. It's also something we can iterate on without affecting the design.
However, if you feel strongly about this, what about:
- Support the following etcd cluster management actions: scale up and scale down, etcd member replacement and etcd version upgrades.
- Minimize code duplication with KCP's current etcd capabilities by sharing code with the external etcd providers.
- There will be a 1:1 mapping between the external etcd cluster and the workload cluster.
- An etcd provider should support etcd clusters of any size (odd numbers only), including single member etcd clusters for testing purposes. But the documentation should specify the recommended cluster size for production use cases to range between 3 and 7.
- Provide a first implementation of this pluggable etcd provider using [etcdadm](https://github.com/kubernetes-sigs/etcdadm) that integrates with the Kubeadm Control Plane provider.
- The etcdadm based provider will utilize the existing Machine objects to represent etcd members for convenience instead of adding a new machine type for etcd.
There is currently a limitation in that the Machine controller always expects a Machine to become a Kubernetes Node; this proposal would allow a set of Machines to exist without a corresponding Node. We might want to add a paragraph here that states the above and clarifies that Machines that won't become Nodes are used only for etcd, which is a supporting pillar of the cluster.
Good point, will add this.
The following new type/CRDs will be added for the etcdadm based provider:
```go
// API Group: etcd.cluster.x-k8s.io OR etcdplane.cluster.x-k8s.io
type EtcdadmCluster struct {
	// ...
}
```
Could we avoid relying on `etcdadm` here, and just call it `etcd`? Reason being, if in the future we'd like to swap out the implementation detail, we should be able to do so.
Oh, that's an interesting point. However, wouldn't that almost guarantee another provider? Similar to if we ever wanted to swap kubeadm for a different tool?
It might be better to replace `etcdadm based provider` with `etcdadm based etcd provider` to avoid confusion. IMO calling it `etcdadm` here would be fine, because that's one of the reference implementations of the etcd provider.
```go
	// +optional
	EtcdadmConfigSpec etcdbp.EtcdadmConfigSpec `json:"etcdadmConfigSpec"`
```
This imports a config type that we have no control over. Could we instead offer a translation layer between what we want to expose to Cluster API users and the etcdadm types, keeping etcdadm as an implementation detail?
Ah, I see what you are saying. That sounds like a paradigm change, which I'm not opposed to, but it does change the high-level approach slightly. If I'm getting this right, you are suggesting making this API almost a generic interface for an etcd provider, as opposed to the API for a concrete implementation using etcdadm (which is what this design tries to do). This design follows the same idea as the CP provider and KCP, where the KCP API is specific to kubeadm.
What's the benefit you see in doing this? Or is it more that you have a concern with etcdadm?
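A rough sketch of the translation-layer alternative discussed in this thread; the `EtcdClusterConfig` type, its fields, and the helper are hypothetical, and `etcdbp` refers to the etcdadm bootstrap provider package imported in the proposal's code above (its actual field names are intentionally not spelled out here):

```go
// EtcdClusterConfig is a hypothetical CAPI-facing configuration type,
// deliberately decoupled from etcdadm's own API surface.
type EtcdClusterConfig struct {
	// Version is the etcd version to install.
	Version string `json:"version,omitempty"`
	// ExtraArgs are passed through to the etcd process.
	ExtraArgs map[string]string `json:"extraArgs,omitempty"`
}

// toEtcdadmConfigSpec would translate the CAPI-facing type into the etcdadm
// bootstrap provider's EtcdadmConfigSpec, keeping etcdadm an implementation
// detail behind the public API.
func toEtcdadmConfigSpec(in EtcdClusterConfig) etcdbp.EtcdadmConfigSpec {
	out := etcdbp.EtcdadmConfigSpec{}
	// Map in.Version, in.ExtraArgs, etc. onto the corresponding etcdadm
	// fields here; those field names belong to the etcdadm bootstrap
	// provider API and are deliberately omitted from this sketch.
	return out
}
```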
```go
	// +optional
	Initialized bool `json:"initialized"`

	// +optional
	CreationComplete bool `json:"ready"`
```
These should probably be conditions?
Yeah, I would agree. Let me take a note and revisit this.
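As an illustration of the conditions-based alternative mentioned here, using Cluster API's condition types (the condition names below are hypothetical, not part of the proposal):

```go
package v1alpha1 // hypothetical API package for the provider

import clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"

const (
	// EtcdClusterInitializedCondition could replace the Initialized boolean:
	// it would report that the first etcd member has come up and initialized the cluster.
	EtcdClusterInitializedCondition clusterv1.ConditionType = "EtcdClusterInitialized"

	// EtcdClusterReadyCondition could replace the CreationComplete ("ready")
	// boolean: it would report that all desired etcd members are up and healthy.
	EtcdClusterReadyCondition clusterv1.ConditionType = "EtcdClusterReady"
)
```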
Thanks for the review @vincepri!
@g-gaston: The following test failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@g-gaston what is the plan for this PR?
Yeah, agreed. I still have interest in moving this forward but I don't have the bandwidth to make enough progress right now :(. If someone is interested in collaborating, please reach out. If not, I'll reopen once I or someone else on the eks-a side starts working on this again.
What this PR does / why we need it:
This proposal continues the work started by #4659
Tracking TODOs
Fixes #7399