📖 CAEP to add support for managed external etcd clusters in CAPI #4659
Conversation
Hi @mrajashree. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
I added some comments as someone new to CAPI, but those shouldn't be a blocker. The overall logic makes sense.
InitMachineAddress string `json:"initMachineAddress"`

// +optional
Initialized bool `json:"initialized"`
I'm not sure I get the difference between this flag and the EtcdInitializedCondition. Is it possible to dedup?
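For context, here is a minimal sketch of how the quoted status fields and an `EtcdInitializedCondition` could coexist, assuming the etcdadm provider's API types build on CAPI's v1alpha4 condition helpers. Everything beyond the two quoted fields and the condition name is an illustrative assumption, not part of the proposal.

```go
// Sketch only: the package name and any field not quoted in the diff above
// are assumptions made for illustration.
package v1alpha4

import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha4"
)

// EtcdInitializedCondition would report that the first etcd member has been
// created, mirroring the Initialized boolean below.
const EtcdInitializedCondition clusterv1.ConditionType = "EtcdInitialized"

type EtcdadmClusterStatus struct {
	// InitMachineAddress is the address of the machine that ran etcdadm init.
	InitMachineAddress string `json:"initMachineAddress"`

	// Initialized is true once the first etcd member exists.
	// +optional
	Initialized bool `json:"initialized"`

	// Conditions defines the current state, including EtcdInitializedCondition.
	// +optional
	Conditions clusterv1.Conditions `json:"conditions,omitempty"`
}
```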
- Once the upgrade/machine replacement is completed, the etcd controller will again update the EtcdadmCluster.Status.Endpoints field. The control plane controller will roll out new Machines that use the updated etcd endpoints for the apiserver.

###### Periodic etcd member healthchecks
- The etcdadm cluster controller will perform healthchecks on the etcd members periodically, at a predetermined interval. The controller will perform client healthchecks by making HTTP GET requests to the `<etcd member address>:2379/health` endpoint of each etcd member. It cannot perform peer-to-peer healthchecks, since the controller is external to the etcd cluster and there won't be any agents running on the etcd members.
This inverts the current pattern, where MHC is responsible for checking machine health.
I'm ok with this, but I would like to hear other opinions on this...
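A minimal sketch of the kind of client-side health check the quoted section describes, probing each member's `:2379/health` endpoint on a fixed interval. All names here are illustrative assumptions; a real controller would also need to present etcd client certificates and integrate with controller-runtime requeues rather than a raw ticker loop.

```go
// Sketch of the periodic client healthcheck described in the proposal text above.
package healthcheck

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// healthCheck probes a single member's /health endpoint on port 2379.
func healthCheck(ctx context.Context, client *http.Client, memberAddress string) error {
	url := fmt.Sprintf("https://%s:2379/health", memberAddress)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("member %s unhealthy: status %d", memberAddress, resp.StatusCode)
	}
	return nil
}

// checkMembers runs one round of checks per interval until the context is cancelled.
func checkMembers(ctx context.Context, client *http.Client, members []string, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, m := range members {
				if err := healthCheck(ctx, client, m); err != nil {
					fmt.Printf("unhealthy member %s: %v\n", m, err)
				}
			}
		}
	}
}
```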
#### Changes needed in docker machine controller
- Each machine infrastructure provider sets a providerID on every InfraMachine.
- Infrastructure providers like CAPA and CAPV set this provider ID based on the instance metadata.
If I'm not wrong this is a responsibility of the CPI controllers.
#### Changes needed in docker machine controller
- Each machine infrastructure provider sets a providerID on every InfraMachine.
- Infrastructure providers like CAPA and CAPV set this provider ID based on the instance metadata.
- CAPD, on the other hand, first calls `kubectl patch` to add the provider ID to the node, and only then sets the provider ID value on the DockerMachine resource. The problem is that, in the case of an etcd-only cluster, the machine is never registered as a Kubernetes node, and cluster provisioning does not progress until this provider ID is set on the Machine. As a solution, CAPD will check whether the bootstrap provider is etcdadm and, if so, skip the `kubectl patch` step and set the provider ID on the DockerMachine directly. These changes are required in the [docker machine controller](https://github.com/mrajashree/cluster-api/pull/2/files#diff-1923dd8291a9406e3c144763f526bd9797a2449a030f5768178b8d06d13c795bR307) for CAPD.
I'm -1 on this change.
Each provider should be sandboxed and rely only on information available on CAPI resources (vs checking combinations with other providers). If with this proposal we are introducing the concept of a machine without a node, let's find a way to make this a first-class construct in CAPI core. This can eventually be done in staged approaches, avoiding breaking changes in the beginning, but in the end we should provide users a way to discriminate between the different types of machines.
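A rough sketch of the CAPD behavior change discussed in this thread: skip the Node patch for etcd-only machines and set the provider ID on the DockerMachine directly. The narrow interfaces and the `isEtcdadmBootstrap` flag are stand-ins assumed for illustration; the proposal does not pin down how CAPD would detect the bootstrap provider.

```go
// Sketch only: these types stand in for CAPD's real controller types.
package controllers

import (
	"context"
)

type dockerMachine interface {
	SetProviderID(id string)
}

type nodePatcher interface {
	// PatchNodeProviderID stands in for the existing `kubectl patch` step
	// that writes spec.providerID on the workload-cluster Node.
	PatchNodeProviderID(ctx context.Context, providerID string) error
}

// reconcileProviderID sets the provider ID, skipping the Node patch for
// etcd-only machines that never register as Kubernetes nodes.
func reconcileProviderID(ctx context.Context, m dockerMachine, patcher nodePatcher, providerID string, isEtcdadmBootstrap bool) error {
	if !isEtcdadmBootstrap {
		// Regular machines become Nodes, so keep the existing behavior:
		// patch the Node first, then record the ID on the DockerMachine.
		if err := patcher.PatchNodeProviderID(ctx, providerID); err != nil {
			return err
		}
	}
	m.SetProviderID(providerID)
	return nil
}
```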
EtcdBootstrapProviderType = ProviderType("EtcdBootstrapProvider")
EtcdProvider              = ProviderType("EtcdProvider")
This impacts the operator work as well... At least let's state, as a future goal, the intent to update the operator proposal to accommodate these providers as well.
The etcdadm-based provider requires two separate provider types; other etcd providers added later on might not require two separate providers, so they can use just the EtcdProvider type.
With these changes, a user should be able to install the etcd provider by running:
```
clusterctl init --etcdbootstrap etcdadm-bootstrap-provider --etcdprovider etcdadm-controller
```
Note: adding new provider types has a ripple effect on several clusterctl commands, not only init (e.g. upgrade, generate providers, etc.)
@mrajashree: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
hey @mrajashree ^^ is there any update regarding this KEP?
@dntosas I'll update the kep soon
@mrajashree Are we still trying to pursue this proposal? What's its status?
@vincepri yes I'd like to continue with this, I'll update this by next week
@mrajashree Any updates on this proposal?
@g-gaston mentioned during the 06/08 meeting that they'd be taking over this proposal and collaborating with @mrajashree on next steps
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/close
@fabriziopandini: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@fabriziopandini I'm interested in picking this up
Good to hear that. What about re-starting with a new issue stating the goal of this effort plus a new PR proposal? I think this discussion could benefit from a fresh start.
Yeah, sounds good. I'll write up an issue and open a new PR referencing this one.
What this PR does / why we need it:
This PR adds a KEP to support managed external etcd clusters within CAPI.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
#4658