-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 2084450: Add unit/file for AWS to compute instance provider-id and pass it to the kubelet #3162
Conversation
Skipping CI for Draft Pull Request. |
templates/common/aws/files/usr-local-bin-aws-kubelet-providerid.yaml
Outdated
Show resolved
Hide resolved
- What I did I Added AWS specific systemd unit (aws-kubelet-providerid.service) and file (/usr/local/bin/aws-kubelet-providerid) for generating the AWS instance provider-id (then stored in the KUBELET_PROVIDERID env var), in order to pass it as the --provider-id argument to the kubelet service binary. We needed to add such flag, and make it non-empty only on AWS, to make the node syncing (specifically backing instance detection) work via provider-id detection, to cover cases where the node hostname doesn't match the expected private-dns-name (e.g. when a custom DHCP Option Set with empty domain-name is used). Should fix: https://bugzilla.redhat.com/show_bug.cgi?id=2084450 Reference to an upstream issue with context: kubernetes/cloud-provider-aws#384 - How to verify it Try the reproduction steps available at: https://bugzilla.redhat.com/show_bug.cgi?id=2084450#c0 while launching a cluster with this MCO PR included. Verify that the issue is not reproducible anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I'd like to see what @sinnykumari and @mdbooth think
lgtm |
@@ -40,6 +40,7 @@ contents: | | |||
--volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \ | |||
{{cloudConfigFlag . }} \ | |||
--hostname-override=${KUBELET_NODE_NAME} \ | |||
--provider-id=${KUBELET_PROVIDERID} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, since you are adding this to all platforms, how does this affect non-AWS platforms? Would this be empty? Does that cause issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this would be empty on all platforms aside from AWS.
And judging by the behaviour of the kubelet, this is only assigned to node.Spec.ProviderID
if is not empty ("").
https://github.com/kubernetes/kubernetes/blob/39c76ba2edeadb84a115cc3fbd9204a2177f1c28/pkg/kubelet/kubelet_node_status.go#L374
So it should be pretty safe.
@damdo: This pull request references Bugzilla bug 2084450, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
For viz since this changes kubelet templates: |
I'm definitely in favour of this, and whatever approach we go for I will aim to add it to OpenStack. This is a much more robust way for CCM to initially match a Node to a kubelet. Otherwise we're left matching the instance name to the node name. As we've seen, node name is fraught with edge cases so a simple UUID match is very attractive. API-wise this just means that kubelet will add the ProviderID to the initialnode instead of CCM doing it after the initial fuzzy match by node name, so the end result is the same but without the initial edge cases. The only thing that stopped me doing this last time was that we discussed adding this to the kubelet config file which was going to be a much more involved change in MCO. I would like to see us using providerID in kubelet before CCM is GA, so I would also be in favour of merging this simpler approach quickly and tackling the larger task of moving to a config file separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/retest |
@damdo: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Pr looks good, still would like to get a review from node team since they own the kubelet part |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: damdo, mdbooth, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@damdo: All pull requests linked via external trackers have merged: Bugzilla bug 2084450 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
don't know why bot thought lgtm means add approved label :/ |
I think this broke stuff. That's from the cloud-network-config controller pod:
That's from the cloud-network-config-controller's code - before it looked like this:
But, this changed now to look like this (number of slashes):
Likely culprit: #3162 | https://bugzilla.redhat.com/show_bug.cgi?id=2084450 |
@andreaskaris I think you are right.
|
I'm currently filing a bug. But if we both agree, then let's fix this in the mco (instead of adjusting the cncc) :-) I already have a fixup ready, but I'm also o.k. if you post it yourself |
the change in openshift#3162 is missing a third `/` in the `aws:///` schema. This fixes it.
@andreaskaris Was about to open a PR, you beat me to it. Thanks for opening a bug and filing a fix! |
Revert "Merge pull request #3162 from damdo/BZ2084450-2"
…0-2"" This reverts commit 706fbaf.
- What I did
I Added AWS specific systemd unit (
aws-kubelet-providerid.service
) and file (/usr/local/bin/aws-kubelet-providerid
) for generating the AWS instance provider-id (then stored in theKUBELET_PROVIDERID
env var), in order to pass it as the--provider-id
argument to the kubelet service binary.We needed to add such flag, and make it non-empty only on AWS, to make the node syncing (specifically backing instance detection) work via provider-id detection, to cover cases where the node hostname doesn't match the expected
private-dns-name
(e.g. when a custom DHCP Option Set with emptydomain-name
is used).Should fix: https://bugzilla.redhat.com/show_bug.cgi?id=2084450
Reference to an upstream issue with context: kubernetes/cloud-provider-aws#384
- How to verify it
Try the reproduction steps available at: https://bugzilla.redhat.com/show_bug.cgi?id=2084450#c0 while launching a cluster with this MCO PR included.
Verify that the issue is not reproducible anymore.
- Description for the changelog
Add systemd units/files for AWS specific kubelet service
I tested this manually and here's what I got:
machine.machine.openshift.io/ddonati-test111-tlq5x-worker-eu-central-1c-tln2r
is correctly Provisioned and goes into Running statenode/ip-10-0-219-246
is created for the machine, which is then synced up and linked correctly