
k3s sets KubeletCredentialProviders feature gate which is GA and removed in 1.28 #8941

Closed
damonmaria opened this issue Nov 22, 2023 · 9 comments

@damonmaria

Environmental Info:
K3s Version:

# k3s -v
k3s version v1.28.3+k3s2 (bbafb86e)
go version go1.20.10

Node(s) CPU architecture, OS, and Version:

# uname -a
Linux proc1.anzac.mindhive.lan 5.4.0-167-generic #184-Ubuntu SMP Tue Oct 31 09:21:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
Many clusters. But most: 2 servers, 2-3 agents.

Describe the bug:
As per this k8s PR, the KubeletCredentialProviders feature gate has been removed in 1.28, but k3s is still setting it. This appears to happen if either the --image-credential-provider-* args are passed or there is content in the default /var/lib/rancher/credentialprovider/ directory.

This results in the following failure on startup:

time="2023-11-22T18:22:41+13:00" level=fatal msg="kubelet exited: failed to set feature gates from initial flags-based config: unrecognized feature gate: KubeletCredentialProviders"

Steps To Reproduce:

  1. Installed k3s through the https://get.k3s.io/ script
  2. The resulting systemd service unit ExecStart is:
ExecStart=/usr/local/bin/k3s \
    server \
    '--datastore-endpoint=https://proc1:2379' \
    '--datastore-cafile=/etc/etcd/ca.pem' \
    '--datastore-certfile=/etc/etcd/client.pem' \
    '--datastore-keyfile=/etc/etcd/client-key.pem' \
    '--flannel-backend=host-gw' \
    '--disable=metrics-server,traefik' \
    '--node-name=proc1' \
    '--resolv-conf=/etc/resolv-k3s.conf' \
    '--node-label' \
    'role=processor' \
    '--node-label' \
    'gpu=true'
  3. Either add the --image-credential-provider-bin-dir and --image-credential-provider-config arguments, or place the appropriate files in /var/lib/rancher/credentialprovider/ (a minimal sketch of this step follows the list).
  4. Start the k3s service
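
A minimal sketch of step 3, using the default paths shown in the kubelet log below; the ecr-credential-provider plugin and the image-match globs are illustrative assumptions, not part of the original report:

mkdir -p /var/lib/rancher/credentialprovider/bin
# copy a kubelet credential provider plugin binary into the bin dir
# (the AWS ECR plugin from cloud-provider-aws is assumed here)
cp ecr-credential-provider /var/lib/rancher/credentialprovider/bin/
cat > /var/lib/rancher/credentialprovider/config.yaml <<'EOF'
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
  - name: ecr-credential-provider
    matchImages:
      - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1
EOF
systemctl restart k3s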

Expected behavior:
k3s should start and use the credential provider.

Actual behavior:
When starting the kubelet, k3s exits with a fatal error:

time="2023-11-22T18:22:41+13:00" level=fatal msg="kubelet exited: failed to set feature gates from initial flags-based config: unrecognized feature gate: KubeletCredentialProviders"

Additional context / logs:

Nov 22 18:22:41 proc1.anzac.mindhive.lan k3s[597591]: time="2023-11-22T18:22:41+13:00" level=info msg="Running kubelet --address=0.0.0.0 --allowed-unsafe-sysctls=net.ipv4.ip_forward,net.ipv6.conf.all.forwarding --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --feature-gates=CloudDualStackNodeIPs=true,KubeletCredentialProviders=true --healthz-bind-address=127.0.0.1 --hostname-override=proc1 --image-credential-provider-bin-dir=/var/lib/rancher/credentialprovider/bin --image-credential-provider-config=/var/lib/rancher/credentialprovider/config.yaml --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-ip=192.168.5.42 --node-labels=role=processor,gpu=true --pod-infra-container-image=rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv-k3s.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
Nov 22 18:22:41 proc1.anzac.mindhive.lan k3s[597591]: time="2023-11-22T18:22:41+13:00" level=info msg="Handling backend connection request [proc1]"
Nov 22 18:22:41 proc1.anzac.mindhive.lan k3s[597591]: Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
Nov 22 18:22:41 proc1.anzac.mindhive.lan k3s[597591]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
Nov 22 18:22:41 proc1.anzac.mindhive.lan k3s[597591]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
Nov 22 18:22:41 proc1.anzac.mindhive.lan k3s[597591]: Error: failed to set feature gates from initial flags-based config: unrecognized feature gate: KubeletCredentialProviders
Nov 22 18:22:41 proc1.anzac.mindhive.lan k3s[597591]: time="2023-11-22T18:22:41+13:00" level=fatal msg="kubelet exited: failed to set feature gates from initial flags-based config: unrecognized feature gate: KubeletCredentialProviders"
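
A quick way to confirm that the kubelet is being handed the removed gate (a small sketch; it assumes the default k3s systemd unit name):

journalctl -u k3s | grep -o 'feature-gates=[^ "]*'
# on an affected node this prints something like:
#   feature-gates=CloudDualStackNodeIPs=true,KubeletCredentialProviders=true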
@brandond
Member

brandond commented Nov 22, 2023

Ah, I'd missed that this got removed. That is a bummer; I was really hoping it would mature. I suppose we can close out #3463 then.

I'll have to see what upstream is planning to replace this; maybe the credentials support that we originally wanted containerd and the other runtimes to provide will come to pass now that this has failed in the kubelet.

@brandond brandond self-assigned this Nov 22, 2023
@brandond brandond added this to the v1.28.5+k3s1 milestone Nov 22, 2023
@brandond brandond removed this from K3s Backlog Nov 22, 2023
@brandond brandond moved this from New to Next Up in K3s Development Nov 22, 2023
@damonmaria
Author

@brandond I noticed the discussion on another thread about the pause container. Is that what you're referring to regarding not being "mature"?

Our use case is we need to pull some containers from our own private AWS ECR registry. I presume that won't be an issue with the current capabilities?

@brandond
Member

brandond commented Nov 22, 2023

That was the largest unclosed gap that I saw at the time. I suspect that there wasn't anything specifically wrong with the feature, just that whoever was pushing it forward moved on or lost interest, and it got swept out by the policy of removing features that fail to graduate.

That's my take on it, without having checked in on any of the upstream work in quite a while.

You can still embed creds in registries.yaml, or use image pull creds in the pod spec, but those are all static - which isn't great for EC2 or other environments where you have ambient credentials available.
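
For reference, the static registries.yaml route looks roughly like this; the registry host is a placeholder, and for ECR the token returned by aws ecr get-login-password expires after about 12 hours, which is exactly the static-credential problem described above:

cat > /etc/rancher/k3s/registries.yaml <<EOF
configs:
  "123456789012.dkr.ecr.us-east-1.amazonaws.com":
    auth:
      username: AWS
      password: $(aws ecr get-login-password --region us-east-1)
EOF
systemctl restart k3s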

@damonmaria
Author

We have constantly rotating AWS creds on our machines (outside of AWS) and so the current credential provider process looks like it should be perfect for us. Currently we have our own k8s cronjob that generates new secrets and stores them to be used as imagePullSecrets. But that's a pretty messy solution.
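
(The refresh job described above boils down to something like the following; the registry, region, and secret name are placeholders:)

# regenerate the pull secret with a fresh ECR token, then reference it
# from pods via imagePullSecrets
kubectl create secret docker-registry ecr-pull-creds \
  --docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region us-east-1)" \
  --dry-run=client -o yaml | kubectl apply -f -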

@brandond
Member

Yeah, that was what the credential provider was supposed to handle, and why I'm disappointed to see it fail to mature. Previously we had proposed having containerd and other runtimes support credential providers natively, but they deferred to the kubelet/CRI folks, which led to the current/former solution.

@brandond brandond moved this from Next Up to Peer Review in K3s Development Nov 28, 2023
@aii-nozomu-oki

kubernetes/kubernetes#116901 only removes the feature gate; the feature itself still exists as GA, right?
https://kubernetes.io/docs/tasks/administer-cluster/kubelet-credential-provider/

I'm using the kubelet credential provider with K3s 1.27 to pull images from ECR.

@brandond
Member

brandond commented Nov 29, 2023

You know, I guess I was just assuming it got removed; if it went GA and is locked on, that's much better. Thanks for pointing that out!

I still have issues with the implementation, but I'm glad it's not gone.

@brandond brandond changed the title from "k3s sets KubeletCredentialProviders feature gate which is removed in 1.28" to "k3s sets KubeletCredentialProviders feature gate which is GA and removed in 1.28" Nov 29, 2023
@damonmaria
Author

@rancher-max
Contributor

Validated in both v1.29.0-rc2+k3s1 and v1.29.0-rc1+rke2r1

$ journalctl -eu k3s | grep -i "feature-gate"
Dec 20 19:55:54 ip-172-31-8-96 k3s[14115]: time="2023-12-20T19:55:54Z" level=info msg="Running cloud-controller-manager <...removed for brevity...> --feature-gates=CloudDualStackNodeIPs=true <...removed for brevity...>"
Dec 20 19:55:58 ip-172-31-8-96 k3s[14115]: time="2023-12-20T19:55:58Z" level=info msg="Running kubelet <...removed for brevity...> --feature-gates=CloudDualStackNodeIPs=true <...removed for brevity...>"
$ journalctl -eu k3s | grep -i "JobTrackingWithFinalizers"
(no result)
$ journalctl -eu k3s | grep -i "KubeletCredentialProviders"
(no result)
