Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stuck at "Waiting for Kubernetes API" #3545

Open
PatMis16 opened this issue Dec 5, 2022 · 6 comments
Open

Stuck at "Waiting for Kubernetes API" #3545

PatMis16 opened this issue Dec 5, 2022 · 6 comments
Assignees
Labels

Comments

@PatMis16
Copy link

PatMis16 commented Dec 5, 2022

Actual Behavior

Rancher Desktop is installed on Windows 10 Enterprise (21H2). Since it is a corporate workplace, I run px proxy (https://github.com/genotrance/px) to make it easier to use a the corporate proxy on WSL2. With px proxy running, I can use http://localhost:3128 as proxy. This concept also works fine for Docker Desktop. However, I would like to replace Docker Desktop with Rancher Desktop. On Windows the environment variables HTTP_PROXY and HTTPS_PROXY are set to http://localhost:3128. And I have also set the environment variable NO_PROXY with the values localhost and 127.0.0.1.

When starting Rancher Desktop it works fine until "Waiting for Kubernetes API" apears in the UI. It then waits infite for the API to be reachable.

When looking at the logs, we clearly see that there are connectivity issues.

k8s.log:
2022-12-05T14:18:20.906Z: Waiting for K3s server to be ready on port 6443...

k3s.log:
E1205 14:18:26.892050 410 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: proxy error from 127.0.0.1:6443 while dialing <some ip>:10250, code 503: 503 Service Unavailable , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]

Steps to Reproduce

Might be hard to reproduce, since it is a corporate setup of Windows.

Result

Rancher Desktop is stuck at "Waiting for Kubernetes API"

RancherDesktop_stuck_Waiting_for_K8s_API

Expected Behavior

Rancher Desktop is starting without issues.

Additional Information

No response

Rancher Desktop Version

1.6.2

Rancher Desktop K8s Version

1.25.4

Which container engine are you using?

containerd (nerdctl)

What operating system are you using?

Windows

Operating System / Build Version

Windows 10 Enterprise (21H2)

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

Check Point VPN. corporate proxy but routed via localhost with px proxy.

@Nino-K
Copy link
Member

Nino-K commented Dec 7, 2022

@PatMis16 thanks for reporting this issue. I can see that you are also using check point VPN along with the px proxy. Are you able to disable (bypass) the proxy temporarily and see if the issue still persists? This will allow us to isolate the issue and make sure it is not the VPN that is causing the issue.

@PatMis16
Copy link
Author

PatMis16 commented Dec 7, 2022

@Nino-K
Hi Nino,
I removed the env vars for the proxy. Rancher Desktop is still stuck at "Waiting for Kubernetes API". I am seeing several E1207 messages in the log:

E1207 09:47:54.016735 785 configmap.go:197] Couldn't get configMap kube-system/chart-content-traefik: object "kube-system"/"chart-content-traefik" not registered E1207 09:47:54.018840 785 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-content podName:214a25a2-8077-477f-aa34-379c3b2b1b99 nodeName:}" failed. No retries permitted until 2022-12-07 09:49:56.0188096 +0000 UTC m=+377.473425901 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "content" (UniqueName: "kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-content") pod "helm-install-traefik-27l9f" (UID: "214a25a2-8077-477f-aa34-379c3b2b1b99") : object "kube-system"/"chart-content-traefik" not registered

Before those lines I see several other Errors and warnings:

E1207 09:43:59.913356 785 configmap.go:197] Couldn't get configMap kube-system/chart-content-traefik: object "kube-system"/"chart-content-traefik" not registered E1207 09:43:59.913399 785 configmap.go:197] Couldn't get configMap kube-system/chart-values-traefik: object "kube-system"/"chart-values-traefik" not registered E1207 09:43:59.913859 785 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-content podName:214a25a2-8077-477f-aa34-379c3b2b1b99 nodeName:}" failed. No retries permitted until 2022-12-07 09:44:15.9138028 +0000 UTC m=+37.368419101 (durationBeforeRetry 16s). Error: MountVolume.SetUp failed for volume "content" (UniqueName: "kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-content") pod "helm-install-traefik-27l9f" (UID: "214a25a2-8077-477f-aa34-379c3b2b1b99") : object "kube-system"/"chart-content-traefik" not registered E1207 09:43:59.914729 785 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-values podName:214a25a2-8077-477f-aa34-379c3b2b1b99 nodeName:}" failed. No retries permitted until 2022-12-07 09:44:15.9146922 +0000 UTC m=+37.369308501 (durationBeforeRetry 16s). Error: MountVolume.SetUp failed for volume "values" (UniqueName: "kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-values") pod "helm-install-traefik-27l9f" (UID: "214a25a2-8077-477f-aa34-379c3b2b1b99") : object "kube-system"/"chart-values-traefik" not registered I1207 09:44:00.140994 785 shared_informer.go:262] Caches are synced for garbage collector I1207 09:44:00.141614 785 garbagecollector.go:163] Garbage collector: all resource monitors have synced. Proceeding to collect garbage I1207 09:44:00.163012 785 shared_informer.go:262] Caches are synced for garbage collector I1207 09:44:01.097682 785 trace.go:205] Trace[2076317647]: "Proxy via http_connect protocol over tcp" address:<some ip>:10250 (07-Dec-2022 09:43:57.992) (total time: 3105ms): Trace[2076317647]: [3.1051586s] [3.1051586s] END I1207 09:44:01.097724 785 trace.go:205] Trace[838491155]: "Proxy via http_connect protocol over tcp" address:<some ip>:10250 (07-Dec-2022 09:43:57.992) (total time: 3105ms): Trace[838491155]: [3.1053515s] [3.1053515s] END I1207 09:44:01.097747 785 trace.go:205] Trace[1503997365]: "Proxy via http_connect protocol over tcp" address:<some ip>:10250 (07-Dec-2022 09:43:57.992) (total time: 3105ms): Trace[1503997365]: [3.1052819s] [3.1052819s] END I1207 09:44:01.097747 785 trace.go:205] Trace[332305505]: "Proxy via http_connect protocol over tcp" address:<some ip>:10250 (07-Dec-2022 09:43:57.992) (total time: 3105ms): Trace[332305505]: [3.1053768s] [3.1053768s] END I1207 09:44:01.097795 785 trace.go:205] Trace[1335487481]: "Proxy via http_connect protocol over tcp" address:<some ip>:10250 (07-Dec-2022 09:43:57.992) (total time: 3105ms): Trace[1335487481]: [3.1053639s] [3.1053639s] END E1207 09:44:01.099742 785 available_controller.go:524] v1beta1.metrics.k8s.io failed with: failing or missing response from https://<some ip>:10250/apis/metrics.k8s.io/v1beta1: Get "https://<some ip>:10250/apis/metrics.k8s.io/v1beta1": proxy error from 127.0.0.1:6443 while dialing <some ip>:10250, code 503: 503 Service Unavailable W1207 09:44:02.108440 785 handler_proxy.go:105] no RequestInfo found in the context E1207 09:44:02.108815 785 controller.go:113] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: Error, could not get list of group versions for APIService I1207 09:44:02.109120 785 controller.go:126] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue. W1207 09:44:02.108514 785 handler_proxy.go:105] no RequestInfo found in the context E1207 09:44:02.109691 785 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]] I1207 09:44:02.111119 785 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue. W1207 09:44:02.964901 785 lease.go:250] Resetting endpoints for master service "kubernetes" to [<some other ip>] time="2022-12-07T09:44:02Z" level=info msg="Stopped tunnel to <some other ip 2>:6443" time="2022-12-07T09:44:02Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp <some other ip 2>:6443: operation was canceled" time="2022-12-07T09:44:02Z" level=error msg="Remotedialer proxy error" error="dial tcp <some other ip 2>:6443: operation was canceled" E1207 09:44:15.920570 785 configmap.go:197] Couldn't get configMap kube-system/chart-content-traefik: object "kube-system"/"chart-content-traefik" not registered E1207 09:44:15.920876 785 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-content podName:214a25a2-8077-477f-aa34-379c3b2b1b99 nodeName:}" failed. No retries permitted until 2022-12-07 09:44:47.9208585 +0000 UTC m=+69.375474801 (durationBeforeRetry 32s). Error: MountVolume.SetUp failed for volume "content" (UniqueName: "kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-content") pod "helm-install-traefik-27l9f" (UID: "214a25a2-8077-477f-aa34-379c3b2b1b99") : object "kube-system"/"chart-content-traefik" not registered E1207 09:44:15.920623 785 configmap.go:197] Couldn't get configMap kube-system/chart-values-traefik: object "kube-system"/"chart-values-traefik" not registered E1207 09:44:15.921387 785 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-values podName:214a25a2-8077-477f-aa34-379c3b2b1b99 nodeName:}" failed. No retries permitted until 2022-12-07 09:44:47.921364 +0000 UTC m=+69.375980301 (durationBeforeRetry 32s). Error: MountVolume.SetUp failed for volume "values" (UniqueName: "kubernetes.io/configmap/214a25a2-8077-477f-aa34-379c3b2b1b99-values") pod "helm-install-traefik-27l9f" (UID: "214a25a2-8077-477f-aa34-379c3b2b1b99") : object "kube-system"/"chart-values-traefik" not registered I1207 09:44:18.068196 785 scope.go:115] "RemoveContainer" containerID="869467e15085a30955b687735d36397805c733e2275a9b35d7170ea364771b56" I1207 09:44:18.068672 785 scope.go:115] "RemoveContainer" containerID="1008c1449a0655e828574ef0eebd607c08bd9e742def4d8e878f2e09e4004728" E1207 09:44:18.069199 785 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"local-path-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=local-path-provisioner pod=local-path-provisioner-79f67d76f8-kd5qd_kube-system(86df8f1f-1185-428d-aeec-c6e81fbbab03)\"" pod="kube-system/local-path-provisioner-79f67d76f8-kd5qd" podUID=86df8f1f-1185-428d-aeec-c6e81fbbab03 I1207 09:44:18.081267 785 scope.go:115] "RemoveContainer" containerID="9d6d202ddbe85eda5f205a1ab200b5d0df8c7439c204a09b39f73f54b93ebf6a" E1207 09:44:18.081901 785 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"metrics-server\" with CrashLoopBackOff: \"back-off 10s restarting failed container=metrics-server pod=metrics-server-5c8978b444-fqjmf_kube-system(0ceaac7d-35c0-460d-873c-5da49ad323ad)\"" pod="kube-system/metrics-server-5c8978b444-fqjmf" podUID=0ceaac7d-35c0-460d-873c-5da49ad323ad I1207 09:44:18.099999 785 scope.go:115] "RemoveContainer" containerID="3e4d49bc061524794b02a76d6da6d2d99e8ed2738d006d4323d98c0289cf46c5" E1207 09:44:18.110082 785 network_policy_controller.go:284] Aborting sync. Failed to run iptables-restore: failed to call iptables-restore: exit status 1 (Warning: Extension limit revision 0 not supported, missing kernel module? iptables-restore: line 96 failed

@Nino-K
Copy link
Member

Nino-K commented Dec 7, 2022

Thanks for providing additional details. I highly suspect this is a VPN issue rather than a proxy. Although, the proxy might not work out of the box either just because windows environment variables are not directly translated into WSL without additional work. But let's focus on one issue at a time :)

There is a known issue with some VPN clients when used in conjunction with WSL and I think Checkpoint VPN is one of them. The core of the issue is how the VPN client allocates the lowest route metric in the network with itself (todo with a split DNS algorithm), which causes an infinite loop for traffic that is coming from WSL network. Meaning that, once the traffic is out of the WSL network it doesn't know how to get back in WSL because it always goes to the VPN client (because of the lowest interface metric) and that causes the infinite loop you can learn more about the issue here: microsoft/WSL#4246

However, we have a short-term workaround for this suggested here: #3091 (comment)

Are you able to give that a try to see if it rectifies the issue for you?

@PatMis16
Copy link
Author

PatMis16 commented Dec 8, 2022

Today I am in the office and I am working without VPN. Now I am getting this error

2022-12-08T07:09:27.871Z: Waited more than 300 secs for kubernetes to fully start up. Giving up.
2022-12-08T07:09:28.278Z: Error priming kuberlr: Error: C:\Users\mischlerpa\AppData\Local\Programs\Rancher Desktop\resources\resources\win32\bin\kubectl.exe exited with code 1
2022-12-08T07:09:28.279Z: Output from kuberlr:
ex.stdout: [
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
],
ex.stderr: [I1208 08:09:27.983422   16268 versioner.go:58] Get "https://<some ip>:6443/version?timeout=5s": dial tcp <some ip>:6443: connectex: An attempt was made to access a socket in a way forbidden by its access permissions.
Unable to connect to the server: dial tcp <some ip>:6443: connectex: An attempt was made to access a socket in a way forbidden by its access permissions.
]
2022-12-08T07:09:28.279Z: Failed to match a kuberlr network access issue.

@Nino-K
Copy link
Member

Nino-K commented Dec 8, 2022

@PatMis16 thanks for the update, just to confirm that's without the VPN but going through px proxy only?

@PatMis16
Copy link
Author

PatMis16 commented Dec 8, 2022

Hi Nino,

That's about right.
It works well on my private notebook.

Best Regards,
Patrick

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants