Cluster validation error "master ** is missing kube-apiserver pod, master ** is missing kube-controller-manager pod, master ** is missing kube-scheduler pod" but they exist #10041

Closed
MeirP-3 opened this issue Oct 11, 2020 · 6 comments · Fixed by #10049

Comments

MeirP-3 commented Oct 11, 2020

1. What kops version are you running? The command kops version will display this information.

$ kops version
I1011 20:11:56.850600   31093 featureflag.go:154] FeatureFlag "Spotinst"=true
Version 1.18.1 (git-453d7d96be)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.14.2

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-09-30T19:31:27Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:14:56Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
First I ran
kops replace -f <manifest>
kops update cluster --yes
kops rolling-update cluster --yes --cloudonly
I ran these commands after migrating from kops 1.14.1 (git-8aeefa9a4) to kops 1.18.1 (git-453d7d96be) and after several updates to the instance group config files.
5. What happened after the commands executed?
Kops prints the following message every 30 seconds. The pods definitely exist and have the expected labels (k8s.io/app: kube-scheduler, etc.), but kops reports them as missing.

I1011 20:23:32.608495   32083 instancegroups.go:383] Validating the cluster.
I1011 20:23:36.912149   32083 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": master "ip-**.ec2.internal" is missing kube-apiserver pod, master "ip-**.ec2.internal" is missing kube-controller-manager pod, master "ip-**.ec2.internal" is missing kube-scheduler pod, master "ip-**.ec2.internal" is missing kube-controller-manager pod, master "ip-**.ec2.internal" is missing kube-scheduler pod, master "ip-**.ec2.internal" is missing kube-apiserver pod, master "ip-**.ec2.internal" is missing kube-apiserver pod, master "ip-**.ec2.internal" is missing kube-controller-manager pod, master "ip-**.ec2.internal" is missing kube-scheduler pod.

6. What did you expect to happen?
This validation error should not happen since these pods exist.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?
I am using spotinst with the following NODEUP_URL: https://nuvo-temporary-spot-nodup-bucket.s3.amazonaws.com/nodeup

@MeirP-3 MeirP-3 changed the title Cluster validation failes master is missing kube-apiserver, kube-controller-manager, kube-scheduler pods but they exist Cluster validation error "master ** is missing kube-apiserver pod, master ** is missing kube-controller-manager pod, master ** is missing kube-scheduler pod" but they exist Oct 11, 2020
@olemarkus (Member)

Is this something you are consistently experiencing across clusters?

The test for pods existing is fairly trivial so I am very curious how it can fail in this manner.

Are you able to provide a cluster manifest where you can reproduce this error with a plain AWS-based cluster?

MeirP-3 (Author) commented Oct 12, 2020

I have found the cause in the source code:
The function collectPodFailures looks at spec.priorityClassName, which my pods don't have.

I can't change spec.priorityClassName or spec.priority.
When I try to change them, I get the following error:

The Pod "kube-apiserver-ip-**.ec2.internal" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)

@olemarkus (Member)

Can you check /etc/kubernetes/manifests/kube-apiserver.manifest and see if the priority class is set there? If it is, it may be an admission controller or something else that mutates the manifest.
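
One way to perform that check on a master node, sketched in Go (illustrative only; it assumes the k8s.io/api types and sigs.k8s.io/yaml are available, and a plain grep of the file works just as well):

```go
package main

import (
	"fmt"
	"os"

	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Read the static pod manifest that nodeup writes on the master node.
	data, err := os.ReadFile("/etc/kubernetes/manifests/kube-apiserver.manifest")
	if err != nil {
		panic(err)
	}

	// Decode it as a Pod and report whether the priority class is set.
	var pod v1.Pod
	if err := yaml.Unmarshal(data, &pod); err != nil {
		panic(err)
	}
	fmt.Printf("priorityClassName: %q\n", pod.Spec.PriorityClassName)
}
```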

MeirP-3 (Author) commented Oct 12, 2020

The priority class is not set in /etc/kubernetes/manifests/kube-apiserver.manifest.
See the manifest

@olemarkus (Member)

This is done by nodeup, so it may be that your custom nodeup is somewhat outdated.

@johngmyers (Member)

It appears the PriorityClassName is set on static pods starting with Kops 1.15.0. So this will happen when upgrading from Kops 1.14 or earlier directly to Kops 1.18 or later.
