Cluster validation error "master ** is missing kube-apiserver pod, master ** is missing kube-controller-manager pod, master ** is missing kube-scheduler pod" but they exist #10041

Closed
MeirP-3 opened this issue Oct 11, 2020 · 6 comments · Fixed by #10049

Comments

MeirP-3 commented Oct 11, 2020

1. What kops version are you running? The command kops version will display this information.

$ kops version
I1011 20:11:56.850600   31093 featureflag.go:154] FeatureFlag "Spotinst"=true
Version 1.18.1 (git-453d7d96be)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.14.2

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-09-30T19:31:27Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:14:56Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
First I ran
kops replace -f <manifest>
kops update cluster --yes
kops rolling-update cluster --yes --cloudonly
I ran these commands after migrating from kops 1.14.1 (git-8aeefa9a4) to kops 1.18.1 (git-453d7d96be) and after several updates to the instance group config files.
5. What happened after the commands executed?
Kops prints the following message every 30 seconds. The pods definitely exist and have the expected labels (k8s.io/app: kube-scheduler, etc.), but kops reports them as missing.

I1011 20:23:32.608495   32083 instancegroups.go:383] Validating the cluster.
I1011 20:23:36.912149   32083 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": master "ip-**.ec2.internal" is missing kube-apiserver pod, master "ip-**.ec2.internal" is missing kube-controller-manager pod, master "ip-**.ec2.internal" is missing kube-scheduler pod, master "ip-**.ec2.internal" is missing kube-controller-manager pod, master "ip-**.ec2.internal" is missing kube-scheduler pod, master "ip-**.ec2.internal" is missing kube-apiserver pod, master "ip-**.ec2.internal" is missing kube-apiserver pod, master "ip-**.ec2.internal" is missing kube-controller-manager pod, master "ip-**.ec2.internal" is missing kube-scheduler pod.

6. What did you expect to happen?
This validation error should not happen since these pods exist.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?
I am using spotinst with the following NODEUP_URL: https://nuvo-temporary-spot-nodup-bucket.s3.amazonaws.com/nodeup

@MeirP-3 MeirP-3 changed the title Cluster validation failes master is missing kube-apiserver, kube-controller-manager, kube-scheduler pods but they exist Cluster validation error "master ** is missing kube-apiserver pod, master ** is missing kube-controller-manager pod, master ** is missing kube-scheduler pod" but they exist Oct 11, 2020
@olemarkus (Member)

Is this something you are consistently experiencing across clusters?

The test for pods existing is fairly trivial so I am very curious how it can fail in this manner.

Are you able to provide a cluster manifest where you can reproduce this error with a plain AWS-based cluster?

MeirP-3 (Author) commented Oct 12, 2020

I have found the cause in the source code:
The function collectPodFailures looks at spec.priorityClassName, which my pods don't have.

I can't change spec.priorityClassName or spec.priority.
When I try to change them, I get the following error:

The Pod "kube-apiserver-ip-**.ec2.internal" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)

@olemarkus (Member)

Can you check /etc/kubernetes/manifests/kube-apiserver.manifest and see if the priority class is set there? If it is, it may be an admission controller or something else that mutates the manifest.
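
One way to perform that check on a master node, sketched in Go (illustrative only; it assumes the k8s.io/api types and sigs.k8s.io/yaml are available, and a plain grep of the file works just as well):

```go
package main

import (
	"fmt"
	"os"

	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Read the static pod manifest that nodeup writes on the master node.
	data, err := os.ReadFile("/etc/kubernetes/manifests/kube-apiserver.manifest")
	if err != nil {
		panic(err)
	}

	// Decode it as a Pod and report whether the priority class is set.
	var pod v1.Pod
	if err := yaml.Unmarshal(data, &pod); err != nil {
		panic(err)
	}
	fmt.Printf("priorityClassName: %q\n", pod.Spec.PriorityClassName)
}
```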

MeirP-3 (Author) commented Oct 12, 2020

The priority class is not set in /etc/kubernetes/manifests/kube-apiserver.manifest.
See the manifest

@olemarkus (Member)

This is done by nodeup, so it may be that your custom nodeup is somewhat outdated.

@johngmyers (Member)

It appears the PriorityClassName is set on static pods starting with Kops 1.15.0. So this will happen when upgrading from Kops 1.14 or earlier directly to Kops 1.18 or later.
