[-]poststarthook/rbac/bootstrap-roles failed: reason withheld #86715

Closed
zhangguanzhang opened this issue Dec 30, 2019 · 28 comments
Labels
kind/support Categorizes issue or PR as a support question. sig/auth Categorizes an issue or PR as relevant to SIG Auth. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@zhangguanzhang

zhangguanzhang commented Dec 30, 2019

What happened:
The kube-apiserver logs:

I1230 11:42:36.625486   51567 healthz.go:177] healthz check poststarthook/crd-informer-synced failed: not finished
I1230 11:42:36.644253   51567 healthz.go:177] healthz check poststarthook/rbac/bootstrap-roles failed: not finished
I1230 11:42:36.644262   51567 healthz.go:177] healthz check poststarthook/scheduling/bootstrap-system-priority-classes failed: not finished
I1230 11:42:36.644280   51567 healthz.go:177] healthz check poststarthook/ca-registration failed: not finished
I1230 11:42:36.644296   51567 healthz.go:191] [+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[-]poststarthook/crd-informer-synced failed: reason withheld
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/scheduling/bootstrap-system-priority-classes failed: reason withheld
[-]poststarthook/ca-registration failed: reason withheld
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check failed
I1230 11:42:36.633295  134293 healthz.go:193] [+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/ca-registration ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check failed

But this individual check returns ok:

$ kubectl get --raw /healthz/poststarthook/rbac/bootstrap-roles
ok

The relevant code:
https://github.com/kubernetes/kubernetes/blob/v1.16.4/staging/src/k8s.io/apiserver/pkg/server/healthz/healthz.go#L162-L206
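
A related way to see the per-check status of the aggregate endpoint from the client side is the verbose query parameter on the health endpoints (a small sketch; treat the exact output format as an assumption):

# list every healthz check with its [+]/[-] status
kubectl get --raw '/healthz?verbose'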

What you expected to happen:
healthz check passed
How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    v1.16.4
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@zhangguanzhang zhangguanzhang added the kind/bug Categorizes issue or PR as related to a bug. label Dec 30, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 30, 2019
@zhangguanzhang
Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 30, 2019
@zhangguanzhang
Author

kubeadm config

    apiServer:
      certSANs:
      - 10.96.0.1
      - 127.0.0.1
      - localhost
      - apiserver.k8s.local
      - 172.19.0.2
      - 172.19.0.3
      - 172.19.0.4
      - apiserver01.k8s.local
      - apiserver02.k8s.local
      - apiserver03.k8s.local
      - master
      - kubernetes
      - kubernetes.default
      - kubernetes.default.svc
      - kubernetes.default.svc.cluster.local
      extraArgs:
        authorization-mode: Node,RBAC
        enable-admission-plugins: NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeClaimResize,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,Priority,PodPreset
        runtime-config: api/all,settings.k8s.io/v1alpha1=true
        storage-backend: etcd3
        v: 2
      extraVolumes:
      - hostPath: /etc/localtime
        mountPath: /etc/localtime
        name: localtime
        readOnly: true
      timeoutForControlPlane: 4m0s

@liggitt liggitt added triage/needs-information Indicates an issue needs more information in order to work on it. kind/support Categorizes issue or PR as a support question. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/auth Categorizes an issue or PR as relevant to SIG Auth. and removed kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Dec 30, 2019
@liggitt
Member

liggitt commented Dec 30, 2019

It is normal for those checks to fail until they complete their startup operation. After the individual healthz get returns ok, doesn't the overall /healthz return ok as well?
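
One way to confirm that the failures are only transient during startup is to poll the aggregate endpoint while the apiserver comes up (a minimal sketch using the same kubectl call shown above):

# transient 500s should turn into "ok" once all post-start hooks have finished
while ! kubectl get --raw /healthz; do
  sleep 1
done
echo "all healthz checks passed"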

@tedyu
Contributor

tedyu commented Dec 30, 2019

If you want to see the detailed error, run the apiserver with verbosity --v=4 or higher so that the first of the following log lines is emitted with the real reason:

                                klog.V(4).Infof("healthz check %v failed: %v", check.Name(), err)
                                fmt.Fprintf(&verboseOut, "[-]%v failed: reason withheld\n", check.Name())
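
With a kubeadm setup, one way to raise the verbosity is to edit the static pod manifest (a sketch; the manifest path is kubeadm's default, the existing --v=2 matches the config posted above, and the component=kube-apiserver label is an assumption):

# bump the apiserver log verbosity so the klog.V(4) line above is emitted
sudo sed -i 's/--v=2/--v=4/' /etc/kubernetes/manifests/kube-apiserver.yaml
# the kubelet restarts the static pod automatically; then look for the detailed reason
kubectl -n kube-system logs -l component=kube-apiserver --tail=-1 | grep 'healthz check'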

@zhangguanzhang
Author

zhangguanzhang commented Dec 31, 2019

@liggitt @tedyu I kept watching the logs at --v=4, but the check never passes: [-]poststarthook/rbac/bootstrap-roles failed: reason withheld is always reported, and the reason is not finished.

@zhangguanzhang zhangguanzhang changed the title [-]poststarthook/scheduling/bootstrap-system-priority-classes failed: reason withheld [-]poststarthook/rbac/bootstrap-roles failed: reason withheld Dec 31, 2019
@ialidzhikov
Contributor

I observe the same behaviour with v1.17.2.
Logs of apiserver:

[-]poststarthook/rbac/bootstrap-roles failed: reason withheld

I0203 17:41:42.064129       1 healthz.go:177] healthz check poststarthook/rbac/bootstrap-roles failed: not finished

But

$ kubectl get --raw /healthz/poststarthook/rbac/bootstrap-roles
ok

@ialidzhikov
Contributor

I guess the health check itself is ok.
In my case the issue was that the kube-apiserver was on v1.16.4 while the kube-controller-manager was on v1.17.2. The kube-controller-manager could not acquire leader election because kube-apiserver v1.16.x does not apply the RBAC rules it requires. The fix was to update the kube-apiserver to v1.17 as well.
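
A quick sanity check for this kind of version skew is to compare the control-plane component versions directly (a small sketch; the --version commands are the same ones shown further down in this thread):

# all control-plane binaries should be on the same minor version
kube-apiserver --version
kube-controller-manager --version
kubectl version --short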

@akokshar

akokshar commented Feb 12, 2020

I have just installed a fresh 1.17.2 and see the same issue:

Feb 12 12:45:06 kubernetes kube-apiserver[3171]: [-]poststarthook/rbac/bootstrap-roles failed: reason withheld
...
[root@kubernetes ~]# kube-apiserver --version
Kubernetes v1.17.2
[root@kubernetes ~]# kube-controller-manager --version
Kubernetes v1.17.2

And this is the only check which is failing.

@wcollin

wcollin commented Mar 11, 2020

I got the same error with k8s v1.16.7.

@devcui

devcui commented Mar 23, 2020


I got the same error.

The health check failed because the other nodes were not configured yet. Please build the complete cluster and check again.

@devcui

devcui commented Mar 23, 2020

And the issue can be closed.

@zhangguanzhang
Author

But the log never prints that the check succeeded.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2020
@zhangguanzhang
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2020
@prithviramesh

We're observing the same behavior with Kubernetes 1.16 and Kubernetes 1.14

I0710 15:26:01.457701       1 healthz.go:191] [+]ping ok
[+]log ok
[+]etcd ok
[+]kms-provider-0 ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/ca-registration ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check failed

@prithviramesh

Is there any idea of what is causing this?

@hakuna-matatah
Contributor

If healthz succeeds when queried from the box (master node):

sh-4.2$ kubectl get --raw /healthz/poststarthook/rbac/bootstrap-roles
ok

sh-4.2$ kubectl get --raw /healthz
ok
sh-4.2$ 

but you see /healthz failures like this in the apiserver logs:

I0723 08:47:16.694185       1 healthz.go:191] [+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/ca-registration ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok

one possibility is that the ClusterRoleBinding system:public-info-viewer in your cluster has been modified so that it no longer allows system:unauthenticated calls to the apiserver.

Please check whether that is the cause and, if so, modify the CRB to add this subject:

- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:unauthenticated
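
A quick way to inspect the binding and, if the subject is missing, add it back (a sketch; the JSON patch assumes the default binding layout shown further down in this thread):

# inspect the current subjects of the binding
kubectl get clusterrolebinding system:public-info-viewer -o yaml
# add system:unauthenticated back if it is missing
kubectl patch clusterrolebinding system:public-info-viewer --type=json \
  -p='[{"op":"add","path":"/subjects/-","value":{"apiGroup":"rbac.authorization.k8s.io","kind":"Group","name":"system:unauthenticated"}}]'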

@zhangguanzhang
Author

In response to the above:

one possibility is that the ClusterRoleBinding system:public-info-viewer in your cluster has been modified so that it no longer allows system:unauthenticated calls to the apiserver. Please check whether that is the cause and, if so, modify the CRB to add the system:unauthenticated subject.

I checked the CRB and it is ok; it has not been modified:

[root@k8s-node1 kube-apiserver]# grep -5 -m1 'bootstrap-roles failed: reason'  kube-apiserver.INFO
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[-]poststarthook/crd-informer-synced failed: reason withheld
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/scheduling/bootstrap-system-priority-classes failed: reason withheld
[-]poststarthook/ca-registration failed: reason withheld
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[root@k8s-node1 kube-apiserver]# kubectl get clusterrolebinding system:public-info-viewer -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2020-04-15T14:20:12Z"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:public-info-viewer
  resourceVersion: "97"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/system%3Apublic-info-viewer
  uid: f78c1a71-ebd0-47da-b5a0-f75cb3795232
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:public-info-viewer
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:unauthenticated   
[root@k8s-node1 kube-apiserver]# kubectl version -o json
{
  "clientVersion": {
    "major": "1",
    "minor": "16",
    "gitVersion": "v1.16.7",
    "gitCommit": "be3d344ed06bff7a4fc60656200a93c74f31f9a4",
    "gitTreeState": "clean",
    "buildDate": "2020-02-11T19:34:02Z",
    "goVersion": "go1.13.6",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "serverVersion": {
    "major": "1",
    "minor": "16",
    "gitVersion": "v1.16.7",
    "gitCommit": "be3d344ed06bff7a4fc60656200a93c74f31f9a4",
    "gitTreeState": "clean",
    "buildDate": "2020-02-11T19:24:46Z",
    "goVersion": "go1.13.6",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 22, 2020
@zhangguanzhang
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 22, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 19, 2021
@george-angel
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 19, 2021
@Sokwva

Sokwva commented Feb 21, 2021

Same error on v1.20.4.

@Dentrax

Dentrax commented Mar 29, 2021

Same error on v1.20.2

$ minikube start --alsologtostderr (VirtualBox Version 6.1.16 r140961 (Qt5.6.3))
macOS 11.0.1

W0329 14:57:43.082477   36365 api_server.go:99] status: https://192.168.99.101:8443/healthz returned error 500:
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check failed
| I0329 14:57:43.565964   36365 api_server.go:221] Checking apiserver healthz at https://192.168.99.101:8443/healthz ...
I0329 14:57:43.576080   36365 api_server.go:241] https://192.168.99.101:8443/healthz returned 200:
ok

@liggitt
Member

liggitt commented Apr 16, 2021

Without more information, this isn't actionable. It is possible for the startup hook to fail if it takes too long to create the bootstrap roles.

If this is encountered, please provide the output of kubectl get --raw /healthz/poststarthook/rbac/bootstrap-roles as well, to get the published details about the cause of the failure, and the content of the API server log, to get internal details about the failure (the hook logs its operations with a storage_rbac.go prefix).

/close
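
For anyone who lands here later, a minimal sketch of collecting the information requested above (the pod label and log-gathering commands are assumptions for a kubeadm-style setup):

# published details about the failing hook
kubectl get --raw /healthz/poststarthook/rbac/bootstrap-roles
# internal details: the hook logs its operations with a storage_rbac.go prefix
kubectl -n kube-system logs -l component=kube-apiserver --tail=-1 | grep -E 'storage_rbac.go|bootstrap-roles'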

@k8s-ci-robot
Contributor

@liggitt: Closing this issue.

In response to this:

Without more information, this isn't actionable. It is possible for the startup hook to fail if it takes too long to create the bootstrap roles.

If this is encountered, please provide the output of kubectl get --raw /healthz/poststarthook/rbac/bootstrap-roles as well, to get the published details about the cause of the failure, and the content of the API server log, to get internal details about the failure (the hook logs its operations with a storage_rbac.go prefix).

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@1998729

1998729 commented Nov 12, 2021

I seem to have solved this problem: the memory and CPU limits enforced by the kubelet were too small.
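
If resource starvation is suspected, one way to check what the apiserver pod is actually allowed to use (a sketch; the component=kube-apiserver label is the kubeadm default and is an assumption here):

# show the resources configured for the apiserver pod
kubectl -n kube-system describe pod -l component=kube-apiserver | grep -A 3 -E 'Limits:|Requests:'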
