Migrating to 1.8 with RBAC is incompatible #4163

Closed
naveensrinivasan opened this issue Dec 28, 2017 · 31 comments

@naveensrinivasan
Contributor

Thanks for submitting an issue! Please fill in as much of the template below as
you can.

------------- BUG REPORT TEMPLATE --------------------

  1. What kops version are you running? The command kops version, will display
    this information.
    Version 1.8.0 (git-4876009bd)

  2. What Kubernetes version are you running? kubectl version will print the
    version if a cluster is running or provide the Kubernetes version specified as
    a kops flag.
    v1.7.7

  3. What cloud provider are you using?
    aws

  4. What commands did you run? What is the simplest way to reproduce this issue?
    kops update cluster

  5. What happened after the commands executed?

  6. What did you expect to happen?
    Upgrade the cluster to v1.8.6

  7. Please provide your cluster manifest. Execute
    kops get --name my.example.com -oyaml to display your cluster manifest.
    You may want to remove your cluster name and other sensitive information.

  8. Please run the commands with most verbose logging by adding the -v 10 flag.
    Paste the logs into this report, or in a gist and provide the gist link here.

  9. Anything else do we need to know?

  • We are trying to upgrade the cluster from v1.7.7 to v1.8.6 with RBAC turned on.
  • We used kops built from the master branch to do the upgrade; kops version reports Version 1.8.0 (git-4876009bd)
I1227 16:17:34.682684       7 rbac.go:116] RBAC DENY: user "kubelet" groups ["system:nodes" "system:authenticated"] cannot "create" resource "pods" in namespace "kube-system"
I1227 16:17:34.682827       7 wrap.go:42] POST /api/v1/namespaces/kube-system/pods: (352.225µs) 403 [[kubelet/v1.8.6 (linux/amd64) kubernetes/6260bb0] 127.0.0.1:32806]
I1227 16:17:34.683112       7 rbac.go:116] RBAC DENY: user "kubelet" groups ["system:nodes" "system:authenticated"] cannot "create" resource "events" in namespace "default"
I1227 16:17:34.683175       7 wrap.go:42] POST /api/v1/namespaces/default/events: (204.479µs) 403 [[kubelet/v1.8.6 (linux/amd64) kubernetes/6260bb0] 127.0.0.1:32806]
I1227 16:17:34.684278       7 rbac.go:116] RBAC DENY: user "kubelet" groups ["system:nodes" "system:authenticated"] cannot "create" resource "events" in namespace "default"
I1227 16:17:34.684381       7 wrap.go:42] POST /api/v1/namespaces/default/events: (272.221µs) 403 [[kubelet/v1.8.6 (linux/amd64) kubernetes/6260bb0] 127.0.0.1:32806]
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-12-26T20:42:03Z
  name: k8s.playground.REDACTED.io
spec:
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://k8s.playground.REDACTED.io/k8s.playground.REDACTED.io
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1c
      name: c
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    authorizationRbacSuperUser: admin
    storageBackend: etcd3
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.8.6
  masterInternalName: api.internal.k8s.playground.REDACTED.io
  masterPublicName: api.k8s.playground.REDACTED.io
  networkCIDR: 172.20.0.0/16
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-1a
    type: Public
    zone: us-east-1a
  - cidr: 172.20.64.0/19
    name: us-east-1b
    type: Public
    zone: us-east-1b
  - cidr: 172.20.96.0/19
    name: us-east-1c
    type: Public
    zone: us-east-1c
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

We did run this yaml before migrating and it still didn't help.

kubectl get  clusterrolebinding system:node -o yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: 2017-12-26T20:53:27Z
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:node
  resourceVersion: "850"
  selfLink: /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings/system%3Anode
  uid: d24fbe68-ea7e-11e7-a9e1-0201c744720e
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes

@liggitt
Member

liggitt commented Dec 28, 2017

Do the authorization errors persist in the log after the API server has completed startup and /healthz returns a 200? Some denials during server startup are normal while the authorization cache fills.
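
A minimal way to check that (a sketch; assumes the kops defaults of the insecure apiserver port on 127.0.0.1:8080 and control-plane logs under /var/log on the master):

# On the master: confirm the apiserver reports healthy, then watch for further denials.
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/healthz
tail -f /var/log/kube-apiserver.log | grep 'RBAC DENY'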

@naveensrinivasan
Contributor Author

It continues and the cluster is inoperable.

@liggitt
Member

liggitt commented Dec 28, 2017

After upgrading, what does this show?

kubectl get clusterrolebinding system:node -o yaml
kubectl get clusterrole system:node -o yaml

@liggitt
Member

liggitt commented Dec 28, 2017

I also see this: https://github.com/kubernetes/kops/blob/1ff42edfac77df99ffa617113e51dad209ae0ce8/upup/models/cloudup/resources/addons/rbac.addons.k8s.io/k8s-1.8.yaml

I'm not familiar with what kops does on upgrade with the add-on bindings.
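
For anyone retracing this, the addon manifest can be applied by hand (a sketch; assumes the file has been saved locally as k8s-1.8.yaml and that you have cluster-admin credentials):

# Apply the kops RBAC addon bindings manually.
kubectl apply -f k8s-1.8.yaml
# Then check which cluster role bindings cover the system:nodes group.
kubectl get clusterrolebindings | grep -i node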

@naveensrinivasan
Contributor Author

@liggitt I manually ran the above yaml and it didn't help.

The API server is unavailable after the upgrade, so all kubectl commands fail.

@liggitt
Member

liggitt commented Dec 28, 2017

kubelet permissions should not affect api server availability. I'm not sure how to debug further if the api server is unreachable. Do you have more apiserver logs that might be illuminating? @chrislovecnm any ideas of what else might be at play here?
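
On a kops-provisioned master the control-plane components log to files under /var/log, so the relevant logs can be collected with something like this (a sketch; paths assume the kops defaults):

sudo tail -n 200 /var/log/kube-apiserver.log
sudo tail -n 200 /var/log/kube-controller-manager.log
sudo tail -n 200 /var/log/kube-scheduler.log
sudo tail -n 200 /var/log/kube-proxy.log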

@KashifSaadat
Contributor

KashifSaadat commented Dec 29, 2017

@naveensrinivasan was RBAC already configured and working when the Cluster was on v1.7.7, or did you change it in the spec as part of the upgrade?

@liggitt not sure about the addons behaviour, but if performing an upgrade from v1.7 the necessary RoleBinding will already exist, so I suspect that isn't the issue.

@naveensrinivasan
Contributor Author

@KashifSaadat RBAC was already configured and working when the cluster was v1.7.7

@naveensrinivasan
Contributor Author

Here are the log files. https://gist.github.com/naveensrinivasan/80eb10aa3bd2259139b48a6a78100357

I don't know exactly when I grabbed them. These are from the master, and I grabbed all the logs:

  • api
  • controller
  • proxy
  • scheduler

@mqasimsarfraz

mqasimsarfraz commented Jan 2, 2018

I am hitting the same issue after the upgrade, using a different installation method, and I am sure the system:nodes group has the system:node role. Interestingly, it isn't just system:nodes; I see other groups, e.g. system:authenticated, affected as well.

RBAC DENY: user "system:kube-proxy" groups ["system:authenticated"] cannot "list" resource "services" cluster-wide

Following this, the API server never comes up and the Kubernetes control plane is down.

@liggitt
Member

liggitt commented Jan 2, 2018

@naveensrinivasan what does apiserver /healthz show while the API server is crashlooping in that state? do you have the full apiserver manifest used, including all flags?

seeing this, which makes me suspect issues writing to etcd:

I1226 17:20:58.368013       8 trace.go:76] Trace[2144299595]: "Create /api/v1/namespaces" (started: 2017-12-26 17:20:53.848730671 +0000 UTC) (total time: 4.5192501s):
Trace[2144299595]: [4.284563321s] [4.284499666s] About to store object in database
Trace[2144299595]: [4.5192501s] [234.686779ms] END
I1226 17:20:58.368361       8 wrap.go:42] POST /api/v1/namespaces: (4.519639312s) 500

@liggitt
Member

liggitt commented Jan 2, 2018

@mqasimsarfraz what is the output of a superuser in the system:masters group calling /healthz on the apiserver? RBAC denials could prevent other components from talking to the API server, but would not keep the API server from coming up. I suspect issues reading from and/or writing to etcd
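
A sketch of how to check that, either with admin (system:masters) credentials through kubectl or directly on the master if the insecure port is enabled:

kubectl get --raw /healthz
# or, on the master itself:
curl -i http://127.0.0.1:8080/healthz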

@mqasimsarfraz

@liggitt Where can I find that output? Also, the following is what I can find related to /healthz in the API server logs:

 /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:108 +0x1ca
logging error output: "[+]ping ok\n[+]etcd ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-apiextensions-informers ok\n[+]poststarthook/start-apiextensions-controllers ok\n[-]poststarthook/bootstrap-controller failed: reason withheld\n[-]poststarthook/rbac/bootstrap-roles failed: reason withheld\n[-]poststarthook/ca-registration failed: reason withheld\n[+]poststarthook/start-kube-apiserver-informers ok\n[+]poststarthook/start-kube-aggregator-informers ok\n[+]poststarthook/apiservice-registration-controller ok\n[+]poststarthook/apiservice-status-available-controller ok\n[+]poststarthook/apiservice-openapi-controller ok\n[+]poststarthook/kube-apiserver-autoregistration ok\n[-]autoregister-completion failed: reason withheld\nhealthz check failed\n"
 [[kube-probe/1.8] 127.0.0.1:55014]

@liggitt
Member

liggitt commented Jan 2, 2018

formatted better, that shows:

[+]ping ok
[+]etcd ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/ca-registration failed: reason withheld
[+]poststarthook/start-kube-apiserver-informers ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[-]autoregister-completion failed: reason withheld
healthz check failed

the details for the failed hooks are available at these URLs:

/healthz/poststarthook/bootstrap-controller
/healthz/poststarthook/rbac/bootstrap-roles
/healthz/poststarthook/ca-registration
/healthz/autoregister-completion

@mqasimsarfraz

Can't find anything useful from those URLs:

[qasim.sarfraz@kube-master-03 ~]$ curl -i 127.0.0.1:8080/healthz/poststarthook/bootstrap-controller
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Tue, 02 Jan 2018 20:08:29 GMT
Content-Length: 36

internal server error: not finished
[qasim.sarfraz@kube-master-03 ~]$ curl -i 127.0.0.1:8080/healthz/poststarthook/rbac/bootstrap-roles
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Tue, 02 Jan 2018 20:08:42 GMT
Content-Length: 36

internal server error: not finished
[qasim.sarfraz@kube-master-03 ~]$ curl -i 127.0.0.1:8080/healthz/poststarthook/ca-registration
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Tue, 02 Jan 2018 20:08:51 GMT
Content-Length: 36

internal server error: not finished
[qasim.sarfraz@kube-master-03 ~]$ curl -i 127.0.0.1:8080/healthz/autoregister-completion
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Tue, 02 Jan 2018 20:09:02 GMT
Content-Length: 495

internal server error: missing APIService: [v1. v1.authentication.k8s.io v1.authorization.k8s.io v1.autoscaling v1.batch v1.networking.k8s.io v1.rbac.authorization.k8s.io v1.storage.k8s.io v1alpha1.admissionregistration.k8s.io v1beta1.apiextensions.k8s.io v1beta1.apps v1beta1.authentication.k8s.io v1beta1.authorization.k8s.io v1beta1.batch v1beta1.certificates.k8s.io v1beta1.extensions v1beta1.policy v1beta1.rbac.authorization.k8s.io v1beta1.storage.k8s.io v1beta2.apps v2beta1.autoscaling]

@liggitt
Member

liggitt commented Jan 2, 2018

All of those point to etcd write errors/hangs. Did etcd setup change during the upgrade? What are the flags passed to the apiserver?
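
One way to capture the running flags (a sketch; the manifest path is the static-pod convention kops uses and varies by installer):

# Show the flags the running apiserver was started with.
ps aux | grep [k]ube-apiserver
# Or read them from the static pod manifest on the master.
sudo cat /etc/kubernetes/manifests/kube-apiserver.manifest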

@mqasimsarfraz

Ah, interesting. No, I haven't changed it, but let me try to check the etcd dumps. Also, the following are the flags passed to the apiserver:

    - --advertise-address=10.1.165.137
    - --etcd-servers=https://10.1.165.214:2379,https://10.1.165.66:2379,https://10.1.165.240:2379
    - --etcd-quorum-read=true
    - --etcd-cafile=/etc/ssl/etcd/ssl/ca.pem
    - --etcd-certfile=/etc/ssl/etcd/ssl/node-kube-master-03.example.com.pem
    - --etcd-keyfile=/etc/ssl/etcd/ssl/node-kube-master-03.example.com-key.pem
    - --insecure-bind-address=0.0.0.0
    - --apiserver-count=3
    - --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,GenericAdmissionWebhook,ResourceQuota
    - --service-cluster-ip-range=10.234.0.0/18
    - --service-node-port-range=30000-32767
    - --client-ca-file=/etc/kubernetes/ssl/ca.pem
    - --profiling=false
    - --repair-malformed-updates=false
    - --kubelet-client-certificate=/etc/kubernetes/ssl/node-kube-master-03.example.com.pem
    - --kubelet-client-key=/etc/kubernetes/ssl/node-kube-master-03.example.com-key.pem
    - --service-account-lookup=true
    - --tls-cert-file=/etc/kubernetes/ssl/apiserver.pem
    - --tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --proxy-client-cert-file=/etc/kubernetes/ssl/apiserver.pem
    - --proxy-client-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --secure-port=6443
    - --insecure-port=8080
    - --storage-backend=etcd3
    - --runtime-config=admissionregistration.k8s.io/v1alpha1
    - --v=2
    - --allow-privileged=true
    - --anonymous-auth=False
    - --authorization-mode=RBAC
    - --feature-gates=Initializers=true

@chrislovecnm
Contributor

I noticed that etcd is not set up for etcd3, by the way. Check, but I think you are still running etcd2.

@chrislovecnm
Contributor

You have

storageBackend: etcd3

But you are not setting the etcd version in the manifest, which is required.
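
For reference, a sketch of what that looks like in the kops cluster spec (the version value is only an example; use whichever etcd 3.x release your kops version supports):

etcdClusters:
- etcdMembers:
  - instanceGroup: master-us-east-1a
    name: a
  name: main
  version: 3.0.17  # example; the version must be set explicitly when switching to etcd3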

@mqasimsarfraz

@liggitt thanks for the pointer; for me it was etcd. The etcd cluster was misbehaving for some reason, and everything is back to normal now that I've fixed it. I wonder why etcd was marked ok in the health check, and why there wasn't any logging for the etcd failure.

[+]etcd ok

Thanks again!

@naveensrinivasan
Contributor Author

naveensrinivasan commented Jan 2, 2018

I have it running as etcd3

kubeAPIServer:
    authorizationRbacSuperUser: admin
    storageBackend: etcd3

@liggitt
Member

liggitt commented Jan 2, 2018

@naveensrinivasan and is your etcd cluster an etcd3 cluster? What version is it running?
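
A quick way to check that on a kops master (a sketch; 4001 is the kops default client port for the main etcd cluster, adjust if yours differs):

curl -s http://127.0.0.1:4001/version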

@naveensrinivasan
Contributor Author

@liggitt It was running etcd2, and as part of the upgrade I had to change it to etcd3.

@liggitt
Member

liggitt commented Jan 2, 2018

Did you migrate the etcd data from the etcd2 store to the etcd3 store? You cannot simply upgrade the etcd binary and switch to etcd3 mode. If you didn't do a migration, you should continue to run Kubernetes in etcd2 mode as long as you have v2 data (even against an etcd3 server).
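
For completeness, the offline etcd2-to-etcd3 data migration looks roughly like this (a sketch only; stop etcd and the apiserver first, back up the data directory, and treat the path as a placeholder for your own etcd data dir):

ETCDCTL_API=3 etcdctl migrate --data-dir=/path/to/etcd/data
# afterwards, restart etcd and start the apiserver with --storage-backend=etcd3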

@naveensrinivasan
Contributor Author

Nope, I didn't migrate. I was trying to use etcd2 in kops for 1.8 and I was running into issues which made me change to etcd3.

@chrislovecnm Would kops upgrade to v1.8 without moving to etcd3?

@liggitt
Member

liggitt commented Jan 2, 2018

You can continue to use etcd2 (or etcd3 in etcd2 mode) against 1.8 and 1.9

@naveensrinivasan
Contributor Author

How do you use etcd2 mode with etcd3?

@liggitt
Member

liggitt commented Jan 2, 2018

Run etcd3 binaries and start the kube apiserver with --storage-backend=etcd2

Kubernetes will continue to use the v2 API (which etcd3 still supports) and will have access to your old v2 data via it.
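
In flag terms, that is roughly (a sketch):

# etcd3 server binaries, apiserver still speaking the etcd v2 API:
kube-apiserver --storage-backend=etcd2 --etcd-servers=http://127.0.0.1:4001 ...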

@naveensrinivasan
Contributor Author

Thanks. I don't know if kops is doing this, or whether it is possible to do this in kops.

@chrislovecnm
Contributor

Yes, remove the etcd3 line in your manifest, or edit your cluster.
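
In kops terms, that is something like (a sketch; substitute your own cluster name):

kops edit cluster k8s.playground.REDACTED.io
# then remove (or change) this line under kubeAPIServer:
#   storageBackend: etcd3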

@naveensrinivasan
Contributor Author

I think the issue was that I was using kops from the master branch (or another version), which messed up the whole migration. I pulled the release version of kops 1.8 and it is working. Thanks!
