Upgrading Control Plane fails due to invalid memory address #2633

Closed
Xenwar opened this issue Mar 11, 2020 · 19 comments · Fixed by #2641
Assignees
Labels
area/control-plane Issues or PRs related to control-plane lifecycle management
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor.
priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.
Milestone

Comments

@Xenwar commented Mar 11, 2020

What steps did you take and what happened:
During a control plane upgrade, the capi-kubeadm-control-plane controller pod crashes due to an unknown error. I am still working on identifying the cause.

Note: The main focus of this bug report is on fixing the nil pointer issue, much like #2613.

What did you expect to happen:
A graceful failure

Environment:

  • Cluster-api version: v1alpha3
  • Minikube/KIND version:
  • Kubernetes version: v1.17.3
  • OS: Ubuntu 18.04.4

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 11, 2020
@Xenwar (Author) commented Mar 11, 2020

capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-56d8897c7d-cnn9p   1/2     CrashLoopBackOff   7          56m

@Xenwar (Author) commented Mar 11, 2020

kubectl logs capi-kubeadm-control-plane-controller-manager-56d8897c7d-cnn9p -n capi-kubeadm-control-plane-system -c manager
I0311 09:01:16.374402       1 listener.go:40] controller-runtime/metrics "msg"="metrics server is starting to listen"  "addr"="127.0.0.1:8080"
I0311 09:01:16.376447       1 main.go:119] setup "msg"="starting manager"  
I0311 09:01:16.378539       1 leaderelection.go:242] attempting to acquire leader lease  capi-kubeadm-control-plane-system/kubeadm-control-plane-manager-leader-election-capi...
I0311 09:01:16.382102       1 internal.go:356] controller-runtime/manager "msg"="starting metrics server"  "path"="/metrics"
I0311 09:01:33.837344       1 leaderelection.go:252] successfully acquired lease capi-kubeadm-control-plane-system/kubeadm-control-plane-manager-leader-election-capi
I0311 09:01:33.847483       1 controller.go:164] controller-runtime/controller "msg"="Starting EventSource"  "controller"="kubeadmcontrolplane" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"version":"","infrastructureTemplate":{},"kubeadmConfigSpec":{}},"status":{"initialized":false,"ready":false}}}
I0311 09:01:33.959149       1 controller.go:164] controller-runtime/controller "msg"="Starting EventSource"  "controller"="kubeadmcontrolplane" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"clusterName":"","bootstrap":{},"infrastructureRef":{}},"status":{"bootstrapReady":false,"infrastructureReady":false}}}
I0311 09:01:34.073241       1 controller.go:164] controller-runtime/controller "msg"="Starting EventSource"  "controller"="kubeadmcontrolplane" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"controlPlaneEndpoint":{"host":"","port":0}},"status":{"infrastructureReady":false,"controlPlaneInitialized":false}}}
I0311 09:01:34.228410       1 controller.go:171] controller-runtime/controller "msg"="Starting Controller"  "controller"="kubeadmcontrolplane"
I0311 09:01:34.228487       1 controller.go:190] controller-runtime/controller "msg"="Starting workers"  "controller"="kubeadmcontrolplane" "worker count"=10
I0311 09:01:34.359547       1 kubeadm_control_plane_controller.go:252] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="test1" "kubeadmControlPlane"="test1-controlplane" "namespace"="metal3" 
E0311 09:01:35.049926       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 278 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x16a8280, 0x26f9f60)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x82
panic(0x16a8280, 0x26f9f60)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2
sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).upgradeControlPlane(0xc0003aa120, 0x1b02b40, 0xc00009e050, 0xc000241b00, 0xc00020a500, 0xc0003441b0, 0xc000344240, 0xc0000a0c88, 0x1, 0x1, ...)
	/workspace/controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go:376 +0x5ff
sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).reconcile(0xc0003aa120, 0x1b02b40, 0xc00009e050, 0xc000241b00, 0xc00020a500, 0x0, 0x0, 0x0, 0xc00042ca1a)
	/workspace/controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go:253 +0x1250
sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).Reconcile(0xc0003aa120, 0xc00042ca1a, 0x6, 0xc00058ed00, 0x12, 0xc0003eec00, 0x0, 0x0, 0x0)
	/workspace/controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go:190 +0x6a3
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001ca480, 0x1709460, 0xc0003f7c20, 0x440500)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x162
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001ca480, 0x0)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xcb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0001ca480)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0003fcb70)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0003fcb70, 0x3b9aca00, 0x0, 0x1, 0xc000042a20)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0003fcb70, 0x3b9aca00, 0xc000042a20)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x328
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x152ca2f]

goroutine 278 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:55 +0x105
panic(0x16a8280, 0x26f9f60)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2
sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).upgradeControlPlane(0xc0003aa120, 0x1b02b40, 0xc00009e050, 0xc000241b00, 0xc00020a500, 0xc0003441b0, 0xc000344240, 0xc0000a0c88, 0x1, 0x1, ...)
	/workspace/controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go:376 +0x5ff
sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).reconcile(0xc0003aa120, 0x1b02b40, 0xc00009e050, 0xc000241b00, 0xc00020a500, 0x0, 0x0, 0x0, 0xc00042ca1a)
	/workspace/controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go:253 +0x1250
sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).Reconcile(0xc0003aa120, 0xc00042ca1a, 0x6, 0xc00058ed00, 0x12, 0xc0003eec00, 0x0, 0x0, 0x0)
	/workspace/controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go:190 +0x6a3
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001ca480, 0x1709460, 0xc0003f7c20, 0x440500)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x162
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001ca480, 0x0)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xcb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0001ca480)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0003fcb70)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0003fcb70, 0x3b9aca00, 0x0, 0x1, 0xc000042a20)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0003fcb70, 0x3b9aca00, 0xc000042a20)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x328

@detiber (Member) commented Mar 11, 2020

Looks like we are not guarding against nil when checking kcp.Spec.KubeadmConfigSpec.ClusterConfiguration.Etcd.Local

/priority critical-urgent
/milestone v0.3.1
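
For illustration, a minimal self-contained sketch (using simplified stand-in types rather than the actual cluster-api definitions) of how a chained check like that panics when an intermediate pointer is nil:

package main

// Simplified stand-in types; the real cluster-api/kubeadm types have more fields.
type LocalEtcd struct {
	ImageRepository string
	ImageTag        string
}

type Etcd struct {
	Local *LocalEtcd
}

type ClusterConfiguration struct {
	Etcd Etcd
}

type KubeadmConfigSpec struct {
	// A pointer, so it stays nil when the section is omitted from the manifest.
	ClusterConfiguration *ClusterConfiguration
}

func main() {
	spec := KubeadmConfigSpec{} // ClusterConfiguration was never set
	// Evaluating the condition dereferences the nil ClusterConfiguration pointer
	// before Local is ever inspected, producing the nil pointer panic seen above.
	if spec.ClusterConfiguration.Etcd.Local != nil {
		// never reached
	}
}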

@k8s-ci-robot k8s-ci-robot added this to the v0.3.1 milestone Mar 11, 2020
@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Mar 11, 2020
@ncdc (Contributor) commented Mar 11, 2020

/assign
/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Mar 11, 2020
@ncdc (Contributor) commented Mar 11, 2020

@Xenwar could you please share your KubeadmControlPlane spec?

@detiber (Member) commented Mar 11, 2020

@ncdc I'm pretty sure any spec that didn't include kcp.Spec.KubeadmConfigSpec.ClusterConfiguration.Etcd.Local would do it:

	if kcp.Spec.KubeadmConfigSpec.ClusterConfiguration.Etcd.Local != nil {
		meta := kcp.Spec.KubeadmConfigSpec.ClusterConfiguration.Etcd.Local.ImageMeta
		if err := workloadCluster.UpdateEtcdVersionInKubeadmConfigMap(ctx, meta.ImageRepository, meta.ImageTag); err != nil {
			return ctrl.Result{}, errors.Wrap(err, "failed to update the etcd version in the kubeadm config map")
		}
	}

@detiber (Member) commented Mar 11, 2020

Actually, I take that back; anything that didn't include ClusterConfiguration would likely do it.
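
A minimal sketch of the kind of guard being discussed, checking the ClusterConfiguration pointer before descending into Etcd.Local (illustrative only; it may differ from the actual change in #2641):

	if kcp.Spec.KubeadmConfigSpec.ClusterConfiguration != nil &&
		kcp.Spec.KubeadmConfigSpec.ClusterConfiguration.Etcd.Local != nil {
		meta := kcp.Spec.KubeadmConfigSpec.ClusterConfiguration.Etcd.Local.ImageMeta
		if err := workloadCluster.UpdateEtcdVersionInKubeadmConfigMap(ctx, meta.ImageRepository, meta.ImageTag); err != nil {
			return ctrl.Result{}, errors.Wrap(err, "failed to update the etcd version in the kubeadm config map")
		}
	}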

@ncdc (Contributor) commented Mar 11, 2020

Yes, I have a fix pending, but I am asking for confirmation.

@vincepri (Member) commented Mar 11, 2020

Side note: should ClusterConfiguration actually be optional, or could it be defaulted to an empty struct?

@detiber (Member) commented Mar 11, 2020

The problem is less about Go usage and more about YAML usage with non-optional fields.

@detiber (Member) commented Mar 11, 2020

I'm generally OK with defaulting it via webhook.

@vincepri (Member) commented Mar 11, 2020

I was thinking we could use the defaulting webhook here if we want to keep the pointer. The check for ClusterConfiguration not being nil is everywhere, so it might be worth it.
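
A minimal sketch of what that defaulting could look like with the controller-runtime webhook.Defaulter pattern (the bootstrapv1 alias and the surrounding wiring are assumptions for illustration, not the actual cluster-api code):

	// Default is invoked by the defaulting webhook; sketch only, assuming
	// bootstrapv1 is the package where ClusterConfiguration is defined.
	func (in *KubeadmControlPlane) Default() {
		if in.Spec.KubeadmConfigSpec.ClusterConfiguration == nil {
			// Default to an empty struct so controllers can dereference it without nil checks.
			in.Spec.KubeadmConfigSpec.ClusterConfiguration = &bootstrapv1.ClusterConfiguration{}
		}
	}

Defaulting once here, while keeping the pointer in the API type, would avoid repeating the nil check across the controllers.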

@ncdc (Contributor) commented Mar 11, 2020

Fix is up at #2641

@Xenwar (Author) commented Mar 11, 2020

> @Xenwar could you please share your KubeadmControlPlane spec?

Here is the spec:

@Xenwar (Author) commented Mar 11, 2020

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  creationTimestamp: "2020-03-11T12:30:34Z"
  finalizers:
  - kubeadm.controlplane.cluster.x-k8s.io
  generation: 2
  labels:
    cluster.x-k8s.io/cluster-name: test1
  name: test1-controlplane
  namespace: metal3
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1alpha3
    blockOwnerDeletion: true
    controller: true
    kind: Cluster
    name: test1
    uid: 1c6532f4-c39c-461a-837e-efd5beccbf34
  resourceVersion: "29577"
  selfLink: /apis/controlplane.cluster.x-k8s.io/v1alpha3/namespaces/metal3/kubeadmcontrolplanes/test1-controlplane
  uid: b17d121f-0c69-4bc1-ba06-44515c145ed5
spec:
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: Metal3MachineTemplate
    name: test1-controlplane
    namespace: metal3
  kubeadmConfigSpec:
    files:
    - content: |
        ! Configuration File for keepalived
        global_defs {
            notification_email {
            [email protected]
            [email protected]
            }
            notification_email_from [email protected]
            smtp_server localhost
            smtp_connect_timeout 30
        }
        vrrp_instance VI_2 {
            state MASTER
            interface enp2s0
            virtual_router_id 2
            priority 101
            advert_int 1
            virtual_ipaddress {
                192.168.111.249
            }
        }
      path: /etc/keepalived/keepalived.conf
    - content: |
        network:
            ethernets:
                enp2s0:
                    dhcp4: true
            version: 2
      owner: root:root
      path: /etc/netplan/50-cloud-init.yaml
      permissions: "0644"
    - content: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4t+xPYRq8mSz0CtM/6JESioFDqLmkqQxP6ZU9E263tJ42mAHwRmNFwpGfDPpGKWQlVV6//MNDTTmP6rTABvl1H6ryghK+aglOd4oHi+813MEUFCfWukK9Huvie11VRyb6NvF6Wsg6XzFxZsYl/21jjQLOR7tSX2wNx70wNUqdQSBOGMfzsM/b+EuAf2LMu8hecqqb/7yH5hpy+6ch5P4Krcwwr+qPOndHDNE7i9dvjJEZoRHxQhFpZBVqAECIPylzSR5OUoTqAyKUfmBGjeNrupZ4yzMNNtFyYcf09OdTaXJSmv2CqefjyfhxL5fCXlX/VXQrGpb+ghedpGRWj92/
        airshipci@airship-ci-ubuntu-metal3-img-e47ccd5d3e
      owner: root:root
      path: /tmp/akeys
      permissions: "0644"
    - content: |
        network:
          version: 2
          renderer: networkd
          bridges:
            ironicendpoint:
              interfaces: [enp1s0]
              dhcp4: yes
      owner: root:root
      path: /etc/netplan/60-ironicendpoint.yaml
      permissions: "0644"
    initConfiguration:
      localAPIEndpoint:
        advertiseAddress: ""
        bindPort: 0
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: metal3.io/uuid={{ ds.meta_data.uuid }}
        name: '{{ ds.meta_data.name }}'
    joinConfiguration:
      controlPlane:
        localAPIEndpoint:
          advertiseAddress: ""
          bindPort: 0
      discovery: {}
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: metal3.io/uuid={{ ds.meta_data.uuid }}
        name: '{{ ds.meta_data.name }}'
    postKubeadmCommands:
    - mkdir -p /home/ubuntu/.kube
    - cp /etc/kubernetes/admin.conf /home/ubuntu/.kube/config
    - systemctl enable --now keepalived
    - chown ubuntu:ubuntu /home/ubuntu/.kube/config
    preKubeadmCommands:
    - ip link set dev enp2s0 up
    - dhclient enp2s0
    - mv /tmp/akeys /home/ubuntu/.ssh/authorized_keys
    - chown ubuntu:ubuntu /home/ubuntu/.ssh/authorized_keys
    - apt update -y
    - netplan apply
    - apt install net-tools gcc linux-headers-$(uname -r) bridge-utils apt-transport-https
      ca-certificates curl gnupg-agent software-properties-common -y
    - apt install -y keepalived && systemctl stop keepalived
    - curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
    - add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu
      $(lsb_release -cs) stable"
    - curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add
      -
    - echo 'deb https://apt.kubernetes.io/ kubernetes-xenial main' > /etc/apt/sources.list.d/kubernetes.list
    - apt update -y
    - apt install docker-ce docker-ce-cli containerd.io kubelet kubeadm kubectl -y
    - systemctl enable --now docker kubelet
    - if (curl -sk --max-time 10 https://192.168.111.249:6443/healthz); then echo
      "keepalived already running";else systemctl start keepalived; fi
    - usermod -aG docker ubuntu
  replicas: 1
  version: v1.17.3
status:
  initialized: true
  replicas: 1
  selector: cluster.x-k8s.io/cluster-name=test1,cluster.x-k8s.io/control-plane=
  unavailableReplicas: 1

@ncdc (Contributor) commented Mar 11, 2020

Thanks

@vincepri vincepri added area/control-plane Issues or PRs related to control-plane lifecycle management area/clusterctl Issues or PRs related to clusterctl and removed area/clusterctl Issues or PRs related to clusterctl labels Mar 23, 2020
@nguyenthai0107 commented:

Hello @Xenwar and @ncdc,
What should I do to fix this issue? I am facing exactly the same problem:
"panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x3xxxxxx]"
when I check the logs with
kubectl logs -f kube-controller-manager-apps -n kube-system
and the pod is in CrashLoopBackOff:
kube-system kube-controller-manager-apps-preprod-pdc-platform-hcnet-vn-master-4dm5r 0/1 CrashLoopBackOff 11 (2m17s ago) 39m
Thank you.

@ncdc (Contributor) commented Sep 13, 2023

If it's the exact same panic and root cause, see #3363 (comment). Otherwise, I would recommend filing a new issue with the full stack trace.

@ncdc (Contributor) commented Sep 13, 2023

Wait, you're asking about a pod called kube-controller-manager-apps? That is not from cluster-api.
