Rebooting after: "Updating machineconfig from {hash} to {same-hash}" #224

Closed
wking opened this issue Dec 8, 2018 · 5 comments

Comments

@wking
Member

wking commented Dec 8, 2018

Poking around on master-0 during this run:

[core@ip-10-0-7-211 ~]$ sudo crictl ps -a
CONTAINER ID        IMAGE                                                                                                                          CREATED             STATE               NAME                          ATTEMPT
6413013f1b203       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:299840e1ef37549af1c6bb9b45ed1f4eb48ca51b0384772709ce88e3d9d60bfc    9 seconds ago       Running             operator                      2
434fa05a795cb       f6df05d6a36426dde85b9c778b435a7aaa12543d69d603f629b8f5273356ec7b                                                               13 seconds ago      Running             cluster-dns-operator          1
9c3e0b96b8325       f06c190859935d127c2efee77beb689fbacb53ec93b88547c25392fc970289f7                                                               14 seconds ago      Running             operator                      1
ebc194bc9b966       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:6667ac4aecae183dfd4e6ae4277dd86ca977e0a3b9feefee653043105503c6d6    18 seconds ago      Exited              tuned                         1
011545eb60059       05503aa686767edf45b70172c8975c8b9743bb6a6c1c182c181eb36cd610f6fc                                                               18 seconds ago      Running             machine-config-server         1
d8aa7ead7c3dc       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:581af93fda80257651d621dade879e438f846a5bf39040dd0259006fc3b73820    18 seconds ago      Running             machine-approver-controller   1
413a0cca0629b       bb84dbdafdfa61c29ca989c652d63debe03a043c2099f6ad7097ac28865bd406                                                               20 seconds ago      Running             cluster-network-operator      1
c0ace414b0ce2       25dbbded706585310ae3ebc743bcc59659f826fff1bac24be4a56b83d37e3cc2                                                               25 seconds ago      Running             machine-config-daemon         1
a2e1ef4a0c97b       03de8f11d9e07ee2b23be6d48dc849b9a5e24e4ab4c3ab758bdcd583b3b8fbd9                                                               27 seconds ago      Running             sdn-controller                1
c08642e38da06       registry.svc.ci.openshift.org/ci-op-1mpypn4i/release@sha256:77dd81dbdb38c941fc288f551f39ddef1de251384cbfb8f6755ff7f072ab9a13   27 seconds ago      Running             cluster-version-operator      1
8f80023565506       1d2ec4ba1e697f9c0eb69c154888e6f09007a3d2aad4c34bb7868cec86b8f8f8                                                               28 seconds ago      Running             sdn                           1
fb31831e0665c       1d2ec4ba1e697f9c0eb69c154888e6f09007a3d2aad4c34bb7868cec86b8f8f8                                                               28 seconds ago      Running             openvswitch                   1
89155ae218212       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:f82a3c247a4c59538a3d40ad1a2257383420440e15c4675b2e11ad620601bf98    30 seconds ago      Running             openshift-kube-apiserver      1
edafa7d07d214       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:4d0106d7428828c87ed905728742fbc11bd8b30d0c87165359699d0a475e2315    30 seconds ago      Running             kube-controller-manager       1
abab48f8aaf0e       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:4d0106d7428828c87ed905728742fbc11bd8b30d0c87165359699d0a475e2315    30 seconds ago      Running             scheduler                     1
b735c768b8baf       94bc3af972c98ce73f99d70bd72144caa8b63e541ccc9d844960b7f0ca77d7c4                                                               38 seconds ago      Running             etcd-member                   1
b79135e346749       b02de22ff740f0bfa7e5dde5aa1a8169051375a5f0c69c28fafefc9408f72b06                                                               39 seconds ago      Exited              certs                         0
04b4489ae42e3       04a052dbf6cb5ac2afa57eb41c37a2964ee16c7ee62986900aceb38f369c8411                                                               39 seconds ago      Exited              discovery                     1
1e38cb58d5063       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:14b67d7a5d1ec05dd45e60663b6e4e0c460cf7f429397dd3a3ec2d1997e52096    2 minutes ago       Exited              machine-config-daemon         0
e8715191161af       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:08fbd1a5a90b59572a3b097389fc17f7ae9b9b1ef7e1f3d19103e280fc656518    2 minutes ago       Exited              console                       0
9c19639ae072d       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:9ecf75c8384f5ec5431b4a3267466833c1caa212a183ce8b3e7f42bc3f7e1dcc    3 minutes ago       Exited              machine-config-server         0
439c340ab23ce       03de8f11d9e07ee2b23be6d48dc849b9a5e24e4ab4c3ab758bdcd583b3b8fbd9                                                               3 minutes ago       Exited              controller-manager            0
a12b3df1bae1f       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:f82a3c247a4c59538a3d40ad1a2257383420440e15c4675b2e11ad620601bf98    4 minutes ago       Exited              openshift-kube-apiserver      0
74d8d492a43db       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:f82a3c247a4c59538a3d40ad1a2257383420440e15c4675b2e11ad620601bf98    4 minutes ago       Exited              openshift-apiserver           0
98c47bd0e9246       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:299840e1ef37549af1c6bb9b45ed1f4eb48ca51b0384772709ce88e3d9d60bfc    4 minutes ago       Exited              installer                     0
6c15d73ec3e14       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:c8c110b8733d0d352ddc5fe35ba9eeac913b7609c2c9c778586f2bb74f281681    4 minutes ago       Exited              registry                      0
7796346af6b01       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:581af93fda80257651d621dade879e438f846a5bf39040dd0259006fc3b73820    5 minutes ago       Exited              machine-approver-controller   0
afb16e1bdfea8       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:40fa0e40cf625f314586d47a4199d5de985b7f980d7c4b650ab7f2c0f74f41b2    5 minutes ago       Exited              registry-ca-hostmapper        0
8aff3b343e0dc       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:299840e1ef37549af1c6bb9b45ed1f4eb48ca51b0384772709ce88e3d9d60bfc    6 minutes ago       Exited              operator                      1
158f097ba79cb       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:4d0106d7428828c87ed905728742fbc11bd8b30d0c87165359699d0a475e2315    7 minutes ago       Exited              kube-controller-manager       0
128c03f9fa9a8       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:38d43ca65fce090c19b092b1c0962781c146f9fc65f3228eb96f5aad684c9119    7 minutes ago       Exited              installer                     0
bf6ac2a6570f2       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:4d0106d7428828c87ed905728742fbc11bd8b30d0c87165359699d0a475e2315    8 minutes ago       Exited              scheduler                     0
b43c90d9cebb6       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:b532d9351b803b5a03cf6777f12d725b4973851957497ea3e2b37313aadd6750    8 minutes ago       Exited              operator                      0
4f333ad437f34       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:a494ee2152687973d1987598cf82f7606b2c4fbb48c7f09f2e897cb417ab88f1    8 minutes ago       Exited              installer                     0
28eed681186eb       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:299840e1ef37549af1c6bb9b45ed1f4eb48ca51b0384772709ce88e3d9d60bfc    8 minutes ago       Exited              installer                     0
c0d2a89890a29       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:ce86e514320b680f39735323288cfd19caee5a9480b086b4b275454aef94136e    8 minutes ago       Exited              dns-node-resolver             0
d244a6189ea7b       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:e4936a702d7d466a64a6a9359f35c7ad528bba7c35fe5c582a90e46f9051d8b8    8 minutes ago       Exited              dns                           0
7e12941ef62cd       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:3287ba30af508652bda691490ad7dbd97febf4e90b72e23228f87c9038fc387e    9 minutes ago       Exited              operator                      0
9f0bcda47b3bc       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:f538d65377e0891e70555009f4778c0b1782360ea0f4309adec905cad752593d    9 minutes ago       Exited              cluster-dns-operator          0
dc6d7fba5de79       registry.svc.ci.openshift.org/ci-op-1mpypn4i/release@sha256:77dd81dbdb38c941fc288f551f39ddef1de251384cbfb8f6755ff7f072ab9a13   9 minutes ago       Exited              cluster-version-operator      0
b46b53bc15b68       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:0f51e8c6713cf23fac9b4b61d3e10e453936c139ee9a58171090b5ffe7cd37ae    9 minutes ago       Exited              openvswitch                   0
98d49e2e52366       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:0f51e8c6713cf23fac9b4b61d3e10e453936c139ee9a58171090b5ffe7cd37ae    9 minutes ago       Exited              sdn                           0
62fe7aeacb776       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:f82a3c247a4c59538a3d40ad1a2257383420440e15c4675b2e11ad620601bf98    10 minutes ago      Exited              sdn-controller                0
e2026a15792b4       registry.svc.ci.openshift.org/ci-op-1mpypn4i/stable@sha256:a8aa3e53cbaeae806210878f0c7b499b636a963b2a52f4d1eea6db3dfa2fdc98    11 minutes ago      Exited              cluster-network-operator      0
751acc223850b       quay.io/coreos/etcd@sha256:688e6c102955fe927c34db97e6352d0e0962554735b2db5f2f66f3f94cfe8fd1                                    13 minutes ago      Exited              etcd-member                   0
[core@ip-10-0-7-211 ~]$ sudo crictl logs 1e38cb58d5063
I1208 07:04:15.643512   12912 start.go:51] Version: 3.11.0-321-g9d379bd8-dirty
I1208 07:04:15.644717   12912 start.go:88] starting node writer
I1208 07:04:15.654184   12912 run.go:22] Running captured: chroot /rootfs rpm-ostree status --json
I1208 07:04:15.780961   12912 daemon.go:125] Booted osImageURL: registry.svc.ci.openshift.org/rhcos/maipo@sha256:ede3888e50016d61a720af2fe3f80e67e86bd819e16516ac36538456d46e0d77 (47.198)
I1208 07:04:18.656693   12912 start.go:139] Calling chroot("/rootfs")
I1208 07:04:18.727383   12912 daemon.go:673] While getting MachineConfig ea8bbb8e8e084123f670a4bf90258ac8, got: machineconfigs.machineconfiguration.openshift.io "ea8bbb8e8e084123f670a4bf90258ac8" not found. Retrying...
I1208 07:04:28.807287   12912 update.go:95] Checking if configs are reconcilable
I1208 07:04:28.807317   12912 daemon.go:572] No target osImageURL provided
E1208 07:04:28.807901   12912 daemon.go:653] content mismatch for file: "/etc/hosts"; expected: # IPv4 and IPv6 localhost aliases
127.0.0.1 localhost
::1   localhost

# Internal registry hack
10.3.0.25 docker-registry.default.svc
; received: # IPv4 and IPv6 localhost aliases
127.0.0.1 localhost
::1   localhost

# Internal registry hack
10.3.0.25 docker-registry.default.svc
172.30.198.180 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver
I1208 07:04:28.881287   12912 update.go:37] Updating machineconfig from ea8bbb8e8e084123f670a4bf90258ac8 to ea8bbb8e8e084123f670a4bf90258ac8
I1208 07:04:28.881311   12912 update.go:95] Checking if configs are reconcilable
I1208 07:04:28.881325   12912 update.go:199] Updating files
I1208 07:04:28.881336   12912 update.go:387] Writing file "/etc/containers/registries.conf"
I1208 07:04:28.882785   12912 update.go:387] Writing file "/etc/hosts"
I1208 07:04:28.884044   12912 update.go:387] Writing file "/etc/kubernetes/manifests/etcd-member.yaml"
I1208 07:04:28.885615   12912 update.go:387] Writing file "/etc/sysconfig/crio-network"
I1208 07:04:28.886943   12912 update.go:387] Writing file "/etc/kubernetes/static-pod-resources/etcd-member/ca.crt"
I1208 07:04:28.888766   12912 update.go:387] Writing file "/etc/kubernetes/static-pod-resources/etcd-member/root-ca.crt"
I1208 07:04:28.890648   12912 update.go:387] Writing file "/etc/kubernetes/kubelet.conf"
I1208 07:04:28.892163   12912 update.go:387] Writing file "/var/lib/kubelet/config.json"
I1208 07:04:28.894227   12912 update.go:387] Writing file "/etc/docker/certs.d/docker-registry.default.svc:5000/ca.crt"
I1208 07:04:28.896036   12912 update.go:387] Writing file "/etc/kubernetes/ca.crt"
I1208 07:04:28.897806   12912 update.go:387] Writing file "/etc/sysctl.d/forward.conf"
I1208 07:04:28.899180   12912 update.go:321] Writing systemd unit "kubelet.service"
I1208 07:04:28.899367   12912 update.go:359] Enabling systemd unit "kubelet.service"
I1208 07:04:28.899513   12912 update.go:268] /etc/systemd/system/multi-user.target.wants/kubelet.service already exists. Not making a new symlink
I1208 07:04:28.899530   12912 update.go:220] Deleting stale data
I1208 07:04:28.899550   12912 update.go:482] No target osImageURL provided
E1208 07:04:28.899591   12912 event.go:259] Could not construct reference to: '&v1.Node{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ip-10-0-7-211.ec2.internal", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1.NodeSpec{PodCIDR:"", ProviderID:"", Unschedulable:false, Taints:[]v1.Taint(nil), ConfigSource:(*v1.NodeConfigSource)(nil), DoNotUse_ExternalID:""}, Status:v1.NodeStatus{Capacity:v1.ResourceList(nil), Allocatable:v1.ResourceList(nil), Phase:"", Conditions:[]v1.NodeCondition(nil), Addresses:[]v1.NodeAddress(nil), DaemonEndpoints:v1.NodeDaemonEndpoints{KubeletEndpoint:v1.DaemonEndpoint{Port:0}}, NodeInfo:v1.NodeSystemInfo{MachineID:"", SystemUUID:"", BootID:"", KernelVersion:"", OSImage:"", ContainerRuntimeVersion:"", KubeletVersion:"", KubeProxyVersion:"", OperatingSystem:"", Architecture:""}, Images:[]v1.ContainerImage(nil), VolumesInUse:[]v1.UniqueVolumeName(nil), VolumesAttached:[]v1.AttachedVolume(nil), Config:(*v1.NodeConfigStatus)(nil)}}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'Reboot' 'Node will reboot into config ea8bbb8e8e084123f670a4bf90258ac8'
I1208 07:04:28.899753   12912 update.go:497] machine-config-daemon initiating reboot: Node will reboot into config ea8bbb8e8e084123f670a4bf90258ac8

It's possible that the underlying issue is due to whatever took down the initial etcd-member pod (751acc223850b, I'm still looking into this), but I don't understand why the MCD keeps going after:

Updating machineconfig from ea8bbb8e8e084123f670a4bf90258ac8 to ea8bbb8e8e084123f670a4bf90258ac8

Is the issue the "selfLink was empty, can't make reference" error? It seems surprising to trigger a reboot because you failed to submit an event. There is some previous discussion of reboot triggers in #199, but I don't understand either that issue or this one clearly enough to attempt to tie them together ;). @cgwalters' comment about identical configs does look like this issue, though.
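Judging from the ordering of the log lines, the reboot seems to follow from the failed on-disk content check ("content mismatch for file") rather than from the failed event submission, which is only logged. The Go sketch below models that flow under that assumption; `nodeState` and `decide` are hypothetical names for illustration, not the MCD's real types or functions.

```go
package main

import "fmt"

// nodeState is a hypothetical, simplified view of what the daemon compares
// on startup. Field names are illustrative, not the MCD's real types.
type nodeState struct {
	currentConfig string // config the node believes it is running
	desiredConfig string // config the controller wants on the node
	onDiskValid   bool   // whether every managed file matched its expected content
}

// decide models the sequence visible in the log: a failed on-disk check sends
// the daemon down the update path even when current == desired, and that
// update path always ends in a reboot.
func decide(s nodeState) []string {
	if s.currentConfig == s.desiredConfig && s.onDiskValid {
		return nil // already in the desired state: nothing to do
	}
	steps := []string{
		fmt.Sprintf("Updating machineconfig from %s to %s", s.currentConfig, s.desiredConfig),
		"Updating files",
	}
	// Recording the Reboot event may fail (the selfLink error); in this model
	// that failure is only logged and does not change the outcome.
	return append(steps, "initiating reboot")
}

func main() {
	const cfg = "ea8bbb8e8e084123f670a4bf90258ac8"
	for _, step := range decide(nodeState{cfg, cfg, false}) {
		fmt.Println(step)
	}
}
```

With identical config names but a failed file check, the model still walks the whole update path, which matches the "from {hash} to {same-hash}" line followed by a reboot.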

@abhinavdahiya
Contributor

@wking

content mismatch for file: "/etc/hosts"; expected: # IPv4 and IPv6 localhost aliases
127.0.0.1 localhost
::1   localhost

# Internal registry hack
10.3.0.25 docker-registry.default.svc
; received: # IPv4 and IPv6 localhost aliases
127.0.0.1 localhost
::1   localhost

# Internal registry hack
10.3.0.25 docker-registry.default.svc
172.30.198.180 image-registry.openshift-image-registry.svc image-registry.openshift-image-registry.svc.cluster.local # openshift-generated-node-resolver

So it looks like something is editing /etc/hosts, but the MCD owns that file and reconciles it back. And the daemon currently always reboots after syncing.

I would guess the service-signer.

@wking
Member Author

wking commented Dec 8, 2018

Ah, that's probably what's going on. That image-registry line is not here, so it looks like it comes from the DNS operator. I'll file an issue over there.

@abhinavdahiya
Copy link
Contributor

/cc @crawford @aaronlevy: as more people change configuration on disk, they will end up stepping on each other, and if you step on MCD it will keep rebooting nodes.... :(
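The reboot loop described above can be illustrated with a toy simulation of two writers sharing one file. This sketch assumes the node resolver's write lands before the MCD's check on every boot and that the MCD's update path always reboots; the file contents are abbreviated, and `simulateBoots` is a hypothetical function, not code from either project.

```go
package main

import (
	"fmt"
	"strings"
)

// Abbreviated stand-ins for the two writers' views of /etc/hosts.
const mcdHosts = "127.0.0.1 localhost\n10.3.0.25 docker-registry.default.svc\n"
const resolverEntry = "172.30.198.180 image-registry.openshift-image-registry.svc # openshift-generated-node-resolver\n"

// simulateBoots runs up to maxBoots boot cycles and returns how many reboots
// the MCD would trigger. Hypothetical model, not real MCD or DNS operator code.
func simulateBoots(maxBoots int, resolverRunning bool) int {
	hosts := mcdHosts
	reboots := 0
	for boot := 0; boot < maxBoots; boot++ {
		// Assumption: the resolver adds its entry before the MCD checks.
		if resolverRunning && !strings.Contains(hosts, resolverEntry) {
			hosts += resolverEntry
		}
		// The MCD validates on-disk state against its rendered config.
		if hosts == mcdHosts {
			break // converged: no update, no reboot
		}
		hosts = mcdHosts // reconcile the file back
		reboots++        // the update path always reboots
	}
	return reboots
}

func main() {
	fmt.Println(simulateBoots(5, false), simulateBoots(5, true)) // prints "0 5"
}
```

With a single writer the node converges immediately; with a second writer touching an MCD-owned file, every boot ends in another reboot.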

@wking
Member Author

wking commented Dec 8, 2018

Closing this in favor of openshift/cluster-dns-operator#63

@wking wking closed this as completed Dec 8, 2018
@wking
Member Author

wking commented Dec 8, 2018

... if you step on MCD it will keep rebooting nodes....

Do not taunt happy fun MCD :p.
