
pkg: Pin to RHCOS 47.198 and quay.io/openshift-release-dev/ocp-release:4.0.0-4 #848

Closed
wants to merge 2 commits

Conversation

@wking (Member) commented Dec 8, 2018

DO NOT MERGE!

That's the latest RHCOS release:

$ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/builds.json | jq '{latest: .builds[0], timestamp}'
{
  "latest": "47.198",
  "timestamp": "2018-12-08T23:13:22Z"
}

And @smarterclayton just pushed 4.0.0-alpha.0-2018-12-07-090414 to quay.io/openshift-release-dev/ocp-release:4.0.0-4. That's not the most recent release, but it's the most-recent stable release ;).

Renaming OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE gets us CI testing of the pinned release despite openshift/release@60007df2 (openshift/release#1793).
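
For anyone reproducing the pin locally, a minimal sketch (assuming the override variable keeps its current name and the usual create-cluster invocation):

$ OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=quay.io/openshift-release-dev/ocp-release:4.0.0-4 \
    openshift-install create cluster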

Through f7d6d29 (Merge pull request openshift#806 from sallyom/log-url-clarify-pw, 2018-12-07).
@openshift-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 8, 2018
@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 8, 2018
@wking wking force-pushed the version-0.6.0-pins branch from 5a49c4e to d108128 on December 8, 2018 at 23:39
@wking (Member, Author) commented Dec 8, 2018

This cherry-picks #773 onto #841 and bumps the pinned versions. No need to merge this, just chime in with yea/nay or whatever ;). We should get past the recent CI blockages via the pinned, older update payload.

@crawford (Contributor) left a comment

LGTM

@smarterclayton (Contributor) commented

I can live with the two failures there; there are other passing router tests.

/retest

@wking (Member, Author) commented Dec 9, 2018

The errors from the previous e2e-aws were:

fail [github.com/openshift/origin/test/extended/router/stress.go:176]: Expected error:
    <*errors.errorString | 0xc4216a8230>: {
        s: "replicaset \"router\" never became ready",
    }
    replicaset "router" never became ready
not to have occurred
...
failed: (3m35s) 2018-12-09T00:13:07 "[Conformance][Area:Networking][Feature:Router] The HAProxy router converges when multiple routers are writing conflicting status [Suite:openshift/conformance/parallel/minimal] [Suite:openshift/smoke-4]"

and:

fail [github.com/openshift/origin/test/extended/router/stress.go:90]: Expected error:
    <*errors.errorString | 0xc421fc8b00>: {
        s: "replicaset \"router\" never became ready",
    }
    replicaset "router" never became ready
not to have occurred
...
failed: (3m21s) 2018-12-09T00:18:39 "[Conformance][Area:Networking][Feature:Router] The HAProxy router converges when multiple routers are writing status [Suite:openshift/conformance/parallel/minimal] [Suite:openshift/smoke-4]"

@wking (Member, Author) commented Dec 9, 2018

And it's still working its way through teardown, but job 2082 failed the same two tests with the same two reported errors. So I suspect it's a real issue (although perhaps a peripheral one) and not a flake. I have journalctl dumps from the three masters for this run; I'll go through them and see if anything suspicious is mentioned around the time of the two failures.
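
For reference, the dumps were collected with something along these lines, one file per master (the host here is a placeholder):

$ ssh core@<master-ip> journalctl -f | tee master-0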

@wking (Member, Author) commented Dec 9, 2018

So the two replicaset "router" never became ready errors from job 2082 were at 05:58:24 and 05:59:01. My logs from master-0 have:

Dec 09 05:58:13 ip-10-0-13-92 sshd[32430]: PAM 5 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=101.72.24.174
Dec 09 05:58:13 ip-10-0-13-92 sshd[32430]: PAM service(sshd) ignoring max retries; 6 > 3
Dec 09 05:58:23 ip-10-0-13-92 hyperkube[4119]: E1209 05:58:23.439962    4119 event.go:203] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"echoserver-sourceip.156e9498265d35a8", GenerateName:"", Namespace:"e2e-tests-services-tr59j", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"e2e-tests-services-tr59j", Name:"echoserver-sourceip", UID:"519e82f7-fb77-11e8-b7fb-125c3d3368ba", APIVersion:"v1", ResourceVersion:"46737", FieldPath:"spec.containers{echoserver}"}, Reason:"Killing", Message:"Killing container with id cri-o://echoserver:Need to kill Pod", Source:v1.EventSource{Component:"kubelet", Host:"ip-10-0-13-92.ec2.internal"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbefb499fd9e1ffa8, ext:1615801035461, loc:(*time.Location)(0x9061c80)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbefb499fd9e1ffa8, ext:1615801035461, loc:(*time.Location)(0x9061c80)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "echoserver-sourceip.156e9498265d35a8" is forbidden: unable to create new content in namespace e2e-tests-services-tr59j because it is being terminated' (will not retry!)
Dec 09 05:58:23 ip-10-0-13-92 hyperkube[4119]: E1209 05:58:23.498490    4119 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /var/lib/containers/storage/overlay/b97f5f5fcabe7c8f8367cf3f42c5d12b47e4cd47bf320f07aca3bd3c1add28cb/diff with output stdout: , stderr: du: cannot access ‘/var/lib/containers/storage/overlay/b97f5f5fcabe7c8f8367cf3f42c5d12b47e4cd47bf320f07aca3bd3c1add28cb/diff’: No such file or directory
Dec 09 05:58:23 ip-10-0-13-92 hyperkube[4119]: - exit status 1, rootInodeErr: cmd [ionice -c3 nice -n 19 find /var/lib/containers/storage/overlay/b97f5f5fcabe7c8f8367cf3f42c5d12b47e4cd47bf320f07aca3bd3c1add28cb/diff -xdev -printf .] failed. stderr: find: ‘/var/lib/containers/storage/overlay/b97f5f5fcabe7c8f8367cf3f42c5d12b47e4cd47bf320f07aca3bd3c1add28cb/diff’: No such file or directory
Dec 09 05:58:23 ip-10-0-13-92 hyperkube[4119]: ; err: exit status 1, extraDiskErr: du command failed on /var/log/pods/4ada9195-fb77-11e8-b7fb-125c3d3368ba/nginx/0.log with output stdout: , stderr: du: cannot access ‘/var/log/pods/4ada9195-fb77-11e8-b7fb-125c3d3368ba/nginx/0.log’: No such file or directory
Dec 09 05:58:23 ip-10-0-13-92 hyperkube[4119]: - exit status 1
Dec 09 05:58:23 ip-10-0-13-92 kernel: device veth98832a16 left promiscuous mode
Dec 09 05:58:24 ip-10-0-13-92 hyperkube[4119]: I1209 05:58:24.610336    4119 reconciler.go:181] operationExecutor.UnmountVolume started for volume "default-token-qbx76" (UniqueName: "kubernetes.io/secret/519e82f7-fb77-11e8-b7fb-125c3d3368ba-default-token-qbx76") pod "519e82f7-fb77-11e8-b7fb-125c3d3368ba" (UID: "519e82f7-fb77-11e8-b7fb-125c3d3368ba")
Dec 09 05:58:24 ip-10-0-13-92 hyperkube[4119]: I1209 05:58:24.620722    4119 operation_generator.go:688] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/519e82f7-fb77-11e8-b7fb-125c3d3368ba-default-token-qbx76" (OuterVolumeSpecName: "default-token-qbx76") pod "519e82f7-fb77-11e8-b7fb-125c3d3368ba" (UID: "519e82f7-fb77-11e8-b7fb-125c3d3368ba"). InnerVolumeSpecName "default-token-qbx76". PluginName "kubernetes.io/secret", VolumeGidValue ""
Dec 09 05:58:24 ip-10-0-13-92 hyperkube[4119]: I1209 05:58:24.710863    4119 reconciler.go:301] Volume detached for volume "default-token-qbx76" (UniqueName: "kubernetes.io/secret/519e82f7-fb77-11e8-b7fb-125c3d3368ba-default-token-qbx76") on node "ip-10-0-13-92.ec2.internal" DevicePath ""
Dec 09 05:58:24 ip-10-0-13-92 systemd[1]: Removed slice libcontainer container kubepods-besteffort-pod519e82f7_fb77_11e8_b7fb_125c3d3368ba.slice.
Dec 09 05:58:35 ip-10-0-13-92 hyperkube[4119]: E1209 05:58:35.980350    4119 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /var/lib/containers/storage/overlay/00065b4bb2b49b3bac8562bc8c7bc173003381d5b37ae45aad79e2ac6a9bfa42/diff with output stdout: , stderr: du: cannot access ‘/var/lib/containers/storage/overlay/00065b4bb2b49b3bac8562bc8c7bc173003381d5b37ae45aad79e2ac6a9bfa42/diff’: No such file or directory
Dec 09 05:58:35 ip-10-0-13-92 hyperkube[4119]: - exit status 1, rootInodeErr: cmd [ionice -c3 nice -n 19 find /var/lib/containers/storage/overlay/00065b4bb2b49b3bac8562bc8c7bc173003381d5b37ae45aad79e2ac6a9bfa42/diff -xdev -printf .] failed. stderr: find: ‘/var/lib/containers/storage/overlay/00065b4bb2b49b3bac8562bc8c7bc173003381d5b37ae45aad79e2ac6a9bfa42/diff’: No such file or directory
Dec 09 05:58:35 ip-10-0-13-92 hyperkube[4119]: ; err: exit status 1, extraDiskErr: du command failed on /var/log/pods/519e82f7-fb77-11e8-b7fb-125c3d3368ba/echoserver/0.log with output stdout: , stderr: du: cannot access ‘/var/log/pods/519e82f7-fb77-11e8-b7fb-125c3d3368ba/echoserver/0.log’: No such file or directory
Dec 09 05:58:35 ip-10-0-13-92 hyperkube[4119]: - exit status 1
Dec 09 06:08:19 ip-10-0-13-92 systemd[1]: Found ordering cycle on local-fs.target/stop

master-1 has:

Dec 09 05:57:49 ip-10-0-29-127 hyperkube[4124]: E1209 05:57:49.612584    4124 event.go:203] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"netserver
-0.156e949043eda45e", GenerateName:"", Namespace:"e2e-tests-pod-network-test-r6lgk", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Lo
cation)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializ
ers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"e2e-tests-pod-network-test-r6lgk", Name:"netserver-0", UID:"40bc99e7-fb77-11e8-8a4
7-0a42d33f8e78", APIVersion:"v1", ResourceVersion:"44559", FieldPath:"spec.containers{webserver}"}, Reason:"Killing", Message:"Killing container with id cri-o://webserver:Need to kill Pod", Source:v1.EventSource
{Component:"kubelet", Host:"ip-10-0-29-127.ec2.internal"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbefb49976201425e, ext:1582427716889, loc:(*time.Location)(0x9061c80)}}, LastTimestamp:v1.Time{Time:time.Tim
e{wall:0xbefb49976223028f, ext:1582429928720, loc:(*time.Location)(0x9061c80)}}, Count:2, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSerie
s)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "netserver-0.156e949043eda45e" is forbidden: unable to create new content in namespace e2e-tests-po
d-network-test-r6lgk because it is being terminated' (will not retry!)
Dec 09 05:57:49 ip-10-0-29-127 hyperkube[4124]: I1209 05:57:49.624872    4124 reconciler.go:181] operationExecutor.UnmountVolume started for volume "default-token-vzfpn" (UniqueName: "kubernetes.io/secret/40bc99
e7-fb77-11e8-8a47-0a42d33f8e78-default-token-vzfpn") pod "40bc99e7-fb77-11e8-8a47-0a42d33f8e78" (UID: "40bc99e7-fb77-11e8-8a47-0a42d33f8e78")
Dec 09 05:57:49 ip-10-0-29-127 hyperkube[4124]: I1209 05:57:49.636750    4124 operation_generator.go:688] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/40bc99e7-fb77-11e8-8a47-0a42d33f8e78-de
fault-token-vzfpn" (OuterVolumeSpecName: "default-token-vzfpn") pod "40bc99e7-fb77-11e8-8a47-0a42d33f8e78" (UID: "40bc99e7-fb77-11e8-8a47-0a42d33f8e78"). InnerVolumeSpecName "default-token-vzfpn". PluginName "ku
bernetes.io/secret", VolumeGidValue ""
Dec 09 05:57:49 ip-10-0-29-127 kernel: device veth172007c3 left promiscuous mode
Dec 09 05:57:49 ip-10-0-29-127 hyperkube[4124]: I1209 05:57:49.725433    4124 reconciler.go:301] Volume detached for volume "default-token-vzfpn" (UniqueName: "kubernetes.io/secret/40bc99e7-fb77-11e8-8a47-0a42d3
3f8e78-default-token-vzfpn") on node "ip-10-0-29-127.ec2.internal" DevicePath ""
Dec 09 05:57:50 ip-10-0-29-127 hyperkube[4124]: W1209 05:57:50.060836    4124 pod_container_deletor.go:75] Container "eed06008b22a4e0dbee86719690513b8f1c6a904e66060ed39c0639859d8e4ee" not found in pod's containers
Dec 09 05:57:50 ip-10-0-29-127 hyperkube[4124]: W1209 05:57:50.880066    4124 kubelet_getters.go:264] Path "/var/lib/kubelet/pods/40bc99e7-fb77-11e8-8a47-0a42d33f8e78/volumes" does not exist
Dec 09 05:57:50 ip-10-0-29-127 systemd[1]: Removed slice libcontainer container kubepods-besteffort-pod40bc99e7_fb77_11e8_8a47_0a42d33f8e78.slice.
Dec 09 05:57:50 ip-10-0-29-127 hyperkube[4124]: E1209 05:57:50.904732    4124 kuberuntime_container.go:65] Can't make a ref to pod "netserver-0_e2e-tests-pod-network-test-r6lgk(40bc99e7-fb77-11e8-8a47-0a42d33f8e
78)", container webserver: selfLink was empty, can't make reference
Dec 09 05:58:14 ip-10-0-29-127 hyperkube[4124]: W1209 05:58:14.172269    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/olm-operator-75f785f98b-55dgs due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.
Dec 09 05:59:00 ip-10-0-29-127 hyperkube[4124]: W1209 05:59:00.170282    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/catalog-operator-5499796c76-72mlv due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.
Dec 09 05:59:24 ip-10-0-29-127 hyperkube[4124]: W1209 05:59:24.169851    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/olm-operator-75f785f98b-55dgs due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.
Dec 09 06:00:17 ip-10-0-29-127 hyperkube[4124]: W1209 06:00:17.171568    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/catalog-operator-5499796c76-72mlv due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.
Dec 09 06:00:46 ip-10-0-29-127 hyperkube[4124]: W1209 06:00:46.169641    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/olm-operator-75f785f98b-55dgs due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.
Dec 09 06:01:41 ip-10-0-29-127 hyperkube[4124]: W1209 06:01:41.170521    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/catalog-operator-5499796c76-72mlv due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.

and, for reasons that are not clear to me, my master-2 logs end with:

Dec 09 05:57:50 ip-10-0-44-91 hyperkube[4107]: W1209 05:57:50.936783    4107 pod_container_deletor.go:75] Container "44acd41e1d9f5d0a766928c73c19138b85068647cfddfd7ebe76074c94af3673" not found in pod's containers

Maybe they went too quiet and my ssh core@... journalctl -f | tee master-2 connection was dropped.

The most concerning entries are the secrets "coreos-pull-secret" not found warnings, although I'm not familiar with the tests; maybe that's expected occasionally. Just in case, here are the earliest and latest brackets on that issue:

$ grep -h 'pull.*secret' master-* | sort | head -n2
Dec 09 05:41:25 ip-10-0-29-127 hyperkube[4124]: W1209 05:41:25.939760    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/catalog-operator-5499796c76-72mlv due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.
Dec 09 05:41:36 ip-10-0-29-127 hyperkube[4124]: W1209 05:41:36.213621    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/olm-operator-75f785f98b-55dgs due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.
$ grep -h 'pull.*secret' master-* | sort | tail -n2
Dec 09 06:07:30 ip-10-0-29-127 hyperkube[4124]: W1209 06:07:30.170277    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/olm-operator-75f785f98b-55dgs due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.
Dec 09 06:07:48 ip-10-0-29-127 hyperkube[4124]: W1209 06:07:48.171280    4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/catalog-operator-5499796c76-72mlv due to secrets "coreos-pull-secret" not found.  The image pull may not succeed.

And those errors have also been reported over in operator-framework/operator-lifecycle-manager#607.
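
To rule out a genuinely missing secret, a quick check against the live cluster (assuming kubeconfig access) would be something like:

$ KUBECONFIG=kubeconfig oc get secret coreos-pull-secret -n openshift-operator-lifecycle-manager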

@wking (Member, Author) commented Dec 9, 2018

While there are multiple HAProxy tests in the e2e suite, the ones failing are the only two stress tests:

$ git config remote.origin.url
https://github.com/openshift/origin.git
$ git describe --dirty
v4.0.0-alpha.0-759-g9d2874f
$ git grep '\.It(' test/extended/router/stress.go
test/extended/router/stress.go:		g.It("converges when multiple routers are writing status", func() {
test/extended/router/stress.go:		g.It("converges when multiple routers are writing conflicting status", func() {

They're also the only two that use waitForReadyReplicaSet, which is where we're seeing the failure. I'll run again and try to watch that replica set. And once we green up CI, someone should file a PR dumping the replica set's status, because "never became ready" isn't as helpful as "currently has status $STATUS for $REASONS" :p.
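
That claim is easy to double-check with another grep in the same origin checkout, e.g.:

$ git grep -n 'waitForReadyReplicaSet' test/extended/router/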

The event log also looks clean for this replica set (I checked it since events there might mitigate the generic log message). But we only have:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/848/pull-ci-openshift-installer-master-e2e-aws/2082/artifacts/e2e-aws/events.json | jq '.items[] | select(.involvedObject.kind == "ReplicaSet" and (.involvedObject.name | contains("router")))'
{
  "apiVersion": "v1",
  "count": 1,
  "eventTime": null,
  "firstTimestamp": "2018-12-09T05:41:29Z",
  "involvedObject": {
    "apiVersion": "apps/v1",
    "kind": "ReplicaSet",
    "name": "router-default-6c45f76f75",
    "namespace": "openshift-ingress",
    "resourceVersion": "7040",
    "uid": "13d07968-fb75-11e8-b05a-125c3d3368ba"
  },
  "kind": "Event",
  "lastTimestamp": "2018-12-09T05:41:29Z",
  "message": "Created pod: router-default-6c45f76f75-4g9t7",
  "metadata": {
    "creationTimestamp": "2018-12-09T05:41:29Z",
    "name": "router-default-6c45f76f75.156e93ac29775a8b",
    "namespace": "openshift-ingress",
    "resourceVersion": "7067",
    "selfLink": "/api/v1/namespaces/openshift-ingress/events/router-default-6c45f76f75.156e93ac29775a8b",
    "uid": "13ec0643-fb75-11e8-b05a-125c3d3368ba"
  },
  "reason": "SuccessfulCreate",
  "reportingComponent": "",
  "reportingInstance": "",
  "source": {
    "component": "replicaset-controller"
  },
  "type": "Normal"
}

which looks fine.

/retest

@wking (Member, Author) commented Dec 9, 2018

Job 2084 crashed and burned:

Error: 249 fail, 14 pass, 63 skip (18m4s)

possibly due to a Kubernetes API server installer pod being OOM-killed:

$ KUBECONFIG=kubeconfig oc get pods --all-namespaces | grep OOM
openshift-kube-apiserver                                  installer-2-ip-10-0-23-167.ec2.internal                           0/1       OOMKilled     0          19m
$ KUBECONFIG=kubeconfig oc get pod -o yaml -n openshift-kube-apiserver installer-2-ip-10-0-23-167.ec2.internal
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-12-09T08:47:19Z
  labels:
    app: installer
  name: installer-2-ip-10-0-23-167.ec2.internal
  namespace: openshift-kube-apiserver
  resourceVersion: "8997"
  selfLink: /api/v1/namespaces/openshift-kube-apiserver/pods/installer-2-ip-10-0-23-167.ec2.internal
  uid: 095a065f-fb8f-11e8-afd9-0a80660a673e
spec:
  containers:
  - args:
    - -v=4
    - --revision=2
    - --namespace=openshift-kube-apiserver
    - --pod=kube-apiserver-pod
    - --resource-dir=/etc/kubernetes/static-pod-resources
    - --pod-manifest-dir=/etc/kubernetes/manifests
    - --configmaps=kube-apiserver-pod
    - --configmaps=config
    - --configmaps=aggregator-client-ca
    - --configmaps=client-ca
    - --configmaps=etcd-serving-ca
    - --configmaps=kubelet-serving-ca
    - --configmaps=sa-token-signing-certs
    - --secrets=aggregator-client
    - --secrets=etcd-client
    - --secrets=kubelet-client
    - --secrets=serving-cert
    command:
    - cluster-kube-apiserver-operator
    - installer
    image: quay.io/openshift-release-dev/ocp-v4.0@sha256:a38d0e240ab50573d5193eec7ecf6046b4b0860f8b2184d34c4ad648f020333e
    imagePullPolicy: Always
    name: installer
    resources: {}
    securityContext:
      privileged: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/kubernetes/
      name: kubelet-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: installer-sa-token-scmj9
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: installer-sa-dockercfg-p4tf6
  nodeName: ip-10-0-23-167.ec2.internal
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    runAsUser: 0
  serviceAccount: installer-sa
  serviceAccountName: installer-sa
  terminationGracePeriodSeconds: 30
  volumes:
  - hostPath:
      path: /etc/kubernetes/
      type: ""
    name: kubelet-dir
  - name: installer-sa-token-scmj9
    secret:
      defaultMode: 420
      secretName: installer-sa-token-scmj9
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-12-09T08:47:19Z
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-12-09T08:47:19Z
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2018-12-09T08:47:19Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://cfd8efde622ee86e28e89f58cb9d4850bbe2b1198d74b6937f4865ee013aa7ed
    image: quay.io/openshift-release-dev/ocp-v4.0@sha256:a38d0e240ab50573d5193eec7ecf6046b4b0860f8b2184d34c4ad648f020333e
    imageID: quay.io/openshift-release-dev/ocp-v4.0@sha256:a38d0e240ab50573d5193eec7ecf6046b4b0860f8b2184d34c4ad648f020333e
    lastState: {}
    name: installer
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://cfd8efde622ee86e28e89f58cb9d4850bbe2b1198d74b6937f4865ee013aa7ed
        exitCode: 0
        finishedAt: 2018-12-09T08:47:22Z
        reason: OOMKilled
        startedAt: 2018-12-09T08:47:21Z
  hostIP: 10.0.23.167
  phase: Succeeded
  podIP: 10.128.0.21
  qosClass: BestEffort
  startTime: 2018-12-09T08:47:19Z

@wking (Member, Author) commented Dec 9, 2018

Also, it looks like 4.0.0-4 will be impacted by openshift/machine-config-operator#225, because:

$ oc adm release info quay.io/openshift-release-dev/ocp-release:4.0.0-4 --commits | grep dns-operator
  cluster-dns-operator    https://github.com/openshift/cluster-dns-operator    119e58d03d441282a764ba51619536f5a7c4ded8

As discussed in openshift/cluster-dns-operator#63, the /etc/hosts contention is from openshift/cluster-dns-operator#56. And:

$ git log --graph --oneline -8 119e58d03d441
*   119e58d Merge pull request #61 from ironcladlou/registry-hosts-fqdn
|\  
| * 7f9b291 Fix uninstall script
| * 3e6c3aa Support both relative and absolute service names
* |   9b78211 Merge pull request #59 from sosiouxme/patch-1
|\ \  
| |/  
|/|   
| * d2fedf1 Makefile: don't specify GOARCH
* |   03223f1 Merge pull request #60 from Miciah/fix-registry-service-name
|\ \  
| |/  
|/|   
| * 6c533d9 Fix registry service name
|/  
*   4c2aed1 Merge pull request #56 from pravisankar/reg-node-resolver
|\  

So until we fix that, we may be stuck on 4.0.0-3. On the other hand, the image-registry operator conflict @abhinavdahiya points out in openshift/machine-config-operator#225 is from way back in openshift/cluster-image-registry-operator#72. So maybe we just need to live with occasional MCD-dirty-file reboots until we get openshift/machine-config-operator#225 landed?
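
To double-check that the pinned cluster-dns-operator commit really contains the #56 merge, something like this works in a cluster-dns-operator checkout (4c2aed1 is the merge commit from the log above):

$ git merge-base --is-ancestor 4c2aed1 119e58d03d441 && echo 'contains #56'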

…e:4.0.0-4

That's the latest RHCOS release:

  $ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/builds.json | jq '{latest: .builds[0], timestamp}'
  {
    "latest": "47.198",
    "timestamp": "2018-12-08T23:13:22Z"
  }

And Clayton just pushed 4.0.0-alpha.0-2018-12-07-090414 to
quay.io/openshift-release-dev/ocp-release:4.0.0-4.  That's not the
most recent release, but it's the most-recent stable release ;).

Renaming OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE gets us CI testing
of the pinned release despite openshift/release@60007df2 (Use
RELEASE_IMAGE_LATEST for CVO payload, 2018-10-03,
openshift/release#1793).
@wking wking force-pushed the version-0.6.0-pins branch from d108128 to d54e597 on December 10, 2018 at 15:09
@crawford (Contributor) commented

/retest

@crawford crawford force-pushed the master-0.6.0 branch 3 times, most recently from 51a43d3 to 889b3c4 on December 10, 2018 at 16:44
@wking (Member, Author) commented Dec 10, 2018

We ended up pushing this out via a Git branch without bothering with this PR ;).

@wking wking closed this Dec 10, 2018
@wking wking deleted the version-0.6.0-pins branch December 11, 2018 07:12