pkg: Pin to RHCOS 47.198 and quay.io/openshift-release-dev/ocp-release:4.0.0-4 #848
Conversation
Through f7d6d29 (Merge pull request openshift#806 from sallyom/log-url-clarify-pw, 2018-12-07).
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
(force-pushed from 5a49c4e to d108128)
LGTM
I can live with the two failures there; there are other passing router tests. /retest
The errors from the previous e2e-aws were:

and:
And it's still working its way through teardown, but job 2082 failed the same two tests with the same two reported errors. So I suspect it's a real issue (although perhaps a peripheral one) and not a flake. I have …
So the two …

master-1 has:

and, for reasons that are not clear to me, my master-2 logs end with:
Maybe they went too quiet and my … The most concerning entries are:

$ grep -h 'pull.*secret' master-* | sort | head -n2
Dec 09 05:41:25 ip-10-0-29-127 hyperkube[4124]: W1209 05:41:25.939760 4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/catalog-operator-5499796c76-72mlv due to secrets "coreos-pull-secret" not found. The image pull may not succeed.
Dec 09 05:41:36 ip-10-0-29-127 hyperkube[4124]: W1209 05:41:36.213621 4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/olm-operator-75f785f98b-55dgs due to secrets "coreos-pull-secret" not found. The image pull may not succeed.
$ grep -h 'pull.*secret' master-* | sort | tail -n2
Dec 09 06:07:30 ip-10-0-29-127 hyperkube[4124]: W1209 06:07:30.170277 4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/olm-operator-75f785f98b-55dgs due to secrets "coreos-pull-secret" not found. The image pull may not succeed.
Dec 09 06:07:48 ip-10-0-29-127 hyperkube[4124]: W1209 06:07:48.171280 4124 kubelet_pods.go:841] Unable to retrieve pull secret openshift-operator-lifecycle-manager/coreos-pull-secret for openshift-operator-lifecycle-manager/catalog-operator-5499796c76-72mlv due to secrets "coreos-pull-secret" not found. The image pull may not succeed.

And those errors have also been reported over in operator-framework/operator-lifecycle-manager#607.
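As a hypothetical follow-up (not something run in this thread), one way to confirm the kubelet's complaint would be to check for the secret directly:

# Check whether the secret the kubelet is warning about exists at all;
# a NotFound error here would confirm the warnings are accurate.
$ KUBECONFIG=kubeconfig oc get secret coreos-pull-secret -n openshift-operator-lifecycle-manager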
While there are multiple HAProxy tests in the e2e suite, the ones failing are the only two stress tests:

$ git config remote.origin.url
https://github.com/openshift/origin.git
$ git describe --dirty
v4.0.0-alpha.0-759-g9d2874f
$ git grep '\.It(' test/extended/router/stress.go
test/extended/router/stress.go: g.It("converges when multiple routers are writing status", func() {
test/extended/router/stress.go: g.It("converges when multiple routers are writing conflicting status", func() {

They're also the only two that use …

The event log also looks clean for this replica set (since that would mitigate the generic log message). But we only have:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/848/pull-ci-openshift-installer-master-e2e-aws/2082/artifacts/e2e-aws/events.json | jq '.items[] | select(.involvedObject.kind == "ReplicaSet" and (.involvedObject.name | contains("router")))'
{
"apiVersion": "v1",
"count": 1,
"eventTime": null,
"firstTimestamp": "2018-12-09T05:41:29Z",
"involvedObject": {
"apiVersion": "apps/v1",
"kind": "ReplicaSet",
"name": "router-default-6c45f76f75",
"namespace": "openshift-ingress",
"resourceVersion": "7040",
"uid": "13d07968-fb75-11e8-b05a-125c3d3368ba"
},
"kind": "Event",
"lastTimestamp": "2018-12-09T05:41:29Z",
"message": "Created pod: router-default-6c45f76f75-4g9t7",
"metadata": {
"creationTimestamp": "2018-12-09T05:41:29Z",
"name": "router-default-6c45f76f75.156e93ac29775a8b",
"namespace": "openshift-ingress",
"resourceVersion": "7067",
"selfLink": "/api/v1/namespaces/openshift-ingress/events/router-default-6c45f76f75.156e93ac29775a8b",
"uid": "13ec0643-fb75-11e8-b05a-125c3d3368ba"
},
"reason": "SuccessfulCreate",
"reportingComponent": "",
"reportingInstance": "",
"source": {
"component": "replicaset-controller"
},
"type": "Normal"
}

which looks fine. /retest
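If the ReplicaSet events ever stop looking clean, the same events.json artifact could be sliced more broadly. A sketch (hypothetical, not run here) that dumps every event in the openshift-ingress namespace, one line per event:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/848/pull-ci-openshift-installer-master-e2e-aws/2082/artifacts/e2e-aws/events.json \
    | jq -r '.items[] | select(.metadata.namespace == "openshift-ingress") | "\(.lastTimestamp) \(.type) \(.reason) \(.message)"'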
Job 2084 crashed and burned:
possibly due to a Kubernetes API server installer being OOMed:

$ KUBECONFIG=kubeconfig oc get pods --all-namespaces | grep OOM
openshift-kube-apiserver installer-2-ip-10-0-23-167.ec2.internal 0/1 OOMKilled 0 19m
$ KUBECONFIG=kubeconfig oc get pod -o yaml -n openshift-kube-apiserver installer-2-ip-10-0-23-167.ec2.internal
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: 2018-12-09T08:47:19Z
labels:
app: installer
name: installer-2-ip-10-0-23-167.ec2.internal
namespace: openshift-kube-apiserver
resourceVersion: "8997"
selfLink: /api/v1/namespaces/openshift-kube-apiserver/pods/installer-2-ip-10-0-23-167.ec2.internal
uid: 095a065f-fb8f-11e8-afd9-0a80660a673e
spec:
containers:
- args:
- -v=4
- --revision=2
- --namespace=openshift-kube-apiserver
- --pod=kube-apiserver-pod
- --resource-dir=/etc/kubernetes/static-pod-resources
- --pod-manifest-dir=/etc/kubernetes/manifests
- --configmaps=kube-apiserver-pod
- --configmaps=config
- --configmaps=aggregator-client-ca
- --configmaps=client-ca
- --configmaps=etcd-serving-ca
- --configmaps=kubelet-serving-ca
- --configmaps=sa-token-signing-certs
- --secrets=aggregator-client
- --secrets=etcd-client
- --secrets=kubelet-client
- --secrets=serving-cert
command:
- cluster-kube-apiserver-operator
- installer
image: quay.io/openshift-release-dev/ocp-v4.0@sha256:a38d0e240ab50573d5193eec7ecf6046b4b0860f8b2184d34c4ad648f020333e
imagePullPolicy: Always
name: installer
resources: {}
securityContext:
privileged: true
runAsUser: 0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/kubernetes/
name: kubelet-dir
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: installer-sa-token-scmj9
readOnly: true
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: installer-sa-dockercfg-p4tf6
nodeName: ip-10-0-23-167.ec2.internal
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext:
runAsUser: 0
serviceAccount: installer-sa
serviceAccountName: installer-sa
terminationGracePeriodSeconds: 30
volumes:
- hostPath:
path: /etc/kubernetes/
type: ""
name: kubelet-dir
- name: installer-sa-token-scmj9
secret:
defaultMode: 420
secretName: installer-sa-token-scmj9
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2018-12-09T08:47:19Z
reason: PodCompleted
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2018-12-09T08:47:19Z
reason: PodCompleted
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: null
reason: PodCompleted
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2018-12-09T08:47:19Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: cri-o://cfd8efde622ee86e28e89f58cb9d4850bbe2b1198d74b6937f4865ee013aa7ed
image: quay.io/openshift-release-dev/ocp-v4.0@sha256:a38d0e240ab50573d5193eec7ecf6046b4b0860f8b2184d34c4ad648f020333e
imageID: quay.io/openshift-release-dev/ocp-v4.0@sha256:a38d0e240ab50573d5193eec7ecf6046b4b0860f8b2184d34c4ad648f020333e
lastState: {}
name: installer
ready: false
restartCount: 0
state:
terminated:
containerID: cri-o://cfd8efde622ee86e28e89f58cb9d4850bbe2b1198d74b6937f4865ee013aa7ed
exitCode: 0
finishedAt: 2018-12-09T08:47:22Z
reason: OOMKilled
startedAt: 2018-12-09T08:47:21Z
hostIP: 10.0.23.167
phase: Succeeded
podIP: 10.128.0.21
qosClass: BestEffort
startTime: 2018-12-09T08:47:19Z
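Worth noting in that YAML: resources: {} plus qosClass: BestEffort means the installer pod set no memory request or limit at all, so it is among the first candidates for the kernel OOM killer. A hypothetical one-liner (not run in the thread) to pull just those two fields via oc's jsonpath output instead of the full YAML:

$ KUBECONFIG=kubeconfig oc get pod -n openshift-kube-apiserver installer-2-ip-10-0-23-167.ec2.internal \
    -o jsonpath='{.status.qosClass} {.status.containerStatuses[0].state.terminated.reason}{"\n"}'
# Given the YAML above, this should print: BestEffort OOMKilled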
Also, it looks like 4.0.0-4 will be impacted by openshift/machine-config-operator#225, because:

$ oc adm release info quay.io/openshift-release-dev/ocp-release:4.0.0-4 --commits | grep dns-operator
cluster-dns-operator https://github.com/openshift/cluster-dns-operator 119e58d03d441282a764ba51619536f5a7c4ded8

As discussed in openshift/cluster-dns-operator#63, the …

$ git log --graph --oneline -8 119e58d03d441
* 119e58d Merge pull request #61 from ironcladlou/registry-hosts-fqdn
|\
| * 7f9b291 Fix uninstall script
| * 3e6c3aa Support both relative and absolute service names
* | 9b78211 Merge pull request #59 from sosiouxme/patch-1
|\ \
| |/
|/|
| * d2fedf1 Makefile: don't specify GOARCH
* | 03223f1 Merge pull request #60 from Miciah/fix-registry-service-name
|\ \
| |/
|/|
| * 6c533d9 Fix registry service name
|/
* 4c2aed1 Merge pull request #56 from pravisankar/reg-node-resolver
|\

So until we fix that, we may be stuck on 4.0.0-3. On the other hand, the image-registry operator conflict @abhinavdahiya points out in openshift/machine-config-operator#225 is from way back in openshift/cluster-image-registry-operator#72. So maybe we just need to live with occasional MCD-dirty-file reboots until we get openshift/machine-config-operator#225 landed?
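One way to see exactly which operators changed between the two payloads would be to diff the pinned commits of both release images (a sketch, assuming both images are pullable; not run in this thread):

$ diff \
    <(oc adm release info quay.io/openshift-release-dev/ocp-release:4.0.0-3 --commits) \
    <(oc adm release info quay.io/openshift-release-dev/ocp-release:4.0.0-4 --commits)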
pkg: Pin to RHCOS 47.198 and quay.io/openshift-release-dev/ocp-release:4.0.0-4
(force-pushed from d108128 to d54e597)
/retest
(force-pushed from 51a43d3 to 889b3c4)
We ended up pushing this out via a Git branch without bothering with this PR ;).
DO NOT MERGE!

That's the latest RHCOS release:

$ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/builds.json | jq '{latest: .builds[0], timestamp}'
{
  "latest": "47.198",
  "timestamp": "2018-12-08T23:13:22Z"
}

And @smarterclayton just pushed 4.0.0-alpha.0-2018-12-07-090414 to quay.io/openshift-release-dev/ocp-release:4.0.0-4. That's not the most recent release, but it's the most-recent stable release ;).

Renaming OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE gets us CI testing of the pinned release despite openshift/release@60007df2 (Use RELEASE_IMAGE_LATEST for CVO payload, 2018-10-03, openshift/release#1793).
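For reference, the same pin can be exercised locally by exporting the override before invoking the installer; a minimal sketch, assuming the variable keeps its current OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE name:

# Point the installer at the pinned release payload instead of its default.
$ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=quay.io/openshift-release-dev/ocp-release:4.0.0-4
$ openshift-install create cluster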