Node never becomes Ready #32
You can clear this by deleting the ovs pod and it will recover, but then the openshift-kube-apiserver-operator appears to hang and do nothing. I copied logs into team-master for them to investigate.
Two workers never became Ready in a recent CI run:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/660/pull-ci-openshift-installer-master-e2e-aws/1356/artifacts/e2e-aws/nodes.json | jq '[.items[] | {"name": .metadata.name, "machine": .metadata.annotations.machine, "ready": (.status.conditions[] | select(.type == "Ready") | .status)}]'
[
{
"name": "ip-10-0-13-33.ec2.internal",
"machine": "openshift-cluster-api/ci-op-ptkkr88f-1d3f3-master-0",
"ready": "True"
},
{
"name": "ip-10-0-130-112.ec2.internal",
"machine": "openshift-cluster-api/ci-op-ptkkr88f-1d3f3-worker-us-east-1a-5ttjv",
"ready": "False"
},
{
"name": "ip-10-0-154-238.ec2.internal",
"machine": "openshift-cluster-api/ci-op-ptkkr88f-1d3f3-worker-us-east-1b-t9gzw",
"ready": "False"
},
{
"name": "ip-10-0-168-175.ec2.internal",
"machine": "openshift-cluster-api/ci-op-ptkkr88f-1d3f3-worker-us-east-1c-fd75b",
"ready": "True"
},
{
"name": "ip-10-0-17-71.ec2.internal",
"machine": "openshift-cluster-api/ci-op-ptkkr88f-1d3f3-master-1",
"ready": "True"
},
{
"name": "ip-10-0-33-245.ec2.internal",
"machine": "openshift-cluster-api/ci-op-ptkkr88f-1d3f3-master-2",
"ready": "True"
}
]

And from the crash-looping container logs, the same lockfile issue @abhinavdahiya saw:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/660/pull-ci-openshift-installer-master-e2e-aws/1356/artifacts/e2e-aws/pods/openshift-sdn_ovs-hdjv5_openvswitch.log.gz | gunzip
/etc/openvswitch/conf.db does not exist ... (warning).
Creating empty database /etc/openvswitch/conf.db
ovsdb-tool: I/O error: /etc/openvswitch/conf.db: failed to lock lockfile (Resource temporarily unavailable)
[FAILED]
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/660/pull-ci-openshift-installer-master-e2e-aws/1356/artifacts/e2e-aws/pods/openshift-sdn_ovs-xjfdc_openvswitch.log.gz | gunzip
/etc/openvswitch/conf.db does not exist ... (warning).
Creating empty database /etc/openvswitch/conf.db
ovsdb-tool: I/O error: /etc/openvswitch/conf.db: failed to lock lockfile (Resource temporarily unavailable)
[FAILED]

The symptoms in the e2e-aws build logs were:
This issue has come up a few times before on the OVS list. Here the guess was that there was an existing
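The "Resource temporarily unavailable" in the ovsdb-tool error above is the EAGAIN you get when a non-blocking lock attempt finds the file already held by another process, which fits the stale-process theory. A minimal sketch of the same failure mode with flock(1) (the lockfile path here is just a temp file, not the real OVS one, and the echoed message imitates ovsdb-tool's wording):

```shell
# Hold an exclusive lock in a background process, then try a
# non-blocking lock from a second one, roughly what happens when a
# stale holder is still alive.
lockfile=$(mktemp)
flock "$lockfile" sleep 3 &     # first holder keeps the lock for ~3s
holder=$!
sleep 1                         # give the holder time to acquire it
if ! flock -n "$lockfile" true; then
    # EAGAIN surfaces to the user as "Resource temporarily unavailable"
    echo "failed to lock lockfile (Resource temporarily unavailable)"
fi
wait "$holder"
rm -f "$lockfile"
```

Once the first holder exits, the same non-blocking attempt succeeds, which is why deleting the ovs pod "clears" the symptom.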
The corresponding CRI-O issue is cri-o/cri-o#1904.
Weird, it is intermittently unable to create the database file. There's absolutely nothing fancy about this at all, just a call to
For those of you who are debugging, capture a journalctl and see if there are SELinux errors. I'm looking into this now.
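The SELinux check above can be scripted; here is a rough sketch, where `node-journal.log` is an assumed filename for a journal captured from the affected node (it is not a file produced by the CI artifacts):

```shell
# Hypothetical triage: capture the journal on the affected node, e.g.
#   journalctl --no-pager > node-journal.log
# then grep the capture for SELinux AVC denials.
if grep -iE 'avc: *denied' node-journal.log; then
    echo "SELinux denials found; inspect the denied context and target path"
else
    echo "no SELinux denials in the captured journal"
fi
```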
Never mind, that's the error message when the lockfile is… locked. Seems that an old process could be lying around. Adding some logging and killing logic in #33.
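A sketch of the kind of recovery logic described (not the actual #33 change, which isn't shown here), under two assumptions: that the stale holder is a leftover ovsdb-server, and that the lock lives in a hidden `.<dbname>.~lock~` file next to the database. `OVS_DB` is parameterized purely so the detection step can be exercised outside a real node; the real path would be /etc/openvswitch/conf.db:

```shell
# Hedged sketch: if the database lock is still held, assume a stale
# ovsdb-server and kill it before retrying database creation.
DB=${OVS_DB:-/etc/openvswitch/conf.db}
LOCK=$(dirname "$DB")/.$(basename "$DB").~lock~   # assumed lockfile name
if [ -e "$LOCK" ] && ! flock -n "$LOCK" true; then
    echo "lock on $DB still held; killing stale ovsdb-server"
    pkill -f ovsdb-server || true                 # tolerate no match
fi
```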
@squeed do you have some good troubleshooting steps that might come in handy? We could add them to https://github.com/openshift/installer/blob/master/docs/user/troubleshooting.md
There's another reliability improvement in #37; after that we should start seeing fewer of these failures.
I think we can close this; the obvious SDN issues seem to have been resolved. |
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/657/pull-ci-openshift-installer-master-e2e-aws/1345?log#log
I saw this error in one of the CI runs; one of the masters failed to become Ready,
and I'm seeing this in the ovs pod on that node:
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/657/pull-ci-openshift-installer-master-e2e-aws/1345/artifacts/e2e-aws/pods/openshift-sdn_ovs-fn8d2_openvswitch.log.gz
/cc @squeed