Libvirt installation seems to always fail #1017
Can you try following the troubleshooting guide? That should help narrow down which component is failing.
Hi Alex,
It seems that it normally gets stuck before adding the worker nodes; however, I don't see the cluster-manager-controllers-* pod created at all.
All the pods I can see are waiting for nodes to become available before they can start running.
The relevant journalctl output is:
```
Jan 09 14:15:26 test1-bootstrap bootkube.sh[3209]: Pod Status:openshift-kube-apiserver/openshift-kube-apiserver DoesNotExist
Jan 09 14:15:26 test1-bootstrap bootkube.sh[3209]: Pod Status:openshift-kube-scheduler/openshift-kube-scheduler DoesNotExist
Jan 09 14:15:26 test1-bootstrap bootkube.sh[3209]: Pod Status:openshift-kube-controller-manager/openshift-kube-controller-manager DoesNotExist
Jan 09 14:15:26 test1-bootstrap bootkube.sh[3209]: Pod Status:openshift-cluster-version/cluster-version-operator-7b47d58bff-x47vl Pending
Jan 09 14:32:26 test1-bootstrap bootkube.sh[3209]: Error: error while checking pod status: timed out waiting for the condition
Jan 09 14:32:26 test1-bootstrap bootkube.sh[3209]: Tearing down temporary bootstrap control plane...
Jan 09 14:32:26 test1-bootstrap bootkube.sh[3209]: Error: error while checking pod status: timed out waiting for the condition
Jan 09 14:32:26 test1-bootstrap bootkube.sh[3209]: Error: error while checking pod status: timed out waiting for the condition
Jan 09 14:32:28 test1-bootstrap bootkube.sh[3209]: unable to find container etcd-signer: no container with name or ID etcd-signer found: no such container
Jan 09 14:32:28 test1-bootstrap systemd[1]: bootkube.service: main process exited, code=exited, status=125/n/a
Jan 09 14:32:28 test1-bootstrap systemd[1]: Unit bootkube.service entered failed state.
Jan 09 14:32:28 test1-bootstrap systemd[1]: bootkube.service failed.
Jan 09 14:32:33 test1-bootstrap systemd[1]: bootkube.service holdoff time over, scheduling restart.
Jan 09 14:32:33 test1-bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Jan 09 14:32:33 test1-bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
Jan 09 14:32:33 test1-bootstrap bootkube.sh[15269]: unable to pull quay.io/openshift-release-dev/ocp-release@sha256@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea: error getting default registries to try: invalid reference format
Jan 09 14:32:33 test1-bootstrap systemd[1]: bootkube.service: main process exited, code=exited, status=125/n/a
Jan 09 14:32:33 test1-bootstrap systemd[1]: Unit bootkube.service entered failed state.
Jan 09 14:32:33 test1-bootstrap systemd[1]: bootkube.service failed.
Jan 09 14:32:39 test1-bootstrap systemd[1]: bootkube.service holdoff time over, scheduling restart.
Jan 09 14:32:39 test1-bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Jan 09 14:32:39 test1-bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
Jan 09 14:32:39 test1-bootstrap bootkube.sh[15461]: unable to pull quay.io/openshift-release-dev/ocp-release@sha256@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea: error getting default registries to try: invalid reference format
```
It's worth mentioning that once (in over 10 cluster-creation attempts) it managed to advance and started creating and running more pods.
However, even that time the installer timed out.
We've seen this issue before; it looks like you beat me to #933, which links to containers/podman#2086.
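For context on the pull error in the log above: the release pull spec ends up with a doubled digest prefix (`@sha256@sha256:`), which is not valid image-reference syntax, so the pull is rejected before any registry is contacted. A minimal sketch of how to see the difference by hand, assuming podman is available on the bootstrap host (the digest is copied from the log above; this only illustrates the reference-format problem, not the full root cause tracked in the linked issues):

```sh
# Malformed: the digest prefix appears twice, so the reference fails to parse
# and podman reports "invalid reference format".
podman pull quay.io/openshift-release-dev/ocp-release@sha256@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea

# Well-formed: a single @sha256:<digest> suffix is the valid digest-reference syntax.
podman pull quay.io/openshift-release-dev/ocp-release@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea
```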
It looks like there are still additional issues beyond the one that was already solved. Sometimes the installer completes, but in many cases it fails, stopping at basically the same point.
It all looks like there is some sort of race condition that causes most of the attempts to fail.

```
kubectl get events --all-namespaces
NAMESPACE   LAST SEEN   FIRST SEEN   COUNT   NAME   KIND   SUBOBJECT   TYPE   REASON   SOURCE   MESSAGE
```
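For anyone hitting the same wall, a minimal sketch of how the stuck bootstrap can be inspected from the machine running the installer, assuming the default asset-directory layout where the installer writes `auth/kubeconfig`:

```sh
# Point kubectl at the cluster being bootstrapped; the installer writes this
# kubeconfig into its asset directory.
export KUBECONFIG="$PWD/auth/kubeconfig"

# See which control-plane pods exist and what state they are in.
kubectl get pods --all-namespaces -o wide

# Recent events often show why pods stay Pending (e.g. no schedulable nodes yet).
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
```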
OK, after further trial and error and some debugging, it seems that in order to have a successful installation one needs:
No need to patch; #785 lets you adjust this with environment variables.
This already gets project-README billing. Further discussion should go in #522.
I think the sub-issues described above each have an existing issue in their own right, so I'm closing this. If there's still anything unique to this issue, let us know. /close
@wking: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Version
Platform (aws|libvirt|openstack):
libvirt
What happened?
It seems that the libvirt-based installation always fails for me.
The last output from `journalctl -b -f -u bootkube.service` is as follows:
```
Jan 08 17:09:44 test1-bootstrap bootkube.sh[3117]: Pod Status:openshift-kube-apiserver/openshift-kube-apiserver DoesNotExist
Jan 08 17:09:44 test1-bootstrap bootkube.sh[3117]: Pod Status:openshift-kube-scheduler/openshift-kube-scheduler DoesNotExist
Jan 08 17:09:44 test1-bootstrap bootkube.sh[3117]: Pod Status:openshift-kube-controller-manager/openshift-kube-controller-manager DoesNotExist
Jan 08 17:09:44 test1-bootstrap bootkube.sh[3117]: Pod Status:openshift-cluster-version/cluster-version-operator-56f577b95f-gbkwq Pending
```
The installation then waits for some more time until it finally fails.
I tried to perform the install as per the instructions in https://github.com/openshift/installer/blob/master/docs/dev/libvirt-howto.md.
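For reference, a sketch of the commands involved, with the bootstrap IP as a placeholder for whatever libvirt assigned in my environment:

```sh
# Create the cluster as described in the libvirt howto.
openshift-install create cluster

# From another terminal, follow the bootstrap logs on the bootstrap machine
# (RHCOS uses the "core" user; the IP depends on the libvirt network).
ssh core@<bootstrap-ip> 'journalctl -b -f -u bootkube.service'
```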