
Libvirt installation seems to always fail #1017

Closed
lveyde opened this issue Jan 8, 2019 · 10 comments

lveyde (Contributor) commented Jan 8, 2019

Version

$ openshift-install version
v0.8.0 / v0.9.0 / v0.9.1 (tried several versions)

Platform (aws|libvirt|openstack):

libvirt

What happened?

The libvirt-based installation seems to always fail for me.

The last output from journalctl -b -f -u bootkube.service is as follows:
Jan 08 17:09:44 test1-bootstrap bootkube.sh[3117]: Pod Status:openshift-kube-apiserver/openshift-kube-apiserver DoesNotExist
Jan 08 17:09:44 test1-bootstrap bootkube.sh[3117]: Pod Status:openshift-kube-scheduler/openshift-kube-scheduler DoesNotExist
Jan 08 17:09:44 test1-bootstrap bootkube.sh[3117]: Pod Status:openshift-kube-controller-manager/openshift-kube-controller-manager DoesNotExist
Jan 08 17:09:44 test1-bootstrap bootkube.sh[3117]: Pod Status:openshift-cluster-version/cluster-version-operator-56f577b95f-gbkwq Pending

The installer then keeps waiting for a while longer until it finally fails.

I tried to perform the install following the instructions in https://github.com/openshift/installer/blob/master/docs/dev/libvirt-howto.md

crawford (Contributor) commented Jan 9, 2019

Can you try following the troubleshooting guide? That should help narrow down which component is failing.
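
For reference, a minimal sketch of the kind of checks that guide walks through on libvirt; the connection URI, domain names, and bootstrap address below are assumptions to adapt to your environment:

# List the VMs the installer created (URI as used in libvirt-howto.md).
$ virsh -c qemu+tcp://192.168.122.1/system list --all

# Find the bootstrap node's address, then SSH in as the "core" user.
$ virsh -c qemu+tcp://192.168.122.1/system domifaddr <bootstrap-domain>
$ ssh core@<bootstrap-ip>

# On the bootstrap node, follow bootkube and inspect the pulled containers.
$ journalctl -b -f -u bootkube.service
$ sudo podman ps -a
$ sudo podman logs <container-id>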

lveyde (Contributor, Author) commented Jan 9, 2019 via email

crawford (Contributor) commented Jan 9, 2019

unable to pull quay.io/openshift-release-dev/ocp-release@sha256@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea

We've seen this issue before (though I can't find any issues to link at the moment).

It looks like you beat me to #933, which links to containers/podman#2086.
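
To make the failure mode concrete, here is a hypothetical shell illustration of the malformed pullspec from that log line and the shape of the fix (collapsing the duplicated digest prefix); the real fix belongs in the installer/podman rather than in a wrapper like this:

# The digest prefix ends up duplicated in the release pullspec:
$ bad='quay.io/openshift-release-dev/ocp-release@sha256@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea'

# Collapsing "@sha256@sha256:" back to "@sha256:" gives a pullspec podman can resolve:
$ good="${bad/@sha256@sha256:/@sha256:}"
$ echo "$good"
quay.io/openshift-release-dev/ocp-release@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea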

lveyde (Contributor, Author) commented Jan 9, 2019

OK, the issue seems to be the one here:
#933

Pushed a patch (PR) to fix this:
#1032

lveyde (Contributor, Author) commented Jan 9, 2019

Looks like there are additional issues beyond the one already solved.

Sometimes the installer completes, but in many cases it fails, stopping at basically the same point.
The container logs show errors like:
2019-01-09T23:10:28.030565746+00:00 stderr F E0109 23:10:28.030532 1 sync.go:126] error creating resourcebuilder for imagestream "openshift/cli" (image.openshift.io/v1, 130 of 222): failed to get resource type: no matches for kind "ImageStream" in version "image.openshift.io/v1"
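
That error means the image.openshift.io/v1 API group was not being served when the cluster-version operator tried to create the ImageStream. As a hedged diagnostic sketch, you can check whether the group is actually registered; in OpenShift 4 it is served through an aggregated APIService whose name follows the standard <version>.<group> form:

# Is the image.openshift.io/v1 group discoverable at all?
$ kubectl api-resources --api-group=image.openshift.io

# Check the aggregated API service backing it:
$ kubectl get apiservice v1.image.openshift.io -o wide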

lveyde (Contributor, Author) commented Jan 10, 2019

It looks like there is some sort of race condition that causes most of the attempts to fail.
This is what I see in the Kubernetes events; maybe it will give a hint as to what causes the installation to stall and refuse to create worker nodes:

kubectl get events --all-namespaces

NAMESPACE LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
kube-system 14m 14m 1 kube-scheduler.157887a96c0b3b72 Endpoints Normal LeaderElection default-scheduler test1-bootstrap_e8f8c61a-14ef-11e9-9153-52fdfc072182 became leader
kube-system 14m 14m 1 kube-controller-manager.157887ab2011f797 ConfigMap Normal LeaderElection kube-controller-manager test1-bootstrap_f2899522-14ef-11e9-add2-52fdfc072182 became leader
openshift-kube-apiserver-operator 14m 14m 11 openshift-kube-apiserver-operator.157887ae250d88fe Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "openshift-kube-apiserver-operator-58b8c455c5": replicasets.apps "openshift-kube-apiserver-operator-58b8c455c5" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-core-operators 14m 14m 11 origin-cluster-osin-operator.157887ae251a2626 Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "origin-cluster-osin-operator-774dc44fd8": replicasets.apps "origin-cluster-osin-operator-774dc44fd8" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-cluster-version 14m 14m 11 cluster-version-operator.157887ae251ff703 Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "cluster-version-operator-7b47d58bff": replicasets.apps "cluster-version-operator-7b47d58bff" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-core-operators 14m 14m 11 origin-cluster-osin-operator2.157887ae25131c4c Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "origin-cluster-osin-operator2-669747b677": replicasets.apps "origin-cluster-osin-operator2-669747b677" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-dns-operator 14m 14m 11 dns-operator.157887ae251342c8 Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "dns-operator-6dbdd7df84": replicasets.apps "dns-operator-6dbdd7df84" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-core-operators 14m 14m 11 openshift-service-cert-signer-operator.157887ae2a32ec37 Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "openshift-service-cert-signer-operator-784d9677b5": replicasets.apps "openshift-service-cert-signer-operator-784d9677b5" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-cluster-kube-scheduler-operator 14m 14m 11 openshift-cluster-kube-scheduler-operator.157887ae35df8373 Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "openshift-cluster-kube-scheduler-operator-7776d4f85c": replicasets.apps "openshift-cluster-kube-scheduler-operator-7776d4f85c" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-kube-controller-manager-operator 13m 14m 11 openshift-kube-controller-manager-operator.157887ae9b30803d Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "openshift-kube-controller-manager-operator-66cbb459f8": replicasets.apps "openshift-kube-controller-manager-operator-66cbb459f8" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-cluster-openshift-controller-manager-operator 13m 14m 10 openshift-cluster-openshift-controller-manager-operator.157887af6d749c54 Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "openshift-cluster-openshift-controller-manager-operator-5dd78fdd87": replicasets.apps "openshift-cluster-openshift-controller-manager-operator-5dd78fdd87" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-apiserver-operator 13m 14m 11 openshift-apiserver-operator.157887aee7043d15 Deployment Warning ReplicaSetCreateError deployment-controller Failed to create new replica set "openshift-apiserver-operator-77756dfc66": replicasets.apps "openshift-apiserver-operator-77756dfc66" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion apps/v1 Kind Deployment: no matches for kind "Deployment" in version "apps/v1"
openshift-operator-lifecycle-manager 13m 13m 2 catalog-operator-6c9888d8c4-8f57p.157887b091f880b2 Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-kube-apiserver-operator 13m 13m 2 openshift-kube-apiserver-operator-58b8c455c5-tqrwx.157887b09faba5ca Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-operator-lifecycle-manager 13m 13m 1 catalog-operator-6c9888d8c4.157887b0924a6a6a ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: catalog-operator-6c9888d8c4-8f57p
openshift-operator-lifecycle-manager 13m 13m 1 catalog-operator.157887b086676e65 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set catalog-operator-6c9888d8c4 to 1
openshift-dns-operator 13m 13m 1 dns-operator-6dbdd7df84.157887b0964d877c ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: dns-operator-6dbdd7df84-b89ft
openshift-dns-operator 13m 13m 2 dns-operator-6dbdd7df84-b89ft.157887b096897f36 Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-core-operators 13m 13m 1 origin-cluster-osin-operator2.157887b098534ecd Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set origin-cluster-osin-operator2-669747b677 to 1
openshift-cluster-openshift-controller-manager-operator 13m 13m 1 openshift-cluster-openshift-controller-manager-operator-5dd78fdd87.157887b0aecf2853 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: openshift-cluster-openshift-controller-manager-operator-5dj5mt8
openshift-core-operators 13m 13m 1 origin-cluster-osin-operator2-669747b677.157887b09fc1ddfd ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: origin-cluster-osin-operator2-669747b677-wddgr
openshift-kube-apiserver-operator 13m 13m 1 openshift-kube-apiserver-operator-58b8c455c5.157887b09fb9ff67 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: openshift-kube-apiserver-operator-58b8c455c5-tqrwx
openshift-cluster-openshift-controller-manager-operator 13m 13m 1 openshift-cluster-openshift-controller-manager-operator.157887b0a9607149 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set openshift-cluster-openshift-controller-manager-operator-5dd78fdd87 to 1
openshift-kube-apiserver-operator 13m 13m 1 openshift-kube-apiserver-operator.157887b0985640ab Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set openshift-kube-apiserver-operator-58b8c455c5 to 1
openshift-core-operators 13m 13m 2 origin-cluster-osin-operator2-669747b677-wddgr.157887b0a0ece9b4 Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-operator-lifecycle-manager 13m 13m 2 olm-operator-7955888657-xtfpl.157887b09583867b Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-operator-lifecycle-manager 13m 13m 1 olm-operator-7955888657.157887b092307ebe ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: olm-operator-7955888657-xtfpl
openshift-dns-operator 13m 13m 1 dns-operator.157887b08d780f20 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set dns-operator-6dbdd7df84 to 1
openshift-operator-lifecycle-manager 13m 13m 1 olm-operator.157887b0862d4541 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set olm-operator-7955888657 to 1
openshift-cluster-kube-scheduler-operator 13m 13m 1 openshift-cluster-kube-scheduler-operator.157887b0c13d37cd Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set openshift-cluster-kube-scheduler-operator-7776d4f85c to 1
openshift-core-operators 13m 13m 2 openshift-service-cert-signer-operator-784d9677b5-pll4f.157887b0c35e72c0 Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-core-operators 13m 13m 2 origin-cluster-osin-operator-774dc44fd8-28k7c.157887b0c1e0ab27 Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-core-operators 13m 13m 1 origin-cluster-osin-operator-774dc44fd8.157887b0c1e2ed7f ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: origin-cluster-osin-operator-774dc44fd8-28k7c
openshift-cluster-version 13m 13m 1 cluster-version-operator.157887b0c1364d47 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set cluster-version-operator-7b47d58bff to 1
openshift-core-operators 13m 13m 1 origin-cluster-osin-operator.157887b0c1326626 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set origin-cluster-osin-operator-774dc44fd8 to 1
openshift-cluster-version 13m 13m 1 cluster-version-operator-7b47d58bff.157887b0c523ea00 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: cluster-version-operator-7b47d58bff-xlwkl
openshift-cluster-openshift-controller-manager-operator 13m 13m 2 openshift-cluster-openshift-controller-manager-operator-5dj5mt8.157887b0ae9c7b65 Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-core-operators 13m 13m 1 openshift-service-cert-signer-operator.157887b0c13989e9 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set openshift-service-cert-signer-operator-784d9677b5 to 1
openshift-cluster-kube-scheduler-operator 13m 13m 2 openshift-cluster-kube-scheduler-operator-7776d4f85c-tzqn8.157887b0c526b96e Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-cluster-kube-scheduler-operator 13m 13m 1 openshift-cluster-kube-scheduler-operator-7776d4f85c.157887b0c50363e2 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: openshift-cluster-kube-scheduler-operator-7776d4f85c-tzqn8
openshift-core-operators 13m 13m 1 openshift-service-cert-signer-operator-784d9677b5.157887b0c23b8ee4 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: openshift-service-cert-signer-operator-784d9677b5-pll4f
openshift-cluster-version 13m 13m 2 cluster-version-operator-7b47d58bff-xlwkl.157887b0c6b9e1eb Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-kube-controller-manager-operator 13m 13m 2 openshift-kube-controller-manager-operator-66cbb459f8-n4xjw.157887b106f817f0 Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-kube-controller-manager-operator 13m 13m 1 openshift-kube-controller-manager-operator-66cbb459f8.157887b107167705 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: openshift-kube-controller-manager-operator-66cbb459f8-n4xjw
openshift-kube-controller-manager-operator 13m 13m 1 openshift-kube-controller-manager-operator.157887b1025fc1ac Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set openshift-kube-controller-manager-operator-66cbb459f8 to 1
openshift-apiserver-operator 13m 13m 1 openshift-apiserver-operator.157887b1544be102 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set openshift-apiserver-operator-77756dfc66 to 1
openshift-apiserver-operator 13m 13m 2 openshift-apiserver-operator-77756dfc66-pkwgd.157887b15899208f Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-apiserver-operator 13m 13m 1 openshift-apiserver-operator-77756dfc66.157887b1589d2926 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: openshift-apiserver-operator-77756dfc66-pkwgd
openshift-cluster-api 13m 13m 1 cluster-autoscaler-operator-6855f55d94.157887b1a9e7b504 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: cluster-autoscaler-operator-6855f55d94-cf4r8
openshift-cluster-api 13m 13m 1 cluster-autoscaler-operator.157887b1a77cae1a Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set cluster-autoscaler-operator-6855f55d94 to 1
openshift-cluster-api 13m 13m 2 cluster-autoscaler-operator-6855f55d94-cf4r8.157887b1a9d8ef1f Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-cluster-api 13m 13m 1 machine-api-operator.157887b323ddaa86 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set machine-api-operator-67f964b4d to 1
openshift-cluster-api 13m 13m 1 machine-api-operator-67f964b4d.157887b32b454c41 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: machine-api-operator-67f964b4d-2nzp5
openshift-cluster-api 13m 13m 2 machine-api-operator-67f964b4d-2nzp5.157887b32b4e67bb Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-machine-config-operator 13m 13m 1 machine-config-operator-769967ddf5.157887b3679e6d0a ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: machine-config-operator-769967ddf5-26bcg
openshift-machine-config-operator 13m 13m 1 machine-config-operator.157887b363f55f90 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set machine-config-operator-769967ddf5 to 1
openshift-machine-config-operator 13m 13m 2 machine-config-operator-769967ddf5-26bcg.157887b367985859 Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
openshift-cluster-machine-approver 13m 13m 1 machine-approver-86b68b66f7.157887b37a97eeb9 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: machine-approver-86b68b66f7-bjc5f
openshift-cluster-machine-approver 13m 13m 1 machine-approver.157887b377adfed0 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set machine-approver-86b68b66f7 to 1
openshift-cluster-machine-approver 13m 13m 2 machine-approver-86b68b66f7-bjc5f.157887b37a489e71 Pod Warning FailedScheduling default-scheduler no nodes available to schedule pods
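
The recurring patterns in that dump are the ReplicaSetCreateError warnings (no RESTMapping for apps/v1, suggesting the controllers' discovery information was incomplete at that point) and FailedScheduling (no nodes available). Two hedged follow-up queries that narrow this down:

# Show only the warnings, where both failure patterns appear:
$ kubectl get events --all-namespaces --field-selector type=Warning

# "no nodes available to schedule pods" at this stage usually just means no master has
# registered yet (or none is Ready), so check the node list as well:
$ kubectl get nodes -o wide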

lveyde (Contributor, Author) commented Jan 13, 2019

OK, after further trial and error and some debugging, it seems that in order to have a successful installation one needs to:

  • Have a patch for the double @sha256 issue (at least until podman is fixed) (lveyde@4dcd72e)

  • Have a patch for increased bootstrap VM RAM (lveyde@58b2e12)

  • (Highly recommended) Have a patch for an increased master VM RAM size (I used 6 GB, and it seems to be enough)

  • (Also highly recommended) Increase the validity of the kubelet certificate from the default 30 minutes to something longer-lasting, say a few hours. I personally increased it to 10 years to avoid any issues (lveyde@41772d0)

  • And last but not least: make sure to ALWAYS use a new configuration directory (a sketch of this workflow follows the list).
    This last point can't be overstated: even if one runs the "destroy" command, which seemingly cleans up the environment, the next "create" command will reuse the previously generated certificates and the installation will fail, because the kubelet certificate will have long since expired, especially if it was created with the default validity period.
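
For the last two points, a minimal sketch of the workflow, assuming bash; the tls/kubelet-client.crt path is an assumption (depending on the installer version, the generated certificates may only live in the hidden state file) and should be verified against your version:

# Always start from a fresh asset directory so previously generated (and possibly
# already-expired) certificates are never reused:
$ CLUSTER_DIR="$(mktemp -d /tmp/ocp-libvirt.XXXXXX)"
$ openshift-install create cluster --dir "$CLUSTER_DIR"

# To see how long the generated kubelet client certificate is valid for
# (path is hypothetical; adjust to wherever your installer version writes its TLS assets):
$ openssl x509 -noout -dates -in "$CLUSTER_DIR/tls/kubelet-client.crt"

# Tear down with the same directory, then discard it rather than reusing it:
$ openshift-install destroy cluster --dir "$CLUSTER_DIR"
$ rm -rf "$CLUSTER_DIR"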

wking (Member) commented Jan 13, 2019

  • (Highly recommended) Have a patch for an increased master VM RAM size (I used 6 GB, and it seems to be enough)

No need to patch; #785 lets you adjust this with environment variables (see the sketch at the end of this comment).

  • And last, but not least - make sure to ALWAYS use a new configuration directory.

This already gets project-README billing. Further discussion should go in #522.
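
For the memory point, a hedged example of the environment-variable route; TF_VAR_libvirt_master_memory is the variable name the libvirt docs describe around #785, but verify it (and the units) against your installer version:

# Request 6 GiB of RAM per master (value in MiB) before creating the cluster:
$ export TF_VAR_libvirt_master_memory=6144
$ openshift-install create cluster --dir "$CLUSTER_DIR"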

wking (Member) commented Jan 15, 2019

I think the sub-issues described above each have an existing issue in their own right, so I'm closing this. If there's still anything unique to this issue, let us know.

/close

openshift-ci-robot (Contributor) commented:

@wking: Closing this issue.

In response to this:

I think the sub-issues described above each have an existing issue in their own right, so I'm closing this. If there's still anything unique to this issue, let us know.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
