"openshift-install destroy cluster" leaves auth dir, breaking next install #522
Comments
Couldn't double check this due to the current AWS quota issues. Please confirm if this is an issue.
More on this over here and later. For now, the easiest approach is probably to use
I don't follow how the presence of
Yes, I did.
Encountered the very same problem - several times. Removing the
Yes, that makes sense. You are effectively telling the installer to ignore whatever kubeconfig it has generated and to instead use the one you have provided (which will almost always be incorrect). We should revisit this UX issue soon.
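For later readers, a minimal sketch of that workaround, assuming the default layout where the installer writes auth/kubeconfig into the working directory (the exact set of leftover files may differ between installer versions):

```bash
# Remove the stale generated auth assets from the previous run so the next
# "create cluster" regenerates a kubeconfig that matches the new cluster,
# instead of loading the old, now-invalid one.
rm -rf auth/
openshift-install create cluster
```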
I have been doing the following...
The result is a failed cluster. All instances behind the internal and external AWS ELBs are OutOfService. Kubelet services are failed on the instances ... essentially there is no OpenShift cluster. You will see the following output (and systemd journal logs) when starting the kubelet ....
Executing the following fixes everything!
Docs for the easier
I still think the UX is broken. There should be a default for
This would make life easier for devs, where cluster installs happen many times a day. But I think it would make life slightly more complicated for external users, who are more likely to successfully launch a single cluster and then walk away from the installer for weeks+. I'm fine biasing the UX in favor of external users. However, folks who preserve their state file (or, with #556, their state directory) should be fine reusing an earlier run's assets except for the 30-minute-validity kubelet-client cert and its descendants. I think we could guard against that with asset-load-time validators. The only remaining issue then would be folks who removed parents but not frozen child assets between runs. #556 does a better job tracking frozen/modified assets, with this warning (which we can strengthen as discussed there) in the dangerous case.
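As a rough illustration of what such an asset-load-time validator would check, here is a hand-rolled version run against the client certificate embedded in the leftover auth/kubeconfig. This is only a sketch: client-certificate-data is the standard kubeconfig key, but treating that embedded cert as representative of the generated certs is an assumption, not how the installer actually validates assets.

```bash
# Manual stand-in for an asset-load-time validator: extract the client
# certificate embedded in the previously generated kubeconfig (first match
# only) and ask openssl whether it has already expired; -checkend 0 exits
# non-zero if the certificate is past its notAfter date.
grep -m1 'client-certificate-data' auth/kubeconfig | awk '{print $2}' | base64 -d |
  openssl x509 -noout -checkend 0 >/dev/null &&
  echo "client certificate still valid" ||
  echo "client certificate expired; start from a fresh asset directory"
```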
I wanted to post an error message from the state my system is in now, which I think is related to this issue.
This is a separate issue. This issue is about leftovers on the installer host. Your issue is about leftovers in the AWS account. Can you file a new issue with the full teardown logs? Check in
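(Not part of this issue, but for anyone looking for such AWS-account leftovers, a hedged sketch: it assumes the cluster's AWS resources carry the usual kubernetes.io/cluster/<cluster-name>=owned tag, and the tag key and cluster name below are placeholders.)

```bash
# List AWS resources still tagged for the old cluster (tag key/value are assumed).
aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=kubernetes.io/cluster/my-old-cluster,Values=owned \
  --query 'ResourceTagMappingList[].ResourceARN' --output text
```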
Ack - #836
Why would having a default ... What are the potential issues here? I anticipate that once this GAs, it would be writing to the system default
Fixed with this commit: b686588. Closing the issue. Please re-open if it still exists. I have tested it against a libvirt cluster and the auth directory gets removed upon a destroy.
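A quick way to confirm that behaviour locally (the directory name is just an example):

```bash
# After the fix, a successful destroy should also remove the generated auth assets.
openshift-install destroy cluster --dir mycluster
ls mycluster/auth 2>/dev/null || echo "auth/ removed as expected"
```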
Setting to the cluster name may be okay, but only when we are generating it all from scratch. We have to support the case where install-config is imported in.
~/.kube/config may not be ideal (though useful in many cases) because multiple clusters may (and likely will) need to be launched from a single local machine. So isolating a cluster to its target 'dir' is strongly desirable.
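A sketch of that per-cluster layout (the directory names are examples):

```bash
# Keep each cluster's assets, including its auth/kubeconfig, in its own directory.
openshift-install create cluster --dir clusters/dev1
openshift-install create cluster --dir clusters/dev2

# Point tooling at one cluster at a time rather than sharing ~/.kube/config.
export KUBECONFIG="$PWD/clusters/dev1/auth/kubeconfig"
```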
This issue still happens with
Not sure why this version is marked "dirty" - I downloaded it from the official link. |
Users occasionally have trouble with installations where they recycled an asset directory from a previous cluster, and so pick up state like expired X.509 certificates [1] or unexpected release images [2]. While current installers attempt to remove most assets upon successful cluster deletion, there are still some outstanding issues with that [3]. It's safer to just use a fresh directory, and this commit tries to get wording to that effect into each flow that passes through 'openshift-install create ...'. The analogous upstream docs are in [4].

I'm not adjusting installation-generate-ignition-configs.adoc, because it is only consumed by the metal and vSphere flows, and they both go through modules/installation-initializing-manual first. installation-initializing-manual.adoc suggests a mkdir, which will fail if the directory already exists, and these folks are already thinking about the installer loading information from their asset directory, so it didn't seem like they needed the same warning.

[1]: openshift/installer#522
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1713016#c4
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1673251
[4]: https://github.com/openshift/installer/blame/8811e63e3f70196f088d6bbf3993ca9043ac3909/README.md#L53-L55
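For reference, the fresh-directory flow that wording steers people toward looks roughly like this (the directory name is an example); the mkdir fails loudly if the directory was already used, which is the guard mentioned above:

```bash
# One fresh asset directory per cluster; mkdir errors out if it already exists.
mkdir my-new-cluster
openshift-install create cluster --dir my-new-cluster
```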
After installing, there is the auth/kubeconfig file in the current checkout. When destroying the cluster, this file is not deleted. A following cluster install loads this file, but it does not match the new certificates. Hence, the cluster does not work as expected for the user.
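A reconstruction of the reported sequence, assuming the default working-directory layout described above:

```bash
openshift-install create cluster    # generates auth/kubeconfig alongside the other assets
openshift-install destroy cluster   # tears down the cluster but leaves auth/kubeconfig behind
openshift-install create cluster    # loads the stale auth/kubeconfig, which no longer matches
                                    # the new cluster's certificates
```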