[OpenStack] Destroy cluster can not work well since the resource dependency relationship #891
Comments
For manual clean up this is what needs to be done (as far as I can see):
Btw, it would help if network names differed between clusters. It is hard to identify which network serves which cluster because all networks have the same name, "openshift", and the same subnet IP ranges.
After I cleaned all the interfaces on the router created by openshift-install:
➜ installer git:(master) ✗ bin/openshift-install destroy cluster --log-level debug --dir install-2018-12-13-14:55:51
DEBUG Deleting openstack routers
DEBUG Deleting openstack containers
DEBUG Deleting openstack subnets
DEBUG Deleting openstack networks
DEBUG Deleting openstack security-groups
DEBUG Deleting openstack servers
DEBUG Deleting openstack ports
DEBUG Deleting Security Group: 984c789f-dd1f-4a85-91f4-45a623de1c80
DEBUG Deleting Subnet: 14bbf8ea-5053-4716-b955-850c1b92eb48
DEBUG Exiting deleting openstack ports
DEBUG goroutine deletePorts complete
DEBUG Exiting deleting openstack servers
DEBUG goroutine deleteServers complete
DEBUG Deleting Router: 0acd3020-ce0e-4004-83db-db61cc463a28
DEBUG Deleting Security Group: f68a2f52-c61b-4193-bbee-2126d105db5d
DEBUG Deleting network: 8b47d67d-0c32-438e-88c7-fc673c28f983
DEBUG Exiting deleting openstack security-groups
DEBUG Exiting deleting openstack routers
DEBUG Deleting container: qe-wjiang.ocp.tt.testing
FATAL Resource not found
➜ installer git:(master) ✗ bin/openshift-install destroy cluster --log-level debug --dir install-2018-12-13-14:55:51
DEBUG Deleting openstack servers
DEBUG Deleting openstack networks
DEBUG Deleting openstack ports
DEBUG Deleting openstack subnets
DEBUG Deleting openstack security-groups
DEBUG Deleting openstack containers
DEBUG Deleting openstack routers
DEBUG Exiting deleting openstack routers
DEBUG goroutine deleteRouters complete
DEBUG Exiting deleting openstack ports
DEBUG goroutine deletePorts complete
DEBUG Exiting deleting openstack servers
DEBUG goroutine deleteServers complete
DEBUG Exiting deleting openstack networks
DEBUG goroutine deleteNetworks complete
DEBUG Exiting deleting openstack subnets
DEBUG goroutine deleteSubnets complete
DEBUG Exiting deleting openstack security-groups
DEBUG goroutine deleteSecurityGroups complete
DEBUG Exiting deleting openstack containers
DEBUG goroutine deleteContainers complete
DEBUG Purging asset "Terraform Variables" from disk
DEBUG Purging asset "Kubeconfig Admin" from disk
/assign @hardys
Thanks for the report. This seems similar to #726, but it looks like we need to soft-fail on the router deletion since the port is still in use, e.g. we return instead of failing here:
I'll test a patch locally and push it for test/review shortly.
Yeah, we could serialize the deletion of resources. I took an approach similar to the one used by awstagdeprovision for the AWS implementation, which tries to delete all the resources in parallel and then soft-fails where dependency errors happen. Ideally we'd just walk a dependency graph instead, but we don't have a complete view of all resources, e.g. those created by terraform during bootstrapping and then subsequently during the operator-driven scale-out.
Could you raise a separate issue for this please, since I don't think it's related to destroy, and perhaps not even related to OpenStack?
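For readers following along, here is a minimal sketch of the "delete everything in parallel, soft-fail on dependency errors, retry" idea described above. It is written in Python for brevity and is not the installer's Go code. `DependencyError`, `destroy_all`, and the `max_rounds` limit are hypothetical names introduced only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class DependencyError(Exception):
    """Hypothetical error: the resource is still in use by another resource."""


def destroy_all(deleters, max_rounds=10):
    """Run every deleter in parallel and retry the ones that soft-fail.

    `deleters` maps a name to a zero-argument callable. A callable raises
    DependencyError when the deletion should be retried later; any other
    exception is treated as a hard failure and propagates to the caller.
    Returns the number of rounds that were needed.
    """
    pending = dict(deleters)
    for round_no in range(1, max_rounds + 1):
        # Leaving the "with" block waits for all submitted deleters to finish.
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(fn) for name, fn in pending.items()}
        still_pending = {}
        for name, future in futures.items():
            try:
                future.result()
            except DependencyError:
                # Soft-fail: keep the deleter and try again next round.
                still_pending[name] = pending[name]
        if not still_pending:
            return round_no
        pending = still_pending
    raise RuntimeError(f"gave up with resources remaining: {sorted(pending)}")
```

The retry limit is an assumption of this sketch: without some bound, a deleter that can never succeed (such as a router whose interfaces nobody removes) would keep the loop running forever.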
I couldn't reproduce this error locally, but I pushed #911, which I think should fix it. @wjiangjay, could you confirm this resolves your issue if you can reproduce it?
As reported in issue openshift#891 this can fail until assigned ports are deleted, so return such that a deletion may be re-tried. Closes: openshift#891
@hardys
And as I said before, the router cannot be deleted because it still has some interfaces on it.
@wjiangjay ack, thanks. Apologies, it seems I misunderstood the issue. I'll push a follow-up patch to remove the interfaces. Can you please re-open this issue or raise another one so we can track the fix?
/reopen
@russellb: Reopened this issue.
As reported in openshift#891 this is not the right fix and can result in a loop if the router delete fails. Instead we need to remove interfaces from the router before attempting deletion. For now revert the incorrect fix and I'll push a follow-up with revised solution pending further local testing. Related: openshift#891 This reverts commit fb73b45.
Ok, I pushed #920 to revert the incorrect fix and will follow up on Monday with a revised solution.
@hardys Thanks for this. It seems this is not something that can be resolved quickly.
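To illustrate the ordering discussed here (remove the router's interfaces first, then delete the router), the following is a small Python sketch against an in-memory stand-in. It is not the revised patch from this thread and does not use a real OpenStack client: `FakeNetworkClient`, `RouterInUse`, and `delete_router_safely` are hypothetical names made up for this example.

```python
class RouterInUse(Exception):
    """Hypothetical error raised when a router still has interfaces."""


class FakeNetworkClient:
    """In-memory stand-in for a networking API; not a real OpenStack client."""

    def __init__(self):
        # router id -> set of subnet ids attached as interfaces
        self.routers = {}

    def list_interfaces(self, router_id):
        return sorted(self.routers[router_id])

    def remove_interface(self, router_id, subnet_id):
        self.routers[router_id].discard(subnet_id)

    def delete_router(self, router_id):
        # Mimic the behaviour reported in this issue: deletion is refused
        # while any interface is still attached.
        if self.routers[router_id]:
            raise RouterInUse(router_id)
        del self.routers[router_id]


def delete_router_safely(client, router_id):
    """Detach every interface from the router, then delete the router."""
    for subnet_id in client.list_interfaces(router_id):
        client.remove_interface(router_id, subnet_id)
    client.delete_router(router_id)
```

Doing the detach step inside the router deletion itself, rather than relying on a separate retry, avoids the loop mentioned in the revert message when the router delete keeps failing.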
I still can't reproduce this and I'm not entirely sure what we need to add. We already call routers.RemoveInterface to remove both subnets, and this works for me locally both in the destroy code and on the CLI (e.g. http://paste.openstack.org/show/737479/). Can anyone provide the router show output from a failing environment so I can see what's not getting removed, or share configuration and steps to reproduce?
@wjiangjay can you help with any more information? For example, perhaps you can provide details of your manual cleanup steps, so I can see how they differ from my CLI steps (ref http://paste.openstack.org/show/737479/). Without more information and/or a way to reproduce this, I'm not sure how to proceed. Thanks for any further details you can provide!
@hardys, I'm sending you an example log over email.
The example I got from @akostadinov was for AWS, but this error is specific to OpenStack. Perhaps @wjiangjay can help with more information?
/close Closing as it seems the destroy command now works as expected.
@mandre: Closing this issue.
Version
Platform (aws|libvirt|openstack):
openstack
What happened?
I tried to destroy a cluster to clean up its resources, but the cleanup failed with the following output:
What you expected to happen?
All resources should be cleaned up without error.
How to reproduce it (as minimally and precisely as possible)?
Anything else we need to know?