[OpenStack] Destroy cluster can not work well since the resource dependency relationship #891

Closed
ghost opened this issue Dec 13, 2018 · 18 comments

ghost commented Dec 13, 2018

Version

$ openshift-install version
openshift-install v0.5.0-master-103-g7980f7d750ebc4f74dd995055563b961f078caf5

Platform (aws|libvirt|openstack):

openstack

What happened?

I tried to destroy a cluster to clean up its resources, but the cleanup failed with the following output:

bin/openshift-install destroy cluster --log-level debug --dir install-2018-12-13-10:10:15
DEBUG Deleting openstack networks
DEBUG Deleting openstack security-groups
DEBUG Deleting openstack ports   
DEBUG Deleting openstack containers
DEBUG Deleting openstack servers
DEBUG Deleting openstack subnets
DEBUG Deleting openstack routers
DEBUG Deleting Subnet: 024c47a0-7e49-4187-bc02-8bb5a85c6c24
DEBUG Deleting network: 27d36a4a-3897-4eb1-9a74-12ba296709e0
DEBUG Exiting deleting openstack subnets                        
DEBUG Deleting Security Group: 848b5fb0-2134-4e74-b7a6-2cb81e174dd9
DEBUG Exiting deleting openstack ports
DEBUG goroutine deletePorts complete                      
DEBUG Deleting Security Group: ec199e44-6776-458b-acac-75c5a9789a12
DEBUG Exiting deleting openstack networks                   
DEBUG Deleting Router: 52093478-dcf1-4bcc-9a2c-dbb1e42da880    
DEBUG Exiting deleting openstack servers                                                                                                                                                                          
DEBUG goroutine deleteServers complete
DEBUG Deleting Security Group: fb4af047-126d-4081-ae94-96c60f1f73fb
FATAL Expected HTTP response code [202 204] when accessing [DELETE https://osp-xxxxx:13696/v2.0/routers/52093478-dcf1-4bcc-9a2c-dbb1e42da880], but got 409 instead
{"NeutronError": {"message": "Router 52093478-dcf1-4bcc-9a2c-dbb1e42da880 still has ports", "type": "RouterInUse", "detail": ""}}

What you expected to happen?

All resources should be cleaned up without error.

How to reproduce it (as minimally and precisely as possible)?

$ bin/openshift-install create cluster --log-level debug --dir=install-`date +%F-%T` 
$ bin/openshift-install destroy cluster --log-level debug --dir install-2018-12-13-10:10:15                                                                                                                                                               

@akostadinov

akostadinov commented Dec 13, 2018

For manual clean up this is what needs to be done (as far as I can see):

  1. delete VMs
  2. delete router interfaces
  3. delete router
  4. delete network ports
  5. delete network
  6. delete object store container files
  7. delete object store container
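The ordering above can be sketched as a serialized teardown. This is only an illustration in Python against an in-memory stand-in for the cloud, not the installer's code: `FakeCloud`, `RouterInUse`, `manual_cleanup`, and all resource names are hypothetical, chosen to mirror the steps in the list.

```python
class RouterInUse(Exception):
    """Stand-in for the 409 RouterInUse error Neutron returned in the report."""


class FakeCloud:
    """Hypothetical in-memory model of one cluster's resources (not a real API)."""

    def __init__(self):
        self.servers = {"master-0", "worker-0"}
        self.router_interfaces = {"router-1": {"subnet-1"}}
        self.ports = {"port-1", "port-2"}
        self.networks = {"openshift"}
        self.containers = {"cluster-container": {"object-1"}}

    def delete_router(self, router):
        # A router that still has interfaces attached cannot be deleted.
        if self.router_interfaces[router]:
            raise RouterInUse(f"Router {router} still has ports")
        del self.router_interfaces[router]


def manual_cleanup(cloud):
    """Tear down resources in the order listed above."""
    cloud.servers.clear()                        # 1. delete VMs
    for router in list(cloud.router_interfaces):
        cloud.router_interfaces[router].clear()  # 2. delete router interfaces
        cloud.delete_router(router)              # 3. delete router
    cloud.ports.clear()                          # 4. delete network ports
    cloud.networks.clear()                       # 5. delete network
    for name in list(cloud.containers):
        cloud.containers[name].clear()           # 6. delete container files
        del cloud.containers[name]               # 7. delete container
```

In this model, calling `delete_router` before step 2 raises `RouterInUse`, which mirrors the 409 response in the original report.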

btw it would help if network names differed between clusters. It is hard to identify which network serves which cluster because all networks have the same name "openshift" and the same subnet IP ranges.

@ghost
Author

ghost commented Dec 13, 2018

After I removed all the interfaces from the router created by openshift-install, destroy cluster seems to work again.

installer git:(master) ✗ bin/openshift-install destroy cluster --log-level debug --dir install-2018-12-13-14:55:51                                                                                                                                                           
DEBUG Deleting openstack routers                                                                                                                                                                                                                                                
DEBUG Deleting openstack containers                                                                                                                                                                                                                                             
DEBUG Deleting openstack subnets                                                                                                                                                                                                                                                
DEBUG Deleting openstack networks                                                                                                                                                                                                                                               
DEBUG Deleting openstack security-groups                                                                                                                                                                                                                                        
DEBUG Deleting openstack servers                                                                                                                                                                                                                                                
DEBUG Deleting openstack ports                                                                                                                                                                                                                                                  
DEBUG Deleting Security Group: 984c789f-dd1f-4a85-91f4-45a623de1c80                                                                                                                                                                                                             
DEBUG Deleting Subnet: 14bbf8ea-5053-4716-b955-850c1b92eb48                                                                                                                                                                                                                     
DEBUG Exiting deleting openstack ports                                                                                                                                                                                                                                          
DEBUG goroutine deletePorts complete                                                                                                                                                                                                                                            
DEBUG Exiting deleting openstack servers                                                                                                                                                                                                                                        
DEBUG goroutine deleteServers complete                                                                                                                                                                                                                                          
DEBUG Deleting Router: 0acd3020-ce0e-4004-83db-db61cc463a28                                                                                                                                                                                                                     
DEBUG Deleting Security Group: f68a2f52-c61b-4193-bbee-2126d105db5d                                                                                                                                                                                                             
DEBUG Deleting network: 8b47d67d-0c32-438e-88c7-fc673c28f983                                                                                                                                                                                                                    
DEBUG Exiting deleting openstack security-groups                                                                                                                                                                                                                                
DEBUG Exiting deleting openstack routers                                                                                                                                                                                                                                        
DEBUG Deleting container: qe-wjiang.ocp.tt.testing                                                                                                                                                                                                                              
FATAL Resource not found
installer git:(master) ✗ bin/openshift-install destroy cluster --log-level debug --dir install-2018-12-13-14:55:51
DEBUG Deleting openstack servers
DEBUG Deleting openstack networks
DEBUG Deleting openstack ports
DEBUG Deleting openstack subnets
DEBUG Deleting openstack security-groups
DEBUG Deleting openstack containers
DEBUG Deleting openstack routers
DEBUG Exiting deleting openstack routers
DEBUG goroutine deleteRouters complete
DEBUG Exiting deleting openstack ports
DEBUG goroutine deletePorts complete
DEBUG Exiting deleting openstack servers
DEBUG goroutine deleteServers complete
DEBUG Exiting deleting openstack networks
DEBUG goroutine deleteNetworks complete
DEBUG Exiting deleting openstack subnets
DEBUG goroutine deleteSubnets complete
DEBUG Exiting deleting openstack security-groups
DEBUG goroutine deleteSecurityGroups complete
DEBUG Exiting deleting openstack containers
DEBUG goroutine deleteContainers complete
DEBUG Purging asset "Terraform Variables" from disk
DEBUG Purging asset "Kubeconfig Admin" from disk

@russellb
Member

/assign @hardys

@hardys

hardys commented Dec 14, 2018

Thanks for the report. This seems similar to #726, but it looks like we need to soft-fail on the router deletion since the port is still in use, e.g. return instead of failing here:

https://github.com/openshift/installer/blob/master/pkg/destroy/openstack/openstack_deprovision.go#L339

I'll test a patch locally and push for test/review shortly

@hardys

hardys commented Dec 14, 2018

> For manual clean up this is what needs to be done (as far as I can see):
>
>   1. delete VMs
>   2. delete router interfaces
>   3. delete router
>   4. delete network ports
>   5. delete network
>   6. delete object store container files
>   7. delete object store container

Yeah, we could serialize the deletion of resources. I took an approach similar to the one awstagdeprovision uses for the AWS implementation, which tries to delete all the resources in parallel and then soft-fails wherever dependency errors happen.

Ideally we'd just walk a dependency graph instead, but we don't have a complete view of all resources, e.g. those created by terraform during bootstrapping and then later during the operator-driven scale-out.
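To illustrate what walking a dependency graph could look like, here is a minimal Python sketch using the standard library's `graphlib` (Python 3.9+). The edges in `DELETE_AFTER` are assumptions taken from the manual cleanup order quoted above, not the installer's actual resource model, and `deletion_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Each key may only be deleted after everything in its value set is gone.
# These edges are illustrative assumptions, not a complete model of Neutron.
DELETE_AFTER = {
    "router interfaces": {"servers"},
    "routers": {"router interfaces"},
    "ports": {"servers", "router interfaces"},
    "subnets": {"ports", "router interfaces"},
    "networks": {"subnets", "ports"},
    "container objects": set(),
    "containers": {"container objects"},
}


def deletion_order(delete_after):
    """Return one valid deletion order in which prerequisites always come first."""
    return list(TopologicalSorter(delete_after).static_order())
```

A walk like this only helps when every resource and edge is known up front, which is exactly the gap described above for resources created later by the operators.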

> btw it would help if network names differed between clusters. It is hard to identify which network serves which cluster because all networks have the same name "openshift" and the same subnet IP ranges.

Could you raise a separate issue for this please, since I don't think it's related to destroy, and perhaps not even related to OpenStack?

@hardys

hardys commented Dec 14, 2018

I couldn't reproduce this error locally, but I pushed #911, which I think should fix it. @wjiangjay, if you can reproduce it, could you confirm whether this resolves your issue?

hardys pushed a commit to hardys/installer that referenced this issue Dec 14, 2018
As reported in issue openshift#891 this can fail until assigned ports
are deleted, so return such that a deletion may be re-tried.

Closes: openshift#891
@ghost
Author

ghost commented Dec 15, 2018

@hardys
Checked again with the same OSP, but destroy cluster now loops on deleting the router and subnets:

DEBUG Deleting Router: 38d33dab-afd7-4bd7-a1c5-284ded9aecfd                                                                                                                                                
DEBUG Exiting deleting openstack routers
DEBUG Deleting openstack subnets
DEBUG Deleting openstack networks
DEBUG Deleting Subnet: 55a24691-c38a-4aec-9afd-fbdb33780986
DEBUG Exiting deleting openstack subnets
DEBUG Deleting network: 15917fb2-3618-4277-be16-8d5ce6ec8b3f
DEBUG Exiting deleting openstack networks
DEBUG Deleting openstack routers
DEBUG Deleting Router: 38d33dab-afd7-4bd7-a1c5-284ded9aecfd
DEBUG Exiting deleting openstack routers
DEBUG Deleting openstack subnets
DEBUG Deleting openstack networks
DEBUG Deleting Subnet: 55a24691-c38a-4aec-9afd-fbdb33780986
DEBUG Exiting deleting openstack subnets
DEBUG Deleting network: 15917fb2-3618-4277-be16-8d5ce6ec8b3f
DEBUG Exiting deleting openstack networks
DEBUG Deleting openstack routers
DEBUG Deleting Router: 38d33dab-afd7-4bd7-a1c5-284ded9aecfd
DEBUG Exiting deleting openstack routers
DEBUG Deleting openstack subnets
DEBUG Deleting openstack networks
DEBUG Deleting Subnet: 55a24691-c38a-4aec-9afd-fbdb33780986
DEBUG Exiting deleting openstack subnets
DEBUG Deleting network: 15917fb2-3618-4277-be16-8d5ce6ec8b3f
DEBUG Exiting deleting openstack networks
DEBUG Deleting openstack routers
DEBUG Deleting Router: 38d33dab-afd7-4bd7-a1c5-284ded9aecfd
DEBUG Exiting deleting openstack routers
DEBUG Deleting openstack subnets
DEBUG Deleting openstack networks
DEBUG Deleting Subnet: 55a24691-c38a-4aec-9afd-fbdb33780986
DEBUG Exiting deleting openstack subnets
DEBUG Deleting network: 15917fb2-3618-4277-be16-8d5ce6ec8b3f
DEBUG Exiting deleting openstack networks
DEBUG Deleting openstack routers
DEBUG Deleting Router: 38d33dab-afd7-4bd7-a1c5-284ded9aecfd
DEBUG Exiting deleting openstack routers
DEBUG Deleting openstack subnets
DEBUG Deleting openstack networks
DEBUG Deleting Subnet: 55a24691-c38a-4aec-9afd-fbdb33780986
DEBUG Exiting deleting openstack subnets
DEBUG Deleting network: 15917fb2-3618-4277-be16-8d5ce6ec8b3f
DEBUG Exiting deleting openstack networks
DEBUG Deleting openstack routers
DEBUG Deleting Router: 38d33dab-afd7-4bd7-a1c5-284ded9aecfd
DEBUG Exiting deleting openstack routers
DEBUG Deleting openstack subnets
DEBUG Deleting openstack networks
DEBUG Deleting Subnet: 55a24691-c38a-4aec-9afd-fbdb33780986
DEBUG Exiting deleting openstack subnets
DEBUG Deleting network: 15917fb2-3618-4277-be16-8d5ce6ec8b

And as I said before, the router cannot be deleted because it still has some interfaces on it.
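The log above is what an unbounded soft-fail retry looks like when one resource can never be deleted. As a minimal sketch (hypothetical names throughout, not the installer's code), a retry loop can stop as soon as a full pass deletes nothing and report what is left:

```python
def destroy_with_retries(resources, try_delete, max_passes=10):
    """Retry deletions with soft-fail, but give up when a pass makes no progress.

    resources:  iterable of resource names (hypothetical identifiers)
    try_delete: callable returning True if the resource was deleted and
                False on a dependency conflict such as RouterInUse
    Returns the set of resources that could not be deleted.
    """
    remaining = set(resources)
    for _ in range(max_passes):
        if not remaining:
            break
        deleted = {name for name in sorted(remaining) if try_delete(name)}
        if not deleted:
            # Nothing changed in this pass, so another pass would only repeat it.
            break
        remaining -= deleted
    return remaining
```

With a router whose interfaces are never removed, this returns the router together with the subnet and network stuck behind it, instead of repeating the same three deletions indefinitely.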

@hardys

hardys commented Dec 15, 2018

@wjiangjay ack, thanks. Apologies, it seems I misunderstood the issue. I'll push a follow-up patch to remove the interfaces. Can you please re-open this issue or raise another one so we can track the fix?

@russellb
Member

/reopen

@openshift-ci-robot
Contributor

@russellb: Reopened this issue.

hardys pushed a commit to hardys/installer that referenced this issue Dec 15, 2018
As reported in openshift#891 this is not the right fix and can result in a
loop if the router delete fails.  Instead we need to remove interfaces
from the router before attempting deletion.  For now revert the
incorrect fix and I'll push a follow-up with revised solution
pending further local testing.

Related: openshift#891

This reverts commit fb73b45.
@hardys

hardys commented Dec 15, 2018

OK, I pushed #920 to revert the incorrect fix and will follow up on Monday with a revised solution.

@ghost
Author

ghost commented Dec 16, 2018

@hardys Thanks for this. It seems this is not something simple that can be resolved quickly.

@hardys

hardys commented Dec 17, 2018

I still can't reproduce this, and I'm not entirely sure what we need to add. We already call routers.RemoveInterface to remove both subnets, and this works for me locally both in the destroy code and on the CLI, e.g. http://paste.openstack.org/show/737479/

Can anyone provide a router show of a failing environment so I can see what's not getting removed, or share configuration and steps to reproduce?

@hardys

hardys commented Jan 7, 2019

@wjiangjay can you help with any more information? For example, perhaps you could provide details of your manual cleanup steps, so I can see how they differ from my CLI steps (ref http://paste.openstack.org/show/737479/).

Without more information and/or a way to reproduce this, I'm not sure how to proceed, thanks for any further details you can provide!

@akostadinov

@hardys, sending you an example log over email.

@hardys

hardys commented Jan 16, 2019

The example I got from @akostadinov was for AWS, but this error is specific to OpenStack. Perhaps @wjiangjay can help with more information?

@mandre
Member

mandre commented Sep 10, 2019

/close

Closing as it seems the destroy command now works as expected.

@openshift-ci-robot
Contributor

@mandre: Closing this issue.
