libvirt: Unable to destroy if domain is not running and keep error loop. #656

Closed
praveenkumar opened this issue Nov 12, 2018 · 0 comments · Fixed by #660

Version

$ openshift-install version
bin/openshift-install v0.3.0-187-g09aa205d4df479476c06bc8d79e69091eaf7a13f
Terraform v0.11.10

Platform (aws|libvirt|openstack): libvirt

What happened?

 $ virsh -c $OPENSHIFT_INSTALL_LIBVIRT_URI list --all
 Id    Name                           State
----------------------------------------------------
 -     master0                        shut off
 -     test1-worker-0-zd7hd           shut off

$ bin/openshift-install destroy cluster --dir test --log-level debug
DEBUG Deleting libvirt volumes                     
DEBUG Deleting libvirt domains                     
DEBUG Deleting libvirt network                     
DEBUG Exiting deleting libvirt network             
DEBUG goroutine deleteNetwork complete             
ERROR Error destroying domain test1-worker-0-zd7hd: virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running') 
DEBUG Exiting deleting libvirt domains             
DEBUG Exiting deleting libvirt volumes             
DEBUG goroutine deleteVolumes complete             
DEBUG Deleting libvirt domains                     
ERROR Error destroying domain test1-worker-0-zd7hd: virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running') 
[...]

What you expected to happen?

It should either delete the domains or error out once, instead of looping on the same error indefinitely.

How to reproduce it (as minimally and precisely as possible)?

$ openshift-install create cluster
---- Manually shut down the VMs ----
$ openshift-install destroy cluster
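
For illustration only, here is a minimal Go sketch of the "delete" half of the expected behavior: force power-off only when the domain is running, then remove its definition. The `domain` interface and `fakeDomain` type are hypothetical stand-ins so the example runs without libvirt; they are not the libvirt-go API, and this is not necessarily how the installer handles it.

```go
package main

import (
	"errors"
	"fmt"
)

// domain is a hypothetical stand-in for a libvirt domain handle.
type domain interface {
	Name() string
	Running() bool
	Destroy() error  // force power-off; fails if the domain is not running
	Undefine() error // remove the domain definition
}

// errNotRunning mimics the message reported in this issue.
var errNotRunning = errors.New("Requested operation is not valid: domain is not running")

// fakeDomain is an in-memory domain used only for this sketch.
type fakeDomain struct {
	name    string
	running bool
	defined bool
}

func (d *fakeDomain) Name() string  { return d.name }
func (d *fakeDomain) Running() bool { return d.running }

func (d *fakeDomain) Destroy() error {
	if !d.running {
		return errNotRunning
	}
	d.running = false
	return nil
}

func (d *fakeDomain) Undefine() error {
	d.defined = false
	return nil
}

// removeDomain skips Destroy for domains that are already shut off and
// returns the first error instead of retrying.
func removeDomain(d domain) error {
	if d.Running() {
		if err := d.Destroy(); err != nil {
			return fmt.Errorf("destroying domain %s: %w", d.Name(), err)
		}
	}
	if err := d.Undefine(); err != nil {
		return fmt.Errorf("undefining domain %s: %w", d.Name(), err)
	}
	return nil
}

func main() {
	d := &fakeDomain{name: "test1-worker-0-zd7hd", running: false, defined: true}
	if err := removeDomain(d); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("removed:", !d.defined)
}
```

With a shut-off domain, `Destroy` is never called, so the "domain is not running" error cannot occur on this path.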

wking added a commit to wking/openshift-installer that referenced this issue Nov 13, 2018
And I've rerolled deletion to use a single call to each deleter,
failing fast if they error.  That should address cases where we cannot
destroy a shut-off domain [1]:

  $ virsh -c $OPENSHIFT_INSTALL_LIBVIRT_URI list --all
   Id    Name                           State
  ----------------------------------------------------
   -     master0                        shut off
   -     test1-worker-0-zd7hd           shut off

  $ bin/openshift-install destroy cluster --dir test --log-level debug
  DEBUG Deleting libvirt volumes
  DEBUG Deleting libvirt domains
  DEBUG Deleting libvirt network
  DEBUG Exiting deleting libvirt network
  DEBUG goroutine deleteNetwork complete
  ERROR Error destroying domain test1-worker-0-zd7hd: virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')
  DEBUG Exiting deleting libvirt domains
  DEBUG Exiting deleting libvirt volumes
  DEBUG goroutine deleteVolumes complete
  DEBUG Deleting libvirt domains
  ERROR Error destroying domain test1-worker-0-zd7hd: virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')
  [...]

Now we'll fail-fast in those cases, allowing the caller to clear the
stuck domains, after which they can restart deletion.

The previous goroutine approach was borrowed from the AWS destroyer.
But AWS has a large, complicated resource dependency graph which
includes cycles.  Libvirt is much simpler, with volumes and a network
that are all independent, followed by domains which depend on the
network and some of the volumes.  With this commit we now take a
single pass at destroying those resources starting at the leaf domains
and working our way rootwards.

I've retained some looping (although no longer in a separate
goroutine) for domain deletion.  This guards against racing domain
creation, as discussed in the new godocs for deleteDomains.

Also:

* Rename from libvirt_prefix_deprovision.go to libvirt.go.  The name
  is from 998ba30 (cmd,pkg/destroy: add non-terraform destroy,
  2018-09-25, openshift#324), but the implementation doesn't need to be
  represented in the filename.  This commit renames to libvirt.go to
  match the package name, since this file is the guts of this package.

* Simplify the AlwaysTrueFilter implementation.  No semantic changes,
  but this saves us a few lines of code.

* Add trailing periods for godocs to comply with [2].

[1]: openshift#656 (comment)
[2]: https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences