Bugfix/azure wait for disk detach #248
Conversation
Force-pushed from 364ab2c to 6aa7754 (compare)
- Disks are now detached before deletion on Azure
- The maximum grace period of drained pods is aligned with the drain timeout
- Azure no longer powers off / shuts down the VM before deletion
Force-pushed from 6aa7754 to 77072d2 (compare)
@hardikdr /needs-review
pkg/controller/machine.go (outdated)

```diff
@@ -470,7 +468,7 @@ func (c *controller) machineDelete(machine *v1alpha1.Machine, driver driver.Driver
 	c.targetCoreClient,
 	timeOutDuration, // TODO: Will need to configure timeout
 	nodeName,
-	-1,
+	int(timeOutDuration.Seconds()),
```
What is it for?
I thought capping the graceful termination of any pod at the drain timeout would help the overall drain operation complete within the drain timeout, but I don't know if it works as expected. I am hoping that if a pod is set to a graceful termination of 2 hours, and our drain timeout is 5 minutes, we would give it at most 5 minutes?
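For illustration, a minimal sketch of that capping idea (the function and variable names here are hypothetical, not the PR's actual code):

```go
import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// effectiveGracePeriod caps a pod's termination grace period at the drain
// timeout, so one pod with a very long grace period cannot stall the drain.
func effectiveGracePeriod(pod *corev1.Pod, drainTimeout time.Duration) int64 {
	gracePeriod := int64(drainTimeout.Seconds()) // e.g. 300 for a 5m timeout
	if p := pod.Spec.TerminationGracePeriodSeconds; p != nil && *p < gracePeriod {
		gracePeriod = *p
	}
	return gracePeriod // a pod asking for 2h gets at most the 5m drain timeout
}
```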
I have reverted this change here - 7510f9c#diff-d8287fe74b5273163c9f6b6c635ad912R475.
Thanks.
```go
// There are disks attached hence need to detach them
vm.StorageProfile.DataDisks = &[]compute.DataDisk{}

_, errChan := vmClient.CreateOrUpdate(d.AzureMachineClass.Spec.ResourceGroup, machineID, vm, cancel)
```
Can you please explain how this works?
Where are we making explicit detach calls for the VM, or does the empty disk array instruct the API to detach all disks?
Yes, Hardik. An update call on the VM with an empty data-disk list is how detachment of data disks is done on Azure; they lack a proper SDK call for it. Refer here - Azure/azure-sdk-for-go#1638 (comment)
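To make the flow concrete, here is a rough sketch of that detach-before-delete sequence, assuming the older channel-based client from github.com/Azure/azure-sdk-for-go/arm/compute that the snippet above appears to use (`resourceGroup`, `machineID`, and `cancel` are stand-ins):

```go
import "github.com/Azure/azure-sdk-for-go/arm/compute"

// detachAllDataDisks issues a VM update with an empty data-disk list (Azure's
// way of detaching every data disk; this SDK version has no dedicated detach
// call) and blocks until the update finishes so the VM delete can follow.
func detachAllDataDisks(vmClient compute.VirtualMachinesClient, resourceGroup, machineID string, cancel <-chan struct{}) error {
	vm, err := vmClient.Get(resourceGroup, machineID, compute.InstanceView)
	if err != nil {
		return err
	}
	if vm.StorageProfile == nil || vm.StorageProfile.DataDisks == nil ||
		len(*vm.StorageProfile.DataDisks) == 0 {
		return nil // no data disks attached, nothing to detach
	}
	vm.StorageProfile.DataDisks = &[]compute.DataDisk{}
	_, errChan := vmClient.CreateOrUpdate(resourceGroup, machineID, vm, cancel)
	return <-errChan // wait for the detach update to complete
}
```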
Force-pushed from adca4ce to 85feb71 (compare)
- Every pod is now given the drain timeout period to be evicted
- If a pod fails to be evicted, it is deleted forcefully by setting its grace period to 0s
- Drain is now invoked even in the case of forceful deletion
- The deletion call then follows after the drain timeout duration
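A rough sketch of that evict-then-force-delete behaviour, assuming the context-less client-go API of that era (the helper name and error handling are illustrative, not the PR's code):

```go
import (
	corev1 "k8s.io/api/core/v1"
	policy "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictOrForceDelete first asks the eviction API to remove the pod; if that
// fails (for example, blocked by a PodDisruptionBudget), the pod is deleted
// forcefully with a zero grace period.
func evictOrForceDelete(client kubernetes.Interface, pod *corev1.Pod) error {
	err := client.CoreV1().Pods(pod.Namespace).Evict(&policy.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
	})
	if err == nil {
		return nil
	}
	zero := int64(0) // grace period 0s leads to a forceful deletion
	return client.CoreV1().Pods(pod.Namespace).Delete(pod.Name,
		&metav1.DeleteOptions{GracePeriodSeconds: &zero})
}
```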
Force-pushed from 85feb71 to 7510f9c (compare)
What this PR does / why we need it:
This PR attempts to fix the Azure issue of VMs getting stuck in the deletion state.
Pods are now given the drain timeout period to be evicted; if eviction fails, they are forcefully deleted by setting the grace period to 0. The data disks are then detached and the VM is deleted. The overall ordering is sketched below.
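A high-level sketch of the resulting deletion order (all function bodies are placeholders standing in for the controller's real logic):

```go
package main

// Placeholder steps standing in for the controller's real logic.
func drainNode() error          { return nil } // evict pods; force-delete stragglers
func detachAllDataDisks() error { return nil } // empty data-disk update on the VM
func deleteVM() error           { return nil } // direct delete, no power-off first

// deleteAzureMachine sketches the deletion order on Azure after this PR:
// drain first, then detach disks, then delete the VM without shutting it down.
func deleteAzureMachine() error {
	if err := drainNode(); err != nil {
		return err
	}
	if err := detachAllDataDisks(); err != nil {
		return err
	}
	return deleteVM()
}

func main() { _ = deleteAzureMachine() }
```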
Which issue(s) this PR fixes:
Fixes #242
Special notes for your reviewer:
Release note: