Fix issue 6913: Velero Built-in Datamover: Backup stucks in phase WaitingForPluginOperations when Node Agent pod gets restarted #6914
Force-pushed from 87a8a6b to 4e203b2
Codecov Report
@@ Coverage Diff @@
## main #6914 +/- ##
==========================================
+ Coverage 60.78% 61.04% +0.25%
==========================================
Files 250 252 +2
Lines 26629 26855 +226
==========================================
+ Hits 16187 16394 +207
- Misses 9293 9306 +13
- Partials 1149 1155 +6
It could be useful for future data movers to keep the Canceling phase.
Force-pushed from 4e203b2 to 4b54c89
@kaovilai The DU/DD controllers don't do anything with this phase, so I'm not sure what value there is in defining it as valid.
original := du.DeepCopy()
du.Status.Phase = velerov2alpha1api.DataUploadPhaseCanceling
There are two cases:
- Case 1: If the dataPath is still there (fsBackup is not nil), we cannot update the DUCR to cancelled directly. Instead, we need to notify the dataPath (by calling fsBackup.Cancel()) and wait for it to cancel the data transfer; it will then call back OnDataUploadCancelled.
- Case 2: If the dataPath is not there, we need to cancel the DUCR in the main flow.
Both cases must be supported. However:
- The existing code does not support case 2.
- The changes in this PR do not support case 1.
So please restructure the code to support both cases.
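The two cases above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: the types reconciler, asyncBR, and dataUpload and the method handleCancel are hypothetical stand-ins, and only the names fsBackup, Cancel, and OnDataUploadCancelled are taken from the discussion.

```go
package main

import "fmt"

// Phase is a stand-in for the DataUpload status phase; only the values
// needed for this sketch are defined.
type Phase string

const (
	PhaseInProgress Phase = "InProgress"
	PhaseCanceled   Phase = "Canceled"
)

// asyncBR is a hypothetical stand-in for the running data path (fsBackup).
type asyncBR struct {
	cancelRequested bool
}

// Cancel only requests cancellation. The data path is expected to stop the
// transfer and invoke the cancel callback on its own later.
func (b *asyncBR) Cancel() { b.cancelRequested = true }

// dataUpload is a simplified stand-in for the DUCR.
type dataUpload struct {
	Name   string
	Cancel bool
	Phase  Phase
}

// reconciler holds the data paths known to this process (assumption).
type reconciler struct {
	dataPaths map[string]*asyncBR
}

// onDataUploadCancelled stands in for OnDataUploadCancelled. It is the only
// place in this sketch that moves the DUCR to Canceled.
func (r *reconciler) onDataUploadCancelled(du *dataUpload) {
	du.Phase = PhaseCanceled
}

// handleCancel covers both cases from the review comment.
func (r *reconciler) handleCancel(du *dataUpload) {
	if !du.Cancel {
		return
	}
	fsBackup := r.dataPaths[du.Name] // nil when no data path exists
	if fsBackup == nil {
		// Case 2: no data path, so nothing will ever call back.
		// Cancel the DUCR in the main flow.
		r.onDataUploadCancelled(du)
		return
	}
	// Case 1: notify the data path and leave the phase untouched.
	fsBackup.Cancel()
}

func main() {
	r := &reconciler{dataPaths: map[string]*asyncBR{"du-1": &asyncBR{}}}

	running := &dataUpload{Name: "du-1", Cancel: true, Phase: PhaseInProgress}
	r.handleCancel(running)
	fmt.Println(running.Phase, r.dataPaths["du-1"].cancelRequested)

	orphaned := &dataUpload{Name: "du-2", Cancel: true, Phase: PhaseInProgress}
	r.handleCancel(orphaned)
	fmt.Println(orphaned.Phase)
}
```

In this sketch the phase is written only by the callback, so the two branches cannot disagree about when a DUCR counts as cancelled.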
original := du.DeepCopy()
du.Status.Phase = velerov2alpha1api.DataUploadPhaseCanceling
du.Status.Phase = velerov2alpha1api.DataUploadPhaseCanceled
For case 2, instead of patching the DUCR directly here, let's call OnDataUploadCancelled, which does more:
- Updates the completionTimeStamp
- Cleans up the intermediate objects
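To illustrate why a shared handler is preferable to a direct phase patch, here is a small sketch. Everything in it is hypothetical (the upload type, the cleaner interface, and both functions); it only mirrors the two extra steps listed above and is not the actual OnDataUploadCancelled implementation.

```go
package main

import (
	"fmt"
	"time"
)

// upload is a simplified stand-in for the DUCR status fields used here.
type upload struct {
	Name           string
	Phase          string
	CompletionTime *time.Time
}

// cleaner is a hypothetical hook for removing intermediate objects.
type cleaner interface {
	CleanUp(name string)
}

// recordingCleaner remembers what was cleaned so the effect is visible.
type recordingCleaner struct {
	cleaned []string
}

func (c *recordingCleaner) CleanUp(name string) {
	c.cleaned = append(c.cleaned, name)
}

// patchPhaseOnly is what a direct patch amounts to: the phase changes and
// nothing else happens.
func patchPhaseOnly(u *upload) {
	u.Phase = "Canceled"
}

// onCancelled sketches a handler in the spirit of OnDataUploadCancelled:
// besides setting the phase, it stamps the completion time and cleans up.
func onCancelled(u *upload, c cleaner) {
	now := time.Now()
	u.Phase = "Canceled"
	u.CompletionTime = &now
	c.CleanUp(u.Name)
}

func main() {
	c := &recordingCleaner{}

	patched := &upload{Name: "du-1", Phase: "InProgress"}
	patchPhaseOnly(patched)
	fmt.Println(patched.Phase, patched.CompletionTime != nil, len(c.cleaned))

	handled := &upload{Name: "du-2", Phase: "InProgress"}
	onCancelled(handled, c)
	fmt.Println(handled.Phase, handled.CompletionTime != nil, len(c.cleaned))
}
```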
@shubham-pampattiwar I think the updated version still doesn't handle case 2.
When there is no fsBackup, we set Canceled directly, but when there is one, we need to cancel the fsBackup rather than just setting the phase.
@shubham-pampattiwar
For case 1, we need to call fsBackup.Cancel() and NOT set du.Status.Phase (the same as the original code).
For case 2, we just need to call OnDataUploadCancelled.
Please also rebase the code; the codespell error will then disappear.
@shubham-pampattiwar As discussed, please also help to take care of the DataDownload part.
Force-pushed from 4b54c89 to 6f9ce0b
@Lyndon-Li @sseago Added a fix for the data download controller too. PTAL, thanks!
Force-pushed from 6f9ce0b to 8b91a91
@Lyndon-Li Updated the PR, please take another look.
return ctrl.Result{}, nil
}

log.Info("Data download is being canceled")
Move this to line 290, to be similar to the data upload case?
original := dd.DeepCopy()
dd.Status.Phase = velerov2alpha1api.DataDownloadPhaseCanceling
dd.Status.Phase = velerov2alpha1api.DataDownloadPhaseCanceled
Before fsRestore cancels itself, we should not set the DD to DataDownloadPhaseCanceled. We need to wait for fsRestore to cancel itself, after which it calls OnDataDownloadCancelled.
So just roll back to the original code for this case.
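The ordering described in this comment (request cancellation, let the data path stop, and only then record the cancelled phase through the callback) can be sketched as below. This is an assumption-laden illustration: restorePath, download, and their methods are hypothetical stand-ins for fsRestore and the DD CR, and the goroutine merely simulates a data transfer.

```go
package main

import (
	"fmt"
	"sync"
)

// download is a simplified stand-in for the DD CR.
type download struct {
	mu    sync.Mutex
	phase string
}

func (d *download) setPhase(p string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.phase = p
}

func (d *download) getPhase() string {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.phase
}

// restorePath is a hypothetical stand-in for fsRestore.
type restorePath struct {
	cancelCh    chan struct{}
	done        chan struct{}
	cancelOnce  sync.Once
	onCancelled func()
}

// newRestorePath starts the simulated transfer in the background.
func newRestorePath(onCancelled func()) *restorePath {
	p := &restorePath{
		cancelCh:    make(chan struct{}),
		done:        make(chan struct{}),
		onCancelled: onCancelled,
	}
	go p.run()
	return p
}

// run blocks until cancellation is requested, then reports the cancellation
// through the callback before signalling that it has finished.
func (p *restorePath) run() {
	defer close(p.done)
	<-p.cancelCh
	p.onCancelled()
}

// Cancel requests cancellation and returns immediately. It is safe to call
// more than once.
func (p *restorePath) Cancel() {
	p.cancelOnce.Do(func() { close(p.cancelCh) })
}

// Wait blocks until the data path has shut down.
func (p *restorePath) Wait() { <-p.done }

func main() {
	dd := &download{phase: "InProgress"}
	fsRestore := newRestorePath(func() { dd.setPhase("Canceled") })

	// The reconciler's only action in this case: it does not touch the phase.
	fsRestore.Cancel()

	// Waiting here is only to make the sketch deterministic.
	fsRestore.Wait()
	fmt.Println(dd.getPhase())
}
```

Because the phase is written only inside the callback, the CR cannot report a cancelled state while the transfer is still running.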
Done, updated the PR.
@@ -289,18 +289,17 @@ func (r *DataDownloadReconciler) Reconcile(ctx context.Context, req ctrl.Request
if dd.Spec.Cancel {
	fsRestore := r.dataPathMgr.GetAsyncBR(dd.Name)
	if fsRestore == nil {
		r.OnDataDownloadCancelled(ctx, dd.GetNamespace(), dd.GetName())
Let's add a log here.
Re-positioned the earlier log
original := du.DeepCopy()
du.Status.Phase = velerov2alpha1api.DataUploadPhaseCanceling
du.Status.Phase = velerov2alpha1api.DataUploadPhaseCanceled
The same as DD: roll back to the original code and wait for fsBackup to cancel itself.
Done
Signed-off-by: Shubham Pampattiwar <[email protected]>
add changelog file
Signed-off-by: Shubham Pampattiwar <[email protected]>
keep canceling phase const
Signed-off-by: Shubham Pampattiwar <[email protected]>
fix data download as well
Signed-off-by: Shubham Pampattiwar <[email protected]>
address PR feedback
Signed-off-by: Shubham Pampattiwar <[email protected]>
minor fixes
Signed-off-by: Shubham Pampattiwar <[email protected]>
Force-pushed from 8b91a91 to ee271b7
Thank you for contributing to Velero!
Please add a summary of your change
Adds changes proposed here: #6913 (comment)
Does your change fix a particular issue?
Fixes #6913
Please indicate you've done the following:
/kind changelog-not-required as a comment on this pull request.
site/content/docs/main.