-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BIAv2 async operations controller work #5849
BIAv2 async operations controller work #5849
Conversation
Currently marked as draft, awaiting two things:
EIther way, feel free to start reviewing the PR, as the code itself should be basically done except for any bugs I find during testing. |
Codecov Report
@@ Coverage Diff @@
## main #5849 +/- ##
==========================================
- Coverage 39.94% 39.73% -0.22%
==========================================
Files 253 255 +2
Lines 22259 23128 +869
==========================================
+ Hits 8892 9189 +297
- Misses 12716 13246 +530
- Partials 651 693 +42
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
0784837
to
72e3a06
Compare
e23d5a6
to
ae5bb59
Compare
ae5bb59
to
ecbafce
Compare
I've done a first pass, making the changes as suggested above:
What's not yet done is the controller work to process the finalize phase. That's next. |
upon initial investigation into the right way to handle the finalizing phase:
|
@sseago some comments for your this statement:
I don't have too much inclination as long as we can make a general backup finalizing phase and make the logic reasonable
I see the difficulty to update a tar in the backup storage if it already exists. The difficulty comes down to we've uploaded the same item twice, once in the backup controller when the async operations are launched, the other is when the backup comes to Merging with the general finalizing phase/handler concept, I suggest that we persist the overall backup data only once in the general finalizing handler. That is, we never persist the backup until the backup reaches to |
I'd rather not take this approach. This was actually what Dave had proposed in his original design. The problem is that it is less resilient to velero pod crash/reboot. This is already a limitation of the Velero controllers -- if velero restarts with an InProgress backup or restore, that backup/restore will fail and must be restarted. If we defer upload until all async actions are done and finalizing happens, then we extend this problem further. I know in some of our discussions around how to fix Velero's crash/restart problem in the future, one possibility discussed has been uploading backup content more frequently during the backup process so Velero can pick up with at least some done (that's a future feature, not here). But back to here -- with the current approach of uploading the backup and metadata upon completion of initial backup processing, if Velero crashes while the next backup is running and this backup is WaitingForPluginOperations, the only thing lost for this backup are the itemoperations updates that have not yet been written to object storage, so when velero comes back up, it can just query all plugins again for progress. I think the one final download before upload upon finalizing the backup is a small performance hit which gains resilience for us -- and it will only be required if there are plugin operations with updates during the async phase. |
ecbafce
to
77e445a
Compare
I've updated the PR to add the new finalizing phases and finalizing controller. The main change from previous PR discussion is rather than downloading and modifying the tarball during the finalize reconcile, I'm just creating a second tarball with only those resources indicated as needing update by the async backup item actions, so it really functions as an incremental backup. The restore workflow has been modified to download both tarballs (if present) and extract them in order, that way the updated resources overwrite the original copy prior to restore. Two additional notes:
|
77e445a
to
f36b407
Compare
Updated with the suggestions I'd made this morning. To make sure we don't get into a situation where we're updating resources in finalize that weren't in the original backup.tar.gz tarball, instead of returning an explicit list of items to update, it's been replaced with a boolean |
+1 @sseago the latest comment seems apt. @Lyndon-Li all the concerns should have been addressed with the latest changes in this PR, WDYT ? |
@sseago @shubham-pampattiwar Actually, the question is not about restricting updating items to the existing ones in the original backup.tar.gz at the finalizing phase, but about that we should not treat the SnapshotBackup CR (or similar objects created by plugin) as part of backup data. See my comments from here and downwards. |
Updated again with design doc and api doc updates. |
76a2eed
to
89ef7da
Compare
Oh, looks like I need to replace apimachinery/pkg/util/clock in this PR now. Already done for the rest of velero, but new files in my PR add this and that's causing failures. Will update. |
89ef7da
to
a4069fc
Compare
Clock replaced. |
a4069fc
to
86ce1c3
Compare
@@ -158,6 +163,12 @@ status: | |||
volumeSnapshotsAttempted: 2 | |||
# Number of volume snapshots that Velero successfully created for this backup. | |||
volumeSnapshotsCompleted: 1 | |||
# Number of attempted async BackupItemAction operations for this backup. | |||
asyncBackupIemOperationsAttempted: 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -34,6 +34,7 @@ message ExecuteResponse { | |||
bytes item = 1; | |||
repeated generated.ResourceIdentifier additionalItems = 2; | |||
string operationID = 3; | |||
repeated generated.ResourceIdentifier itemsToUpdate = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DoesitemsToUpdate
reflect the meaning of the return value as described in the comment of .go
file:
a second slice of ResourceIdentifiers specifying related items which should be backed up after all asynchronous operations have completed.
But I don't have a good idea about naming here atm...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original thought was itemsToUpdateAfterOperations
but that felt too unwieldy. Another option would be itemsToFinalize
or postOperationItems
I'm fine with any of these, though -- lets discuss next week for a possible follow-on change. If we're going to change it, it needs to be before 1.11 is out. @shubham-pampattiwar any thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sseago I would vote for postOperationItems
if we see them as a second list of items to backup.
There are some inconsistencies I mentioned in
#5849 (comment)
Maybe we should refine the flow here and treat the items in itemsToUpdate
or postOperationItems
differently, for example, we may define the flow like the backupper will only put these items into tarball as-is without any additional work, there is still inconsistency, but this inconsistency is easier to understand...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reasonerjt postOperationItems works for me. I'll update that in the follow-on PR. As for the inconsistencies between this and additional items, see comments below. I think we can converge things a bit more and minimize the inconsistency -- at this point, I think the only thing that must be different is that postOperationItems are not expected to create asynchronous operations, since that would potentially lead to never-ending backup.
"Name": "my-volume-vsc" | ||
}, | ||
"operationID": "<DataMoverBackup objectReference>", | ||
"itemsToUpdate": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixing now.
_WaitingForPluginOperationsPartiallyFailed_). In the event of any of the progress checks return an | ||
error, the phase will move to _WaitingForPluginOperationsPartiallyFailed_. The backup will then be | ||
requeued and will be rechecked again after some time has passed. | ||
_WaitingForPluginOperationsPartiallyFailed_), and the async backup operations controller will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After re-reading the PR I find the terminology a little redundant, that sometimes it's called itemOperation
sometimes it's called pluginOperation
sometimes it's asyncPluginOperation
, if there are no particular reasons for such difference would it make the communication easier if we unify the terms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it now, I think the actual words used in the API leave off async (i.e. command flags, status fields) -- those tend to be very long already, so it makes them more unwieldy. We tend to add "async" in the text descriptions, documentation, etc. to make it clearer though. I could remove "async" from the other part of the messaging, but I actually think they help to make it clearer the way it is now. I don't have a strong opinion here, so we can deal with any changes in a follow-on PR next week. I think for the controller naming, we can leave that as-is either way, since there's nothing user-visible there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's trasnparent to the user, I'm just thinking making the naming more consistent and simplified would help the communication between developers, and make the code easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reasonerjt if we need everything to either have "async" or not have it in the name, then I think we'd better remove it, since some of the names will be too long and unwieldy with them. I'm ok with removing Async where it appears, if that's ok with everyone. This would, for example, mean that asyncBackupOperationsReconciler
would become backupOperationsReconciler
.
206df46
to
087c43c
Compare
Signed-off-by: Scott Seago <[email protected]>
087c43c
to
c3d1d83
Compare
@reasonerjt I fixed the three easy suggestions and updated the PR. The two comments related to naming things can be handled in a follow-on PR if we decide a change is necessary. |
} | ||
// cancel operation if past timeout period | ||
if operation.Status.Created.Time.Add(backup.Spec.ItemOperationTimeout.Duration).Before(time.Now()) { | ||
_ = bia.Cancel(operation.Spec.OperationID, backup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the Cancel method is called, what is the expected return value when we call Progress
again?
We have the other place calling getBackupItemOperationProgress
in backup controller, so it is possible that Progress is called again after Cancel, is there any problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably depend on the plugin. But calling Progress after Cancel shouldn't crash anything. A plugin might put "Cancelled" in the progress description field, or it might just mark it as failed/errored.
@@ -51,6 +52,16 @@ type Request struct { | |||
PodVolumeBackups []*velerov1api.PodVolumeBackup | |||
BackedUpItems map[itemKey]struct{} | |||
CSISnapshots []snapshotv1api.VolumeSnapshot | |||
itemOperationsList *[]*itemoperation.BackupOperation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a strong reason to make it a pointer to a slice?
Why not make it
type Request struct {
ItemOperationList []*itemoperation.BackupOperation
}
So it's more consistent with other fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reasonerjt The reason it's a pointer is that the item backupper grabs this and appends to it. We need the pointer so we're appending to the list that belongs to the request and not the just a copy in backupItem
return r.getItems(nil) | ||
} | ||
|
||
// getAllItems gets all relevant items from all API groups. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong comment
@@ -164,7 +181,7 @@ func getOrderedResourcesForType(orderedResources map[string]string, resourceType | |||
} | |||
|
|||
// getResourceItems collects all relevant items for a given group-version-resource. | |||
func (r *itemCollector) getResourceItems(log logrus.FieldLogger, gv schema.GroupVersion, resource metav1.APIResource) ([]*kubernetesResource, error) { | |||
func (r *itemCollector) getResourceItems(log logrus.FieldLogger, gv schema.GroupVersion, resource metav1.APIResource, resourceIDsMap map[schema.GroupResource][]velero.ResourceIdentifier) ([]*kubernetesResource, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this func becomes a little confusing b/c when resourceIDsMap
is provided the include/exclude rules in the backupRequest will be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue here is that for finalize, we don't need to deal with include/exclude since we have a limited list of items that we know we need to include, so instead of include/exclude rules plus listing items from cluster, we pull items from the map. We still need the rest of this logic since it does all of the GVR and preferred version negotiation which is necessary for getting items in their proper path for the tarball. But I can add clearer comments to the func to make this distinction clear. There are other places in velero code where we skip include/exclude rules under certain contexts.
@@ -148,6 +148,26 @@ func (b *backupSyncReconciler) Reconcile(ctx context.Context, req ctrl.Request) | |||
continue | |||
} | |||
|
|||
if backup.Status.Phase == velerov1api.BackupPhaseWaitingForPluginOperations || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not merge the if statements and you can print the backup.Status.Phase
in the log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do this in the follow-on PR.
delete(u.GetAnnotations(), mustIncludeAdditionalItemAnnotation) | ||
} | ||
obj = u | ||
if finalize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that means when calling BIAv2 plugin in the Finalization
phase, the async operation may still be triggered b/c we can't control that, and we choose not to track the progress of these operations, IMO this is a little problematic.
This probably will not happen currently, but we should at least document it in the design clearly.
Additionally, it may not be right to ignore the additional Items returned by the BIA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely want to document for users that they should not be starting async actions if backup phase is finalize -- and I'll update my example plugin to check phase as an example. There's a real risk of an infinite loop scenario if we track async operations started by plugins during finalize, as the plugin would have to go back to waiting for plugin operations, then finalize again, where new async operations could start, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional items is less complicated. We can definitely process them here if we need to. In the original design, it was unnecessary since we were actually backing up these items on the first pass, then finalize was just backing them up again for updates, so the additional items were already included. But with the current implementation, these items are deferred until finalize, so they don't have the opportunity to add additional items anymore. We may want to add that back, although it will add some complications to processing. I'll consider this problem in the follow-on PR.
return false, itemFiles, err | ||
} | ||
|
||
if !finalize && groupResource == kuberesource.Pods { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there will be problem if we backup data in the Finalization
phase?
So the Finalization
phase was introduced to handle the itemsToUpdate
list, and make sure they are backed up, I was thinking we should handle these items as how we did in the regular backup process.
We should avoid the case that the itemsToUpdate
are handled differently in many ways in terms of backup.
If there's a very strong reason for that, let's make sure we document the difference in the flow and the reason behind such difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my general comment at the end. I think I'm closer to agreement with you here now, at least for pods/pvcs. I think I will probably be able to remove this finalize special case.
c.metrics.RegisterBackupItemsErrorsGauge(backupScheduleName, backup.Status.Errors) | ||
// FIXME: download/upload results once https://github.com/vmware-tanzu/velero/pull/5576 is merged | ||
} | ||
removeIfComplete := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure why this is needed, is there any use case we need the data in the map after the backup is in the terminal state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we don't remove this if there are errors hit in encoding/uploading, or the backup patching fails. The thinking was that if we had errors like that, they could be transient so we should try again on the next iteration. That being said, we could simplify this by always removing on completion. That would mean that we may not completely recover from transient errors (i.e. object store may not get updated properly if there was a transient object store update error). I don't have a strong opinion here, though.
backup.Status.Phase == velerov1api.BackupPhaseFinalizingAfterPluginOperations || | ||
backup.Status.Phase == velerov1api.BackupPhaseFinalizingAfterPluginOperationsPartiallyFailed) { | ||
|
||
c.deleteOperationsForBackup(backup.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an OO perspective the delete and put should be funcs for BackupItemOperationsMap
opsLock sync.Mutex | ||
} | ||
|
||
// If backup has changes not yet uploaded, upload them now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment should be like
// UpdateForBackup ...
And it should explain what this func is for.
BTW, the current comment // If backup has changes not yet uploaded, upload them now
seems should be at the place where this func is called...
logger logrus.FieldLogger | ||
clock clocks.WithTickerAndDelayedExecution | ||
frequency time.Duration | ||
itemOperationsMap *BackupItemOperationsMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the BackupItemOperationsMap
serves as a global map to store the operation list for each backup, and it's shared between asyncBackupOperationsReconciler
and downloadRequestRonconciler
Therefore, ideally the lifecycle of the map should be separated from this controller. It should be created in server.go
as a utility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reasonerjt It's more limited than that. First, it's only a map for backups currently waiting for item operations. So 99% of what's done here is only by the asyncBackupOperationsReconciler
. The downloadRequestReconciler
only uses this to force a push to object store when getting a download request for a backup's item operations if they're ongoing. However, with our prior discussions here, we talked about how we want to change this for 1.12 to not use a download request here and possibly use a new CR for this. So it is primarily managed by the async reconciler -- but we still could split it out as a utility and pass it in to both controllers. I'll look into this with the follow-on PR. If splitting it out makes it easier to follow the code, I'll consider doing that. Functionally, the two would be equivalent, so the main question will be ease of understanding/maintenance of the code, so if spliting it out helps, then we should probably do that.
I have an overall comment regarding how to handle the I was thinking the second list of items returned by BIAv2 ("itemsToUpdate") should be treated as much equally as possible as the first list of additional Items. |
@sseago as we've discussed, I'm adding the approval to unblock the work. But there are a few comments I think we'll need to address before v1.11 release, feel free to ping me on slack. |
The thinking is that we make the backup as normal as possible. Thinking through the cases, we may be able to eliminate part of this, but here's the challenge -- there are good reasons for running normal BIA plugins, and there are good reasons for skipping additonal items/async processing. Lets consider the things being done:
To summarize the above, we really need the backup item workflow to allow basic BIA plugins to run here to ensure data consistency. I'm thinking that we may be able to just eliminate the "finalize==true" special cases around Hook processing and PV backup. As for additional items, maybe that too. For async operation processing, I'm thinking we leave the special case here but revisit post-1.11 if we identify a use case that needs it -- doing this will add a good bit of extra logic at this late stage, while the other changes are really just simplifications. In any case, lets consider the above and I'll add a follow-on PR for pv/hook/additional item changes if we agree to do that, and lets hold off on the other. I'll get to the other review comments later today. |
I prefer we make the operation processing only special-case, but in theory, it's hard to enforce that when the BIAv2 plugin is called against an item in finalize phase it does not trigger any async Ops, any good idea. |
@reasonerjt So I think we're at the point now where we can probably do basically all the same things for items in finalize as normal backup (removing most of the inconsistencies in the current PR). The only thing we won't do in finalize is create async operations, as described above. This leaves the question of how to handle it if a plugin goes against documented recommendations and starts an operation. Current code ignores operation ID, but I'm thinking instead we should log an error if this happens. That's the closest we can come to enforcing that it doesn't trigger them. If we create errors for this, then that will raise the visibility vs. just ignoring the created operation. |
@reasonerjt @sseago I am on board with most of the agreed upon changes regarding pv/hook/additional items, these changes can be done in a follow-up PR. |
@reasonerjt @shubham-pampattiwar I will be working on a follow-on PR to address several points from this review over the next couple days. |
Signed-off-by: Scott Seago [email protected]
Thank you for contributing to Velero!
Please add a summary of your change
Does your change fix a particular issue?
Fixes #5490, #5108
Please indicate you've done the following:
/kind changelog-not-required
as a comment on this pull request.site/content/docs/main
.