From a4069fc7b95b8e24224cd74b1282e93d16466b93 Mon Sep 17 00:00:00 2001 From: Scott Seago Date: Thu, 23 Feb 2023 09:58:58 -0500 Subject: [PATCH] bugfixes for BIAv2 controller work Signed-off-by: Scott Seago --- design/biav2-design.md | 14 ++ design/general-progress-monitoring.md | 209 +++++++++++------- .../async_backup_operations_controller.go | 20 +- ...async_backup_operations_controller_test.go | 6 +- pkg/controller/backup_controller.go | 8 +- pkg/controller/backup_controller_test.go | 110 +++++---- pkg/controller/backup_finalizer_controller.go | 6 +- .../backup_finalizer_controller_test.go | 6 +- pkg/controller/backup_sync_controller_test.go | 4 +- .../backupitemaction/v2/backup_item_action.go | 5 +- site/content/docs/main/api-types/backup.md | 13 +- 11 files changed, 230 insertions(+), 171 deletions(-) diff --git a/design/biav2-design.md b/design/biav2-design.md index a638b6b14df..ae178479fde 100644 --- a/design/biav2-design.md +++ b/design/biav2-design.md @@ -34,6 +34,7 @@ message ExecuteResponse { bytes item = 1; repeated generated.ResourceIdentifier additionalItems = 2; string operationID = 3; + repeated generated.ResourceIdentifier itemsToUpdate = 4; } ``` The BackupItemAction service gets two new rpc methods: @@ -78,6 +79,19 @@ message OperationProgress { } ``` +In addition to the two new rpc methods added to the BackupItemAction interface, there is also a new `Name()` method. This one is only actually used internally by Velero to get the name that the plugin was registered with, but it still must be defined in a plugin which implements BackupItemActionV2 in order to implement the interface. It doesn't really matter what it returns, though, as this particular method is not delegated to the plugin via RPC calls. The new (and modified) interface methods for `BackupItemAction` are as follows: +``` +type BackupItemAction interface { +... + Name() string +... + Execute(item runtime.Unstructured, backup *api.Backup) (runtime.Unstructured, []velero.ResourceIdentifier, string, []velero.ResourceIdentifier, error) + Progress(operationID string, backup *api.Backup) (velero.OperationProgress, error) + Cancel(operationID string, backup *api.Backup) error +... +} +``` + A new PluginKind, `BackupItemActionV2`, will be created, and the backup process will be modified to use this plugin kind. See [Plugin Versioning](plugin-versioning.md) for more details on implementation plans, including v1 adapters, etc. diff --git a/design/general-progress-monitoring.md b/design/general-progress-monitoring.md index cb627d822c2..80e7caa3893 100644 --- a/design/general-progress-monitoring.md +++ b/design/general-progress-monitoring.md @@ -113,13 +113,21 @@ long as they don't use not-yet-completed backups) to be made without interferenc all data has been moved before starting the next backup will slow the progress of the system without adding any actual benefit to the user. -A new backup/restore phase, "WaitingForPluginOperations" will be introduced. When a backup or -restore has entered this phase, Velero is free to start another backup/restore. The backup/restore +New backup/restore phases, "WaitingForPluginOperations" and +"WaitingForPluginOperationsPartiallyFailed" will be introduced. When a backup or restore has +entered one of these phases, Velero is free to start another backup/restore. The backup/restore will remain in the "WaitingForPluginOperations" phase until all BIA/RIA operations have completed (for example, for a volume snapshotter, until all data has been successfully moved to persistent -storage). The backup/restore will not fail once it reaches this phase. If the backup is deleted -(cancelled), the plug-ins will attempt to delete the snapshots and stop the data movement - this may -not be possible with all storage systems. +storage). The backup/restore will not fail once it reaches this phase, although an error return +from a plugin could cause a backup or restore to move to "PartiallyFailed". If the backup is +deleted (cancelled), the plug-ins will attempt to delete the snapshots and stop the data movement - +this may not be possible with all storage systems. + +In addition, for backups (but not restores), there will also be two additional phases, +"FinalizingAfterPluginOperations" and "FinalizingAfterPluginOperationsPartiallyFailed", which will +handle any steps required after plugin operations have all completed. Initially, this will just +include adding any required resources to the backup that might have changed during asynchronous +operation execution, although eventually other cleanup actions could be added to this phase. ### State progression @@ -143,7 +151,14 @@ In the current implementation, Restic backups will move data during the "InProgr future, it may be possible to combine a snapshot with a Restic (or equivalent) backup which would allow for data movement to be handled in the "WaitingForPluginOperations" phase, -The next phase is either "Completed", "WaitingForPluginOperations", "Failed" or "PartiallyFailed". +The next phase would be "WaitingForPluginOperations" for backups or restores which have unfinished +asynchronous plugin operations and no errors so far, "WaitingForPluginOperationsPartiallyFailed" for +backups or restores which have unfinished asynchronous plugin operations at least one error, +"Completed" for restores with no unfinished asynchronous plugin operations and no errors, +"PartiallyFailed" for restores with no unfinished asynchronous plugin operations and at least one +error, "FinalizingAfterPluginOperations" for backups with no unfinished asynchronous plugin +operations and no errors, "FinalizingAfterPluginOperationsPartiallyFailed" for backups with no +unfinished asynchronous plugin operations and at least one error, or "PartiallyFailed". Backups/restores which would have a final phase of "Completed" or "PartiallyFailed" may move to the "WaitingForPluginOperations" or "WaitingForPluginOperationsPartiallyFailed" state. A backup/restore which will be marked "Failed" will go directly to the "Failed" phase. Uploads may continue in the @@ -157,8 +172,9 @@ any uploads still in progress should be aborted. The "WaitingForPluginOperations" phase signifies that the main part of the backup/restore, including snapshotting has completed successfully and uploading and any other asynchronous BIA/RIA plugin operations are continuing. In the event of an error during this phase, the phase will change to -WaitingForPluginOperationsPartiallyFailed. On success, the phase changes to Completed. Backups -cannot be restored from when they are in the WaitingForPluginOperations state. +WaitingForPluginOperationsPartiallyFailed. On success, the phase changes to +"FinalizingAfterPluginOperations" for backups and "Completed" for restores. Backups cannot be +restored from when they are in the WaitingForPluginOperations state. ### WaitingForPluginOperationsPartiallyFailed (new) The "WaitingForPluginOperationsPartiallyFailed" phase signifies that the main part of the @@ -166,6 +182,22 @@ backup/restore, including snapshotting has completed, but there were partial fai the main part or during any async operations, including snapshot uploads. Backups cannot be restored from when they are in the WaitingForPluginOperationsPartiallyFailed state. +### FinalizingAfterPluginOperations (new) +The "FinalizingAfterPluginOperations" phase signifies that asynchronous backup operations have all +completed successfully and Velero is currently backing up any resources indicated by asynchronous +plugins as items to back up after operations complete. Once this is done, the phase changes to +Completed. Backups cannot be restored from when they are in the FinalizingAfterPluginOperations +state. + +### FinalizingAfterPluginOperationsPartiallyFailed (new) + +The "FinalizingAfterPluginOperationsPartiallyFailed" phase signifies that, for a backup which had +errors during initial processing or asynchronous plugin operation, asynchronous backup operations +have all completed and Velero is currently backing up any resources indicated by asynchronous +plugins as items to back up after operations complete. Once this is done, the phase changes to +PartiallyFailed. Backups cannot be restored from when they are in the +FinalizingAfterPluginOperationsPartiallyFailed state. + ### Failed When a backup/restore has had fatal errors it is marked as "Failed" Backups in this state cannot be restored from. @@ -211,12 +243,21 @@ WaitingForPluginOperationsPartiallyFailed phase, another backup/restore may be s While in the WaitingForPluginOperations or WaitingForPluginOperationsPartiallyFailed phase, the snapshots and item actions will be periodically polled. When all of the snapshots and item actions -have reported success, the backup/restore will move to the Completed or PartiallyFailed phase, -depending on whether the backup/restore was in the WaitingForPluginOperations or -WaitingForPluginOperationsPartiallyFailed phase. - -The Backup resources will not be written to object storage until the backup has entered a final phase: -Completed, Failed or PartiallyFailed +have reported success, restores will move directly to the Completed or PartiallyFailed phase, and +backups will move to the FinalizingAfterPluginOperations or +FinalizingAfterPluginOperationsPartiallyFailed phase, depending on whether the backup/restore was in +the WaitingForPluginOperations or WaitingForPluginOperationsPartiallyFailed phase. + +While in the FinalizingAfterPluginOperations or FinalizingAfterPluginOperationsPartiallyFailed +phase, Velero will update the backup with any resources indicated by plugins that they must be added +to the backup after operations are completed, and then the backup will move to the Completed or +PartiallyFailed phase, depending on whether there are any backup errors. + +The Backup resources will be written to object storage at the time the backup leaves the InProgress +phase, but it will not be synced to other clusters (or usable for restores in the current cluster) +until the backup has entered a final phase: Completed, Failed or PartiallyFailed. During the +Finalizing phases, a the backup resources will be updated with any required resources related to +asynchronous plugins. ## Reconciliation of InProgress backups @@ -249,8 +290,6 @@ Two new methods will be added to the VolumeSnapshotter interface: Progress(snapshotID string) (OperationProgress, error) Cancel(snapshotID string) (error) -Open question: Does VolumeSnapshotter need Cancel, or is that only needed for BIA/RIA? - Progress will report the current status of a snapshot upload. This should be callable at any time after the snapshot has been taken. In the event a plug-in is restarted, if the operationID (snapshot ID) continues to be valid it should be possible to retrieve the progress. @@ -281,35 +320,38 @@ progress, Progress will return an InvalidOperationIDError error rather than a po OperationProgress struct. If the item action does not start an asynchronous operation, then operationID will be empty. -Two new methods will be added to the BackupItemAction interface, and the Execute() return signature +Three new methods will be added to the BackupItemAction interface, and the Execute() return signature will be modified: - // Execute allows the ItemAction to perform arbitrary logic with the item being backed up, - // including mutating the item itself prior to backup. The item (unmodified or modified) - // should be returned, an optional operationID, along with an optional slice of ResourceIdentifiers - // specifying additional related items that should be backed up. If operationID is specified - // then velero will wait for this operation to complete before the backup is marked Completed. - Execute(item runtime.Unstructured, backup *api.Backup) (runtime.Unstructured, operationID string, - []ResourceIdentifier, error) + // Name returns the name of this BIA. Plugins which implement this interface must defined Name, + // but its content is unimportant, as it won't actually be called via RPC. Velero's plugin infrastructure + // will implement this directly rather than delegating to the RPC plugin in order to return the name + // that the plugin was registered under. The plugins must implement the method to complete the interface. + Name() string + // Execute allows the BackupItemAction to perform arbitrary logic with the item being backed up, + // including mutating the item itself prior to backup. The item (unmodified or modified) + // should be returned, along with an optional slice of ResourceIdentifiers specifying + // additional related items that should be backed up now, an optional operationID for actions which + // initiate asynchronous actions, and a second slice of ResourceIdentifiers specifying related items + // which should be backed up after all asynchronous operations have completed. This last field will be + // ignored if operationID is empty, and should not be filled in unless the resource must be updated in the + // backup after async operations complete (i.e. some of the item's kubernetes metadata will be updated + // during the asynch operation which will be required during restore) + Execute(item runtime.Unstructured, backup *api.Backup) (runtime.Unstructured, []velero.ResourceIdentifier, string, []velero.ResourceIdentifier, error) // Progress - Progress(input *BackupItemActionProgressInput) (OperationProgress, error) + Progress(operationID string, backup *api.Backup) (velero.OperationProgress, error) // Cancel - Cancel(input *BackupItemActionProgressInput) error - - // BackupItemActionProgressInput contains the input parameters for the BackupItemAction's Progress function. - type BackupItemActionProgressInput struct { - // Item is the item that was stored in the backup - Item runtime.Unstructured - // OperationID is the operation ID returned by BackupItemAction Execute - operationID string - // Backup is the representation of the backup resource processed by Velero. - Backup *velerov1api.Backup - } + Cancel(operationID string, backup *api.Backup) error -Two new methods will be added to the RestoreItemAction interface, and the +Three new methods will be added to the RestoreItemAction interface, and the RestoreItemActionExecuteOutput struct will be modified: + // Name returns the name of this RIA. Plugins which implement this interface must defined Name, + // but its content is unimportant, as it won't actually be called via RPC. Velero's plugin infrastructure + // will implement this directly rather than delegating to the RPC plugin in order to return the name + // that the plugin was registered under. The plugins must implement the method to complete the interface. + Name() string // Execute allows the ItemAction to perform arbitrary logic with the item being restored, // including mutating the item itself prior to restore. The item (unmodified or modified) // should be returned, an optional OperationID, along with an optional slice of ResourceIdentifiers @@ -321,10 +363,10 @@ RestoreItemActionExecuteOutput struct will be modified: // Progress - Progress(input *RestoreItemActionProgressInput) (OperationProgress, error) + Progress(operationID string, restore *api.Restore) (velero.OperationProgress, error) // Cancel - Cancel(input *RestoreItemActionProgressInput) error + Cancel(operationID string, restore *api.Restore) error // RestoreItemActionExecuteOutput contains the output variables for the ItemAction's Execution function. type RestoreItemActionExecuteOutput struct { @@ -345,16 +387,6 @@ RestoreItemActionExecuteOutput struct will be modified: OperationID string } - // RestoreItemActionProgressInput contains the input parameters for the RestoreItemAction's Progress function. - type RestoreItemActionProgressInput struct { - // Item is the item that was stored in the restore - Item runtime.Unstructured - // OperationID is the operation ID returned by RestoreItemAction Execute - operationID string - // Restore is the representation of the restore resource processed by Velero. - Restore *velerov1api.Restore - } - ## Changes in Velero backup format No changes to the existing format are introduced by this change. As part of the backup workflow changes, a @@ -370,19 +402,35 @@ to select the appropriate Backup/RestoreItemAction plugin to query for progress. of what a record for a datamover plugin might look like: ``` { - "itemOperation": { - "plugin": "velero.io/datamover-backup", - "itemID": "", - "operationID": "", - "completed": true, - "err": "", - "NCompleted": 12345, - "NTotal": 12345, - "OperationUnits": "byte", - "Description": "", - "Started": "2022-12-14T12:01:00Z", - "Updated": "2022-12-14T12:11:02Z" - } + "spec": { + "backupName": "backup-1", + "backupUID": "f8c72709-0f73-46e1-a071-116bc4a76b07", + "backupItemAction": "velero.io/volumesnapshotcontent-backup", + "resourceIdentifier": { + "Group": "snapshot.storage.k8s.io", + "Resource": "VolumeSnapshotContent", + "Namespace": "my-app", + "Name": "my-volume-vsc" + }, + "operationID": "", + "itemsToUpdate": { + "Group": "velero.io", + "Resource": "VolumeSnapshotBackup", + "Namespace": "my-app", + "Name": "vsb-1" + }, + }, + "status": { + "operationPhase": "Completed", + "error": "", + "nCompleted": 12345, + "nTotal": 12345, + "operationUnits": "byte", + "description": "", + "Created": "2022-12-14T12:00:00Z", + "Started": "2022-12-14T12:01:00Z", + "Updated": "2022-12-14T12:11:02Z" + }, } ``` @@ -425,36 +473,41 @@ progress. ## Backup workflow changes The backup workflow remains the same until we get to the point where the `velero-backup.json` object -is written. At this point, we will queue the backup to a finalization go-routine. The next backup -may then begin. The finalization routine will run across all of the -VolumeSnapshotter/BackupItemAction operations and call the _Progress_ method on each of them. +is written. At this point, Velero will +run across all of the VolumeSnapshotter/BackupItemAction operations and call the _Progress_ method +on each of them. -If all snapshot and backup item operations have finished (either successfully or failed), the backup -will be completed and the backup will move to the appropriate terminal phase and upload the -`velero-backup.json` object to the object store and the backup will be complete. +If all backup item operations have finished (either successfully or failed), the backup will move to +one of the finalize phases. If any of the snapshots or backup items are still being processed, the phase of the backup will be set to the appropriate phase (_WaitingForPluginOperations_ or -_WaitingForPluginOperationsPartiallyFailed_). In the event of any of the progress checks return an -error, the phase will move to _WaitingForPluginOperationsPartiallyFailed_. The backup will then be -requeued and will be rechecked again after some time has passed. +_WaitingForPluginOperationsPartiallyFailed_), and the async backup operations controller will +reconcile periodically and call Progress on any unfinished operations. In the event of any of the +progress checks return an error, the phase will move to _WaitingForPluginOperationsPartiallyFailed_. + +Once all operations have completed, the backup will be moved to one of the finalize phases, and the +backup finalizer controller will update the the `velero-backup.json`in the object store with any +resources necessary after asynchronous operations are complete and the backup will move to the +appropriate terminal phase. + ## Restore workflow changes The restore workflow remains the same until velero would currently move the backup into one of the -terminal states. At this point, we will queue the restore to a finalization go-routine. The next -restore may then begin. The finalization routine will run across all of the RestoreItemAction -operations and call the _Progress_ method on each of them. +terminal states. At this point, Velero will run across all of the RestoreItemAction operations and +call the _Progress_ method on each of them. If all restore item operations have finished (either successfully or failed), the restore will be completed and the restore will move to the appropriate terminal phase and the restore will be complete. If any of the restore items are still being processed, the phase of the restore will be set to the -appropriate phase (_WaitingForPluginOperations_ or _WaitingForPluginOperationsPartiallyFailed_). In -the event of any of the progress checks return an error, the phase will move to -_WaitingForPluginOperationsPartiallyFailed_. The restore will then be requeued and will be rechecked -again after some time has passed. +appropriate phase (_WaitingForPluginOperations_ or _WaitingForPluginOperationsPartiallyFailed_), and +the async restore operations controller will reconcile periodically and call Progress on any +unfinished operations. In the event of any of the progress checks return an error, the phase will +move to _WaitingForPluginOperationsPartiallyFailed_. Once all of the operations have completed, the +restore will be moved to the appropriate terminal phase. ## Restart workflow diff --git a/pkg/controller/async_backup_operations_controller.go b/pkg/controller/async_backup_operations_controller.go index 4497f42722f..04d24f8f9ea 100644 --- a/pkg/controller/async_backup_operations_controller.go +++ b/pkg/controller/async_backup_operations_controller.go @@ -26,12 +26,10 @@ import ( "github.com/sirupsen/logrus" apierrors "k8s.io/apimachinery/pkg/api/errors" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/util/clock" + clocks "k8s.io/utils/clock" ctrl "sigs.k8s.io/controller-runtime" "sigs.k8s.io/controller-runtime/pkg/builder" "sigs.k8s.io/controller-runtime/pkg/client" - "sigs.k8s.io/controller-runtime/pkg/event" - "sigs.k8s.io/controller-runtime/pkg/predicate" velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1" "github.com/vmware-tanzu/velero/pkg/itemoperation" @@ -133,7 +131,7 @@ func (m *BackupItemOperationsMap) UpdateForBackup(backupStore persistence.Backup type asyncBackupOperationsReconciler struct { client.Client logger logrus.FieldLogger - clock clock.Clock + clock clocks.WithTickerAndDelayedExecution frequency time.Duration itemOperationsMap *BackupItemOperationsMap newPluginManager func(logger logrus.FieldLogger) clientmgmt.Manager @@ -152,7 +150,7 @@ func NewAsyncBackupOperationsReconciler( abor := &asyncBackupOperationsReconciler{ Client: client, logger: logger, - clock: clock.RealClock{}, + clock: clocks.RealClock{}, frequency: frequency, itemOperationsMap: &BackupItemOperationsMap{operations: make(map[string]*operationsForBackup)}, newPluginManager: newPluginManager, @@ -168,17 +166,7 @@ func NewAsyncBackupOperationsReconciler( func (c *asyncBackupOperationsReconciler) SetupWithManager(mgr ctrl.Manager) error { s := kube.NewPeriodicalEnqueueSource(c.logger, mgr.GetClient(), &velerov1api.BackupList{}, c.frequency, kube.PeriodicalEnqueueSourceOption{}) return ctrl.NewControllerManagedBy(mgr). - For(&velerov1api.Backup{}, builder.WithPredicates(predicate.Funcs{ - UpdateFunc: func(ue event.UpdateEvent) bool { - return false - }, - DeleteFunc: func(de event.DeleteEvent) bool { - return false - }, - GenericFunc: func(ge event.GenericEvent) bool { - return false - }, - })). + For(&velerov1api.Backup{}, builder.WithPredicates(kube.FalsePredicate{})). Watches(s, nil). Complete(c) } diff --git a/pkg/controller/async_backup_operations_controller_test.go b/pkg/controller/async_backup_operations_controller_test.go index b388ccf0b72..935f92ac47c 100644 --- a/pkg/controller/async_backup_operations_controller_test.go +++ b/pkg/controller/async_backup_operations_controller_test.go @@ -28,7 +28,7 @@ import ( metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/runtime" "k8s.io/apimachinery/pkg/types" - "k8s.io/apimachinery/pkg/util/clock" + testclocks "k8s.io/utils/clock/testing" ctrl "sigs.k8s.io/controller-runtime" kbclient "sigs.k8s.io/controller-runtime/pkg/client" @@ -51,7 +51,7 @@ var ( bia = &biav2mocks.BackupItemAction{} ) -func mockAsyncBackupOperationsReconciler(fakeClient kbclient.Client, fakeClock *clock.FakeClock, freq time.Duration) (*asyncBackupOperationsReconciler, *BackupItemOperationsMap) { +func mockAsyncBackupOperationsReconciler(fakeClient kbclient.Client, fakeClock *testclocks.FakeClock, freq time.Duration) (*asyncBackupOperationsReconciler, *BackupItemOperationsMap) { abor, biaMap := NewAsyncBackupOperationsReconciler( logrus.StandardLogger(), fakeClient, @@ -65,7 +65,7 @@ func mockAsyncBackupOperationsReconciler(fakeClient kbclient.Client, fakeClock * } func TestAsyncBackupOperationsReconcile(t *testing.T) { - fakeClock := clock.NewFakeClock(time.Now()) + fakeClock := testclocks.NewFakeClock(time.Now()) metav1Now := metav1.NewTime(fakeClock.Now()) defaultBackupLocation := builder.ForBackupStorageLocation(velerov1api.DefaultNamespace, "default").Result() diff --git a/pkg/controller/backup_controller.go b/pkg/controller/backup_controller.go index c16c102d0cb..cc0e0345217 100644 --- a/pkg/controller/backup_controller.go +++ b/pkg/controller/backup_controller.go @@ -766,18 +766,14 @@ func (c *backupController) runBackup(backup *pkgbackup.Request) error { case logCounter.GetCount(logrus.ErrorLevel) > 0: if inProgressOperations { backup.Status.Phase = velerov1api.BackupPhaseWaitingForPluginOperationsPartiallyFailed - } else if backup.Status.AsyncBackupItemOperationsAttempted > 0 { - backup.Status.Phase = velerov1api.BackupPhaseFinalizingAfterPluginOperationsPartiallyFailed } else { - backup.Status.Phase = velerov1api.BackupPhasePartiallyFailed + backup.Status.Phase = velerov1api.BackupPhaseFinalizingAfterPluginOperationsPartiallyFailed } default: if inProgressOperations { backup.Status.Phase = velerov1api.BackupPhaseWaitingForPluginOperations - } else if backup.Status.AsyncBackupItemOperationsAttempted > 0 { - backup.Status.Phase = velerov1api.BackupPhaseFinalizingAfterPluginOperations } else { - backup.Status.Phase = velerov1api.BackupPhaseCompleted + backup.Status.Phase = velerov1api.BackupPhaseFinalizingAfterPluginOperations } } // Mark completion timestamp before serializing and uploading. diff --git a/pkg/controller/backup_controller_test.go b/pkg/controller/backup_controller_test.go index fb66aab2683..9c90da37e8d 100644 --- a/pkg/controller/backup_controller_test.go +++ b/pkg/controller/backup_controller_test.go @@ -605,7 +605,7 @@ func TestProcessBackupCompletions(t *testing.T) { backupExists bool existenceCheckError error }{ - // Completed + // FinalizingAfterPluginOperations { name: "backup with no backup location gets the default", backup: defaultBackup().Result(), @@ -633,12 +633,11 @@ func TestProcessBackupCompletions(t *testing.T) { DefaultVolumesToFsBackup: boolptr.True(), }, Status: velerov1api.BackupStatus{ - Phase: velerov1api.BackupPhaseCompleted, - Version: 1, - FormatVersion: "1.1.0", - StartTimestamp: ×tamp, - CompletionTimestamp: ×tamp, - Expiration: ×tamp, + Phase: velerov1api.BackupPhaseFinalizingAfterPluginOperations, + Version: 1, + FormatVersion: "1.1.0", + StartTimestamp: ×tamp, + Expiration: ×tamp, }, }, }, @@ -669,12 +668,11 @@ func TestProcessBackupCompletions(t *testing.T) { DefaultVolumesToFsBackup: boolptr.False(), }, Status: velerov1api.BackupStatus{ - Phase: velerov1api.BackupPhaseCompleted, - Version: 1, - FormatVersion: "1.1.0", - StartTimestamp: ×tamp, - CompletionTimestamp: ×tamp, - Expiration: ×tamp, + Phase: velerov1api.BackupPhaseFinalizingAfterPluginOperations, + Version: 1, + FormatVersion: "1.1.0", + StartTimestamp: ×tamp, + Expiration: ×tamp, }, }, }, @@ -708,12 +706,11 @@ func TestProcessBackupCompletions(t *testing.T) { DefaultVolumesToFsBackup: boolptr.True(), }, Status: velerov1api.BackupStatus{ - Phase: velerov1api.BackupPhaseCompleted, - Version: 1, - FormatVersion: "1.1.0", - StartTimestamp: ×tamp, - CompletionTimestamp: ×tamp, - Expiration: ×tamp, + Phase: velerov1api.BackupPhaseFinalizingAfterPluginOperations, + Version: 1, + FormatVersion: "1.1.0", + StartTimestamp: ×tamp, + Expiration: ×tamp, }, }, }, @@ -745,12 +742,11 @@ func TestProcessBackupCompletions(t *testing.T) { DefaultVolumesToFsBackup: boolptr.False(), }, Status: velerov1api.BackupStatus{ - Phase: velerov1api.BackupPhaseCompleted, - Version: 1, - FormatVersion: "1.1.0", - Expiration: &metav1.Time{now.Add(10 * time.Minute)}, - StartTimestamp: ×tamp, - CompletionTimestamp: ×tamp, + Phase: velerov1api.BackupPhaseFinalizingAfterPluginOperations, + Version: 1, + FormatVersion: "1.1.0", + Expiration: &metav1.Time{now.Add(10 * time.Minute)}, + StartTimestamp: ×tamp, }, }, }, @@ -782,12 +778,11 @@ func TestProcessBackupCompletions(t *testing.T) { DefaultVolumesToFsBackup: boolptr.True(), }, Status: velerov1api.BackupStatus{ - Phase: velerov1api.BackupPhaseCompleted, - Version: 1, - FormatVersion: "1.1.0", - StartTimestamp: ×tamp, - CompletionTimestamp: ×tamp, - Expiration: ×tamp, + Phase: velerov1api.BackupPhaseFinalizingAfterPluginOperations, + Version: 1, + FormatVersion: "1.1.0", + StartTimestamp: ×tamp, + Expiration: ×tamp, }, }, }, @@ -820,12 +815,11 @@ func TestProcessBackupCompletions(t *testing.T) { DefaultVolumesToFsBackup: boolptr.False(), }, Status: velerov1api.BackupStatus{ - Phase: velerov1api.BackupPhaseCompleted, - Version: 1, - FormatVersion: "1.1.0", - StartTimestamp: ×tamp, - CompletionTimestamp: ×tamp, - Expiration: ×tamp, + Phase: velerov1api.BackupPhaseFinalizingAfterPluginOperations, + Version: 1, + FormatVersion: "1.1.0", + StartTimestamp: ×tamp, + Expiration: ×tamp, }, }, }, @@ -858,12 +852,11 @@ func TestProcessBackupCompletions(t *testing.T) { DefaultVolumesToFsBackup: boolptr.True(), }, Status: velerov1api.BackupStatus{ - Phase: velerov1api.BackupPhaseCompleted, - Version: 1, - FormatVersion: "1.1.0", - StartTimestamp: ×tamp, - CompletionTimestamp: ×tamp, - Expiration: ×tamp, + Phase: velerov1api.BackupPhaseFinalizingAfterPluginOperations, + Version: 1, + FormatVersion: "1.1.0", + StartTimestamp: ×tamp, + Expiration: ×tamp, }, }, }, @@ -896,12 +889,11 @@ func TestProcessBackupCompletions(t *testing.T) { DefaultVolumesToFsBackup: boolptr.True(), }, Status: velerov1api.BackupStatus{ - Phase: velerov1api.BackupPhaseCompleted, - Version: 1, - FormatVersion: "1.1.0", - StartTimestamp: ×tamp, - CompletionTimestamp: ×tamp, - Expiration: ×tamp, + Phase: velerov1api.BackupPhaseFinalizingAfterPluginOperations, + Version: 1, + FormatVersion: "1.1.0", + StartTimestamp: ×tamp, + Expiration: ×tamp, }, }, }, @@ -934,12 +926,11 @@ func TestProcessBackupCompletions(t *testing.T) { DefaultVolumesToFsBackup: boolptr.False(), }, Status: velerov1api.BackupStatus{ - Phase: velerov1api.BackupPhaseCompleted, - Version: 1, - FormatVersion: "1.1.0", - StartTimestamp: ×tamp, - CompletionTimestamp: ×tamp, - Expiration: ×tamp, + Phase: velerov1api.BackupPhaseFinalizingAfterPluginOperations, + Version: 1, + FormatVersion: "1.1.0", + StartTimestamp: ×tamp, + Expiration: ×tamp, }, }, }, @@ -1087,13 +1078,16 @@ func TestProcessBackupCompletions(t *testing.T) { // Ensure we have a CompletionTimestamp when uploading and that the backup name matches the backup in the object store. // Failures will display the bytes in buf. - hasNameAndCompletionTimestamp := func(info persistence.BackupInfo) bool { + hasNameAndCompletionTimestampIfCompleted := func(info persistence.BackupInfo) bool { buf := new(bytes.Buffer) buf.ReadFrom(info.Metadata) return info.Name == test.backup.Name && - strings.Contains(buf.String(), `"completionTimestamp": "2006-01-02T22:04:05Z"`) + (!(strings.Contains(buf.String(), `"phase": "Completed"`) || + strings.Contains(buf.String(), `"phase": "Failed"`) || + strings.Contains(buf.String(), `"phase": "PartiallyFailed"`)) || + strings.Contains(buf.String(), `"completionTimestamp": "2006-01-02T22:04:05Z"`)) } - backupStore.On("PutBackup", mock.MatchedBy(hasNameAndCompletionTimestamp)).Return(nil) + backupStore.On("PutBackup", mock.MatchedBy(hasNameAndCompletionTimestampIfCompleted)).Return(nil) // add the test's backup to the informer/lister store require.NotNil(t, test.backup) diff --git a/pkg/controller/backup_finalizer_controller.go b/pkg/controller/backup_finalizer_controller.go index 253a453c01f..c8db402babb 100644 --- a/pkg/controller/backup_finalizer_controller.go +++ b/pkg/controller/backup_finalizer_controller.go @@ -26,7 +26,7 @@ import ( "github.com/sirupsen/logrus" apierrors "k8s.io/apimachinery/pkg/api/errors" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" - "k8s.io/apimachinery/pkg/util/clock" + clocks "k8s.io/utils/clock" ctrl "sigs.k8s.io/controller-runtime" kbclient "sigs.k8s.io/controller-runtime/pkg/client" @@ -42,7 +42,7 @@ import ( // backupFinalizerReconciler reconciles a Backup object type backupFinalizerReconciler struct { client kbclient.Client - clock clock.Clock + clock clocks.WithTickerAndDelayedExecution backupper pkgbackup.Backupper newPluginManager func(logrus.FieldLogger) clientmgmt.Manager metrics *metrics.ServerMetrics @@ -53,7 +53,7 @@ type backupFinalizerReconciler struct { // NewBackupFinalizerReconciler initializes and returns backupFinalizerReconciler struct. func NewBackupFinalizerReconciler( client kbclient.Client, - clock clock.Clock, + clock clocks.WithTickerAndDelayedExecution, backupper pkgbackup.Backupper, newPluginManager func(logrus.FieldLogger) clientmgmt.Manager, backupStoreGetter persistence.ObjectBackupStoreGetter, diff --git a/pkg/controller/backup_finalizer_controller_test.go b/pkg/controller/backup_finalizer_controller_test.go index f64c5eca66c..828412a7f9c 100644 --- a/pkg/controller/backup_finalizer_controller_test.go +++ b/pkg/controller/backup_finalizer_controller_test.go @@ -30,7 +30,7 @@ import ( metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/runtime" "k8s.io/apimachinery/pkg/types" - "k8s.io/apimachinery/pkg/util/clock" + testclocks "k8s.io/utils/clock/testing" ctrl "sigs.k8s.io/controller-runtime" kbclient "sigs.k8s.io/controller-runtime/pkg/client" @@ -45,7 +45,7 @@ import ( velerotest "github.com/vmware-tanzu/velero/pkg/test" ) -func mockBackupFinalizerReconciler(fakeClient kbclient.Client, fakeClock *clock.FakeClock) (*backupFinalizerReconciler, *fakeBackupper) { +func mockBackupFinalizerReconciler(fakeClient kbclient.Client, fakeClock *testclocks.FakeClock) (*backupFinalizerReconciler, *fakeBackupper) { backupper := new(fakeBackupper) return NewBackupFinalizerReconciler( fakeClient, @@ -58,7 +58,7 @@ func mockBackupFinalizerReconciler(fakeClient kbclient.Client, fakeClock *clock. ), backupper } func TestBackupFinalizerReconcile(t *testing.T) { - fakeClock := clock.NewFakeClock(time.Now()) + fakeClock := testclocks.NewFakeClock(time.Now()) metav1Now := metav1.NewTime(fakeClock.Now()) defaultBackupLocation := builder.ForBackupStorageLocation(velerov1api.DefaultNamespace, "default").Result() diff --git a/pkg/controller/backup_sync_controller_test.go b/pkg/controller/backup_sync_controller_test.go index 2d3ac01f1e0..44d40b729c4 100644 --- a/pkg/controller/backup_sync_controller_test.go +++ b/pkg/controller/backup_sync_controller_test.go @@ -30,10 +30,10 @@ import ( metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/runtime" "k8s.io/apimachinery/pkg/types" - "k8s.io/apimachinery/pkg/util/clock" "k8s.io/apimachinery/pkg/util/sets" "k8s.io/apimachinery/pkg/util/validation" core "k8s.io/client-go/testing" + testclocks "k8s.io/utils/clock/testing" ctrl "sigs.k8s.io/controller-runtime" ctrlClient "sigs.k8s.io/controller-runtime/pkg/client" @@ -157,7 +157,7 @@ func numBackups(c ctrlClient.WithWatch, ns string) (int, error) { var _ = Describe("Backup Sync Reconciler", func() { It("Test Backup Sync Reconciler basic function", func() { - fakeClock := clock.NewFakeClock(time.Now()) + fakeClock := testclocks.NewFakeClock(time.Now()) type cloudBackupData struct { backup *velerov1api.Backup podVolumeBackups []*velerov1api.PodVolumeBackup diff --git a/pkg/plugin/velero/backupitemaction/v2/backup_item_action.go b/pkg/plugin/velero/backupitemaction/v2/backup_item_action.go index 30ac41ba5cb..920f07965ed 100644 --- a/pkg/plugin/velero/backupitemaction/v2/backup_item_action.go +++ b/pkg/plugin/velero/backupitemaction/v2/backup_item_action.go @@ -29,7 +29,10 @@ import ( // BackupItemAction is an actor that performs an operation on an individual item being backed up. type BackupItemAction interface { - // Name returns the name of this BIA + // Name returns the name of this BIA. Plugins which implement this interface must defined Name, + // but its content is unimportant, as it won't actually be called via RPC. Velero's plugin infrastructure + // will implement this directly rather than delegating to the RPC plugin in order to return the name + // that the plugin was registered under. The plugins must implement the method to complete the interface. Name() string // AppliesTo returns information about which resources this action should be invoked for. diff --git a/site/content/docs/main/api-types/backup.md b/site/content/docs/main/api-types/backup.md index 8c703404dba..97dc07f299a 100644 --- a/site/content/docs/main/api-types/backup.md +++ b/site/content/docs/main/api-types/backup.md @@ -33,6 +33,10 @@ spec: # CSI VolumeSnapshot status turns to ReadyToUse during creation, before # returning error as timeout. The default value is 10 minute. csiSnapshotTimeout: 10m + # ItemOperationTimeout specifies the time used to wait for + # asynchronous BackupItemAction operations + # The default value is 1 hour. + csiSnapshotTimeout: 1h # Array of namespaces to include in the backup. If unspecified, all namespaces are included. # Optional. includedNamespaces: @@ -146,7 +150,8 @@ status: expiration: null # The current phase. # Valid values are New, FailedValidation, InProgress, WaitingForPluginOperations, - # WaitingForPluginOperationsPartiallyFailed, Completed, PartiallyFailed, Failed. + # WaitingForPluginOperationsPartiallyFailed, FinalizingafterPluginOperations, + # FinalizingAfterPluginOperationsPartiallyFailed, Completed, PartiallyFailed, Failed. phase: "" # An array of any validation errors encountered. validationErrors: null @@ -158,6 +163,12 @@ status: volumeSnapshotsAttempted: 2 # Number of volume snapshots that Velero successfully created for this backup. volumeSnapshotsCompleted: 1 + # Number of attempted async BackupItemAction operations for this backup. + asyncBackupIemOperationsAttempted: 2 + # Number of async BackupItemAction operations that Velero successfully completed for this backup. + asyncBackupIemOperationsCompleted: 1 + # Number of async BackupItemAction operations that ended in failure for this backup. + asyncBackupIemOperationsFailed: 0 # Number of warnings that were logged by the backup. warnings: 2 # Number of errors that were logged by the backup.