Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BIAv2 async operations controller work #5849

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelogs/unreleased/5849-sseago
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
BIAv2 async operations controller work
21 changes: 21 additions & 0 deletions config/crd/v1/bases/velero.io_backups.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,11 @@ spec:
type: string
nullable: true
type: array
itemOperationTimeout:
description: ItemOperationTimeout specifies the time used to wait
for asynchronous BackupItemAction operations The default value is
1 hour.
type: string
labelSelector:
description: LabelSelector is a metav1.LabelSelector to filter with
when adding individual objects to the backup. If empty or nil, all
Expand Down Expand Up @@ -415,6 +420,20 @@ spec:
status:
description: BackupStatus captures the current status of a Velero backup.
properties:
asyncBackupItemOperationsAttempted:
description: AsyncBackupItemOperationsAttempted is the total number
of attempted async BackupItemAction operations for this backup.
type: integer
asyncBackupItemOperationsCompleted:
description: AsyncBackupItemOperationsCompleted is the total number
of successfully completed async BackupItemAction operations for
this backup.
type: integer
asyncBackupItemOperationsFailed:
description: AsyncBackupItemOperationsFailed is the total number of
async BackupItemAction operations for this backup which ended with
an error.
type: integer
completionTimestamp:
description: CompletionTimestamp records the time a backup was completed.
Completion time is recorded even on failed backups. Completion time
Expand Down Expand Up @@ -457,6 +476,8 @@ spec:
- InProgress
- WaitingForPluginOperations
- WaitingForPluginOperationsPartiallyFailed
- FinalizingAfterPluginOperations
- FinalizingAfterPluginOperationsPartiallyFailed
- Completed
- PartiallyFailed
- Failed
Expand Down
5 changes: 5 additions & 0 deletions config/crd/v1/bases/velero.io_schedules.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -308,6 +308,11 @@ spec:
type: string
nullable: true
type: array
itemOperationTimeout:
description: ItemOperationTimeout specifies the time used to wait
for asynchronous BackupItemAction operations The default value
is 1 hour.
type: string
labelSelector:
description: LabelSelector is a metav1.LabelSelector to filter
with when adding individual objects to the backup. If empty
Expand Down
4 changes: 2 additions & 2 deletions config/crd/v1/crds/crds.go

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions design/biav2-design.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ message ExecuteResponse {
bytes item = 1;
repeated generated.ResourceIdentifier additionalItems = 2;
string operationID = 3;
repeated generated.ResourceIdentifier itemsToUpdate = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DoesitemsToUpdate reflect the meaning of the return value as described in the comment of .go file:

a second slice of ResourceIdentifiers specifying related items which should be backed up after all asynchronous operations have completed.

But I don't have a good idea about naming here atm...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original thought was itemsToUpdateAfterOperations but that felt too unwieldy. Another option would be itemsToFinalize or postOperationItems I'm fine with any of these, though -- lets discuss next week for a possible follow-on change. If we're going to change it, it needs to be before 1.11 is out. @shubham-pampattiwar any thoughts here?

Copy link
Contributor

@reasonerjt reasonerjt Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sseago I would vote for postOperationItems if we see them as a second list of items to backup.
There are some inconsistencies I mentioned in
#5849 (comment)

Maybe we should refine the flow here and treat the items in itemsToUpdate or postOperationItems differently, for example, we may define the flow like the backupper will only put these items into tarball as-is without any additional work, there is still inconsistency, but this inconsistency is easier to understand...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reasonerjt postOperationItems works for me. I'll update that in the follow-on PR. As for the inconsistencies between this and additional items, see comments below. I think we can converge things a bit more and minimize the inconsistency -- at this point, I think the only thing that must be different is that postOperationItems are not expected to create asynchronous operations, since that would potentially lead to never-ending backup.

}
```
The BackupItemAction service gets two new rpc methods:
Expand Down Expand Up @@ -78,6 +79,19 @@ message OperationProgress {
}
```

In addition to the two new rpc methods added to the BackupItemAction interface, there is also a new `Name()` method. This one is only actually used internally by Velero to get the name that the plugin was registered with, but it still must be defined in a plugin which implements BackupItemActionV2 in order to implement the interface. It doesn't really matter what it returns, though, as this particular method is not delegated to the plugin via RPC calls. The new (and modified) interface methods for `BackupItemAction` are as follows:
```
type BackupItemAction interface {
...
Name() string
...
Execute(item runtime.Unstructured, backup *api.Backup) (runtime.Unstructured, []velero.ResourceIdentifier, string, []velero.ResourceIdentifier, error)
Progress(operationID string, backup *api.Backup) (velero.OperationProgress, error)
Cancel(operationID string, backup *api.Backup) error
...
}
```

A new PluginKind, `BackupItemActionV2`, will be created, and the backup process will be modified to use this plugin kind.
See [Plugin Versioning](plugin-versioning.md) for more details on implementation plans, including v1 adapters, etc.

Expand Down
211 changes: 133 additions & 78 deletions design/general-progress-monitoring.md

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions internal/hook/item_hook_handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,20 @@ func (h *DefaultItemHookHandler) HandleHooks(
return nil
}

// NoOpItemHookHandler is the an itemHookHandler for the Finalize controller where hooks don't run
type NoOpItemHookHandler struct{}

func (h *NoOpItemHookHandler) HandleHooks(
log logrus.FieldLogger,
groupResource schema.GroupResource,
obj runtime.Unstructured,
resourceHooks []ResourceHook,
phase hookPhase,
) error {

return nil
}

func phasedKey(phase hookPhase, key string) string {
if phase != "" {
return fmt.Sprintf("%v.%v", phase, key)
Expand Down
39 changes: 38 additions & 1 deletion pkg/apis/velero/v1/backup.go
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,11 @@ type BackupSpec struct {
// The default value is 10 minute.
// +optional
CSISnapshotTimeout metav1.Duration `json:"csiSnapshotTimeout,omitempty"`

// ItemOperationTimeout specifies the time used to wait for asynchronous BackupItemAction operations
// The default value is 1 hour.
// +optional
ItemOperationTimeout metav1.Duration `json:"itemOperationTimeout,omitempty"`
}

// BackupHooks contains custom behaviors that should be executed at different phases of the backup.
Expand Down Expand Up @@ -221,7 +226,7 @@ const (

// BackupPhase is a string representation of the lifecycle phase
// of a Velero backup.
// +kubebuilder:validation:Enum=New;FailedValidation;InProgress;WaitingForPluginOperations;WaitingForPluginOperationsPartiallyFailed;Completed;PartiallyFailed;Failed;Deleting
// +kubebuilder:validation:Enum=New;FailedValidation;InProgress;WaitingForPluginOperations;WaitingForPluginOperationsPartiallyFailed;FinalizingAfterPluginOperations;FinalizingAfterPluginOperationsPartiallyFailed;Completed;PartiallyFailed;Failed;Deleting
type BackupPhase string

const (
Expand Down Expand Up @@ -251,6 +256,23 @@ const (
// ongoing. The backup is not usable yet.
BackupPhaseWaitingForPluginOperationsPartiallyFailed BackupPhase = "WaitingForPluginOperationsPartiallyFailed"

// BackupPhaseFinalizingAfterPluginOperations means the backup of
// Kubernetes resources, creation of snapshots, and other
// async plugin operations were successful and snapshot upload and
// other plugin operations are now complete, but the Backup is awaiting
// final update of resources modified during async operations.
// The backup is not usable yet.
BackupPhaseFinalizingAfterPluginOperations BackupPhase = "FinalizingAfterPluginOperations"

// BackupPhaseFinalizingAfterPluginOperationsPartiallyFailed means the backup of
// Kubernetes resources, creation of snapshots, and other
// async plugin operations were successful and snapshot upload and
// other plugin operations are now complete, but one or more errors
// occurred during backup or async operation processing, and the
// Backup is awaiting final update of resources modified during async
// operations. The backup is not usable yet.
BackupPhaseFinalizingAfterPluginOperationsPartiallyFailed BackupPhase = "FinalizingAfterPluginOperationsPartiallyFailed"

// BackupPhaseCompleted means the backup has run successfully without
// errors.
BackupPhaseCompleted BackupPhase = "Completed"
Expand Down Expand Up @@ -351,6 +373,21 @@ type BackupStatus struct {
// completed CSI VolumeSnapshots for this backup.
// +optional
CSIVolumeSnapshotsCompleted int `json:"csiVolumeSnapshotsCompleted,omitempty"`

// AsyncBackupItemOperationsAttempted is the total number of attempted
// async BackupItemAction operations for this backup.
// +optional
AsyncBackupItemOperationsAttempted int `json:"asyncBackupItemOperationsAttempted,omitempty"`

// AsyncBackupItemOperationsCompleted is the total number of successfully completed
// async BackupItemAction operations for this backup.
// +optional
AsyncBackupItemOperationsCompleted int `json:"asyncBackupItemOperationsCompleted,omitempty"`

// AsyncBackupItemOperationsFailed is the total number of async
// BackupItemAction operations for this backup which ended with an error.
// +optional
AsyncBackupItemOperationsFailed int `json:"asyncBackupItemOperationsFailed,omitempty"`
}

// BackupProgress stores information about the progress of a Backup's execution.
Expand Down
1 change: 1 addition & 0 deletions pkg/apis/velero/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 14 additions & 5 deletions pkg/archive/filesystem.go
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,20 @@ import (

// GetItemFilePath returns an item's file path once extracted from a Velero backup archive.
func GetItemFilePath(rootDir, groupResource, namespace, name string) string {
switch namespace {
case "":
return filepath.Join(rootDir, velerov1api.ResourcesDir, groupResource, velerov1api.ClusterScopedDir, name+".json")
default:
return filepath.Join(rootDir, velerov1api.ResourcesDir, groupResource, velerov1api.NamespaceScopedDir, namespace, name+".json")
return GetVersionedItemFilePath(rootDir, groupResource, namespace, name, "")
}

// GetVersionedItemFilePath returns an item's file path once extracted from a Velero backup archive, with version included.
func GetVersionedItemFilePath(rootDir, groupResource, namespace, name, versionPath string) string {
return filepath.Join(rootDir, velerov1api.ResourcesDir, groupResource, versionPath, GetScopeDir(namespace), namespace, name+".json")
}

// GetScopeDir returns NamespaceScopedDir if namespace is present, or ClusterScopedDir if empty
func GetScopeDir(namespace string) string {
if namespace == "" {
return velerov1api.ClusterScopedDir
} else {
return velerov1api.NamespaceScopedDir
}
}

Expand Down
18 changes: 18 additions & 0 deletions pkg/archive/filesystem_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,22 @@ func TestGetItemFilePath(t *testing.T) {

res = GetItemFilePath("root", "resource", "namespace", "item")
assert.Equal(t, "root/resources/resource/namespaces/namespace/item.json", res)

res = GetItemFilePath("", "resource", "", "item")
assert.Equal(t, "resources/resource/cluster/item.json", res)

res = GetVersionedItemFilePath("root", "resource", "", "item", "")
assert.Equal(t, "root/resources/resource/cluster/item.json", res)

res = GetVersionedItemFilePath("root", "resource", "namespace", "item", "")
assert.Equal(t, "root/resources/resource/namespaces/namespace/item.json", res)

res = GetVersionedItemFilePath("root", "resource", "namespace", "item", "v1")
assert.Equal(t, "root/resources/resource/v1/namespaces/namespace/item.json", res)

res = GetVersionedItemFilePath("root", "resource", "", "item", "v1")
assert.Equal(t, "root/resources/resource/v1/cluster/item.json", res)

res = GetVersionedItemFilePath("", "resource", "", "item", "")
assert.Equal(t, "resources/resource/cluster/item.json", res)
}
Loading