Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow backup storage locations to specify backup sync period or toggle off sync #1936

Merged
merged 2 commits into from
Oct 24, 2019

Conversation

betta1
Copy link
Contributor

@betta1 betta1 commented Oct 3, 2019

This resolves #1934

@betta1 betta1 force-pushed the betta1/backup-sync-bsl branch from 57e57cc to 0aa7035 Compare October 3, 2019 19:24
@@ -37,6 +38,11 @@ import (
"github.com/vmware-tanzu/velero/pkg/plugin/clientmgmt"
)

const (
backupSyncPeriodKey = "backupSyncPeriod"
skipBackupSyncKey = "skipBackupSync"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use backupSyncPeriod: 0 be a sentinel value for skipping sync, so we only have one variable to check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. I had initially thought of doing it that way but figured adding a config variable for skipping sync would make it more explicit. I'll update the PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's certainly true that it's more explicit, which tends to be good. Curious what others think, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the changes to use backupSyncPeriod: 0 as a sentinel value for skipping sync all ready, just waiting for input from other reviewers then will update the PR.

@nrb nrb requested review from skriss, carlisia and prydonius October 7, 2019 21:06
@betta1 betta1 force-pushed the betta1/backup-sync-bsl branch 2 times, most recently from 5173fef to 3bdeedc Compare October 14, 2019 16:28
Copy link
Contributor

@prydonius prydonius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the --backup-sync-period option in the Velero server?

Copy link
Contributor

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @betta1 - few comments that we need to resolve before proceeding here.

@@ -134,6 +136,32 @@ func (c *backupSyncController) run() {

for _, location := range locations {
log := c.logger.WithField("backupLocation", location.Name)

if backupSyncPeriod, ok := location.Spec.Config[backupSyncPeriodKey]; ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we're adding this to BackupStorageLocations, it makes more sense to be a first-class Spec field (optional) than an opaque config key, since it's not provider-specific. It would also mean it could be strongly typed as a duration field. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I did find this strange. velero install should also set the default value when creating a BSL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree we could have this as a first class Spec field on the BSL. I had brought this issue up during the community call 2 weeks ago and from the discussion at the time having this as a config key seemed an ok approach and I also found it less intrusive change than adding it as a Spec field. I initially viewed the custom backup sync period as something that BSLs can opt in if they want to have sync periods that are greater than the server's global backup sync period or toggle of sync entirely but we can def have this as a Spec field. In addition to the Spec field, we'd probably also need to add backup-sync-period flag to both velero-install and velero backup-location create cli commands as well.

Comment on lines 152 to 160
log.Debug("Checking if backups need to be synced at this time for this location")
lastSync := location.Status.LastSyncedTime
nextSync := lastSync.Add(syncDuration)
if time.Now().UTC().Before(nextSync) {
c.logger.WithFields(logrus.Fields{
"backupLocation": location.Name,
"lastSync": lastSync,
"nextSync": nextSync,
}).Debug("Sync not required at this time")
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relationship between this logic and the --backup-sync-period could produce some unintuitive behavior in certain instances. For example, if --backup-sync-period is set to 1h, but a specific BSL has a value of 1m, that BSL will still only get synced every 1h, because the run() function that this check is in will only be run every hour.

I would like to do a little more thinking on how this should work. One option would be to have the run() function run frequently (i.e. every 30s? minute? TBD), and each time it runs, check each BSL to see if it should be synced, based on its last-synced time, and either its specific sync interval, or the default value (as provided via --backup-sync-period).

I am also curious - are you mainly interested in being able to disable sync for a specific BSL, or is the custom interval equally important? What's the use case for the custom interval?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we should change this to the approach proposed by @skriss. The CronJob controller checks every 10 seconds, but I think 30s/1m is also reasonable for this usecase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skriss I agree with that approach as well. We're interested in both. For our use case we have an object store plugin and currently Velero queries the backup target every 1 min which is too frequent especially when we have several backup storage locations for a given backup target. The backup target may be shared with other tenants and we'd like to configure how frequently we need to connect to the target so we don't hit it with too many connection requests. There're other cases where the backup target might be offline e.g during an upgrade etc and the ability to toggle off backup sync in this case would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skriss @prydonius I updated the PR as the approach suggested, please take a look and let me know any edits I need to make. Sorry for the late update I was OOO towards end of last week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries - I'll take a look soon.

@betta1 betta1 force-pushed the betta1/backup-sync-bsl branch 2 times, most recently from 18e6bff to 5baa50b Compare October 22, 2019 17:22
@betta1 betta1 requested a review from skriss October 22, 2019 17:44
@betta1 betta1 force-pushed the betta1/backup-sync-bsl branch from 5baa50b to d5e20eb Compare October 23, 2019 14:27
Copy link
Contributor

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betta1 this mostly looks good, added a couple small specific comments. A couple general comments as well:

  • https://github.com/vmware-tanzu/velero/blob/master/pkg/cmd/server/server.go#L197 - the help text can probably now be tweaked to indicate that this is the default sync period for BSLs if none is explicitly specified for a BSL
  • I'm wondering if we want to make the hard-coded 1-minute resync period configurable? Prior to this change, a user could set it to e.g. 1s if they wanted; now, 1 minute is effectively the minimum interval for syncing a BSL. I'm also wary of adding too many flags, so interested in others' thoughts here. We could also just lower that hardcoded value (30s? 10s?), which should reduce the need for customizing it.

pkg/apis/velero/v1/backup_storage_location.go Outdated Show resolved Hide resolved
pkg/cmd/cli/backuplocation/create.go Outdated Show resolved Hide resolved
@prydonius
Copy link
Contributor

I'm wondering if we want to make the hard-coded 1-minute resync period configurable? Prior to this change, a user could set it to e.g. 1s if they wanted; now, 1 minute is effectively the minimum interval for syncing a BSL. I'm also wary of adding too many flags, so interested in others' thoughts here. We could also just lower that hardcoded value (30s? 10s?), which should reduce the need for customizing it.

Adding a flag for it is going to be a little confusing IMO (users might be confused about whether they should set that or set the sync period in the BSL). I think we should just reduce it to 10s or 30s. I don't really expect anyone will want to sync this faster than 10s, and especially since CronJob has this limitation also.

@betta1 betta1 force-pushed the betta1/backup-sync-bsl branch from d5e20eb to 132dea8 Compare October 23, 2019 17:54
@betta1
Copy link
Contributor Author

betta1 commented Oct 23, 2019

I'm wondering if we want to make the hard-coded 1-minute resync period configurable? Prior to this change, a user could set it to e.g. 1s if they wanted; now, 1 minute is effectively the minimum interval for syncing a BSL. I'm also wary of adding too many flags, so interested in others' thoughts here. We could also just lower that hardcoded value (30s? 10s?), which should reduce the need for customizing it.

Adding a flag for it is going to be a little confusing IMO (users might be confused about whether they should set that or set the sync period in the BSL). I think we should just reduce it to 10s or 30s. I don't really expect anyone will want to sync this faster than 10s, and especially since CronJob has this limitation also.

I agree, I think just lowering the hardcoded value for the resync period should be sufficient for now (probably we can create an issue to track exposing this as a configurable flag later if users are interested). I've lowered it to 30s as I don't want this change to cause Velero to consume more cycles/resources just running the resync process too frequently.

Copy link
Contributor

@prydonius prydonius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - thanks @betta1!

@@ -78,6 +82,7 @@ func (o *CreateOptions) BindFlags(flags *pflag.FlagSet) {
flags.StringVar(&o.Provider, "provider", o.Provider, "name of the backup storage provider (e.g. aws, azure, gcp)")
flags.StringVar(&o.Bucket, "bucket", o.Bucket, "name of the object storage bucket where backups should be stored")
flags.StringVar(&o.Prefix, "prefix", o.Prefix, "prefix under which all Velero data should be stored within the bucket. Optional.")
flags.DurationVar(&o.BackupSyncPeriod, "backup-sync-period", defaultBackupSyncPeriod, "how often to ensure all Velero backups in object storage exist as Backup API objects in the cluster. Optional. Set this to `0s` to disable sync")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if we want to not set an explicit value for the BSL if the user doesn't specify the flag, rather than defaulting it to a minute here? If we did that, then the server-level setting would apply. The benefit is that if the user then changes the server-level setting, it would affect all BSLs that were not set up with an explicit sync period, whereas now, they're all getting an individual setting of 1m by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would actually be ideal. I'd like to make this change and the only issue i'm running into is how can I detect that the user didn't specify the flag? Atm when I set the default value to o.BackupSyncPeriod, like so:

flags.DurationVar(&o.BackupSyncPeriod, "backup-sync-period", o.BackupSyncPeriod, "how often..."
it gets parsed as 0s which is an issue since this means disable sync. Any suggestion on how to detect this flag wasn't provided?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use this method: https://godoc.org/github.com/spf13/pflag#FlagSet.Changed

but I'm not 100% sure off the top of my head, would need to experiment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @skriss, I modified to check if the user explicitly specified a backup sync period and only then set the backup sync period for the BSL. Otherwise if user does not specify the flag the BSL backup sync period is nil which will have the BSL use the server side backup sync.

@skriss
Copy link
Contributor

skriss commented Oct 23, 2019

otherwise LGTM!

@betta1 betta1 force-pushed the betta1/backup-sync-bsl branch from 132dea8 to 9b6d157 Compare October 23, 2019 22:24
Copy link
Contributor

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@skriss skriss merged commit 0450567 into vmware-tanzu:master Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support backup storage locations to specify backup sync periods or toggle off backup sync
4 participants