Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support cleanPolicy for backup CR #3002

Merged
merged 16 commits into from
Jul 24, 2020

Conversation

lichunzhu
Copy link
Contributor

@lichunzhu lichunzhu commented Jul 22, 2020

What problem does this PR solve?

fix #2989

What is changed and how does it work?

  • Change cleanData to cleanPolicy
  • Change rclone judgement for remote file not exist error.

Check List

Tests

  • Unit test
  • E2E test
  • Stability test
  • Manual test (add detailed scripts or steps below)
    • Test on failure test for br and works successfully.

Code changes

  • Has Go code change

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Does this PR introduce a user-facing change?:

support cleanPolicy for backup CR

@lichunzhu lichunzhu requested a review from cofyc July 22, 2020 10:11
@lichunzhu lichunzhu added the status/PTAL PR needs to be reviewed label Jul 22, 2020
@lichunzhu lichunzhu requested a review from DanielZhangQD July 22, 2020 10:11
@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2020

Codecov Report

Merging #3002 into master will decrease coverage by 0.05%.
The diff coverage is 28.40%.

@@            Coverage Diff             @@
##           master    #3002      +/-   ##
==========================================
- Coverage   42.00%   41.94%   -0.06%     
==========================================
  Files         155      156       +1     
  Lines       16744    16773      +29     
==========================================
+ Hits         7033     7036       +3     
- Misses       9144     9170      +26     
  Partials      567      567              
Flag Coverage Δ
#unittest 41.94% <28.40%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Comment on lines 1119 to 1124
// CleanPolicyTypeRetain represents the clean policy is retain
CleanPolicyTypeRetain CleanPolicyType = "Retain"
// CleanPolicyTypeOnFailure represents the clean policy is on failure
CleanPolicyTypeOnFailure CleanPolicyType = "OnFailure"
// CleanPolicyTypeIfFailed represents the clean policy is delete
CleanPolicyTypeDelete CleanPolicyType = "Delete"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please describe the behavior clearly for each policy.

@@ -1148,8 +1161,8 @@ type BackupSpec struct {
UseKMS bool `json:"useKMS,omitempty"`
// Specify service account of backup
ServiceAccount string `json:"serviceAccount,omitempty"`
// CleanData denotes whether to clean backup data before the object is deleted from the cluster
CleanData bool `json:"cleanData,omitempty"`
// CleanPolicy denotes whether to clean backup data before the object is deleted from the cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CleanPolicy denotes whether to clean backup data before the object is deleted from the cluster
// CleanPolicy denotes whether to clean backup data when the object is deleted from the cluster

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the default behavior?

Comment on lines 109 to 110
case v1alpha1.CleanPolicyTypeRetain:
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed?

if backup.DeletionTimestamp == nil || !backup.Spec.CleanData {
// The backup object has not been deleted,do nothing
if backup.DeletionTimestamp == nil ||
backup.Spec.CleanPolicy == v1alpha1.CleanPolicyTypeRetain || backup.Spec.CleanPolicy == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!shouldCleanData()?

@@ -86,7 +86,8 @@ func (bc *defaultBackupControl) removeProtectionFinalizer(backup *v1alpha1.Backu
ns := backup.GetNamespace()
name := backup.GetName()

if backup.Spec.CleanData && isDeletionCandidate(backup) && v1alpha1.IsBackupClean(backup) {
if backup.Spec.CleanPolicy != v1alpha1.CleanPolicyTypeRetain && backup.Spec.CleanPolicy != "" &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1alpha1.ShouldCleanData(backup)

@@ -80,6 +80,15 @@ func (bc *backupCleaner) Clean(backup *v1alpha1.Backup) error {
Status: corev1.ConditionTrue,
})
}

if backup.Spec.CleanPolicy == v1alpha1.CleanPolicyTypeOnFailure && !v1alpha1.IsBackupFailed(backup) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic should be in L60 and we should not set BackupClean condition for this case as the data is not cleaned but retained for the successful backup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The finalizer handling logic should also be updated accordingly.

@@ -1111,6 +1111,19 @@ type TiDBAccessConfig struct {
TLSClientSecretName *string `json:"tlsClientSecretName,omitempty"`
}

// +k8s:openapi-gen=true
// CleanPolicyType represents the specific delete cloud data policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CleanPolicyType represents the specific delete cloud data policy
// CleanPolicyType represents the clean policy of backup data in remote storage

type CleanPolicyType string

const (
// CleanPolicyTypeRetain represents the clean policy is to retain S3 backup files at any time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CleanPolicyTypeRetain represents the clean policy is to retain S3 backup files at any time
// CleanPolicyTypeRetain represents that the backup data in remote storage will be retained when the Backup CR is deleted

const (
// CleanPolicyTypeRetain represents the clean policy is to retain S3 backup files at any time
CleanPolicyTypeRetain CleanPolicyType = "Retain"
// CleanPolicyTypeOnFailure represents the clean policy is to clean S3 backup files on failure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CleanPolicyTypeOnFailure represents the clean policy is to clean S3 backup files on failure
// CleanPolicyTypeOnFailure represents that the backup data in remote storage will be cleaned only for the failed backups when the Backup CR is deleted

CleanPolicyTypeRetain CleanPolicyType = "Retain"
// CleanPolicyTypeOnFailure represents the clean policy is to clean S3 backup files on failure
CleanPolicyTypeOnFailure CleanPolicyType = "OnFailure"
// CleanPolicyTypeIfFailed represents the clean policy is to clean S3 backup files at any time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CleanPolicyTypeIfFailed represents the clean policy is to clean S3 backup files at any time
// CleanPolicyTypeIfFailed represents that the backup data in remote storage will be cleaned when the Backup CR is deleted

@lichunzhu
Copy link
Contributor Author

/test pull-e2e-kind

if backup.DeletionTimestamp == nil || !backup.Spec.CleanData {
// The backup object has not been deleted,do nothing
if backup.DeletionTimestamp == nil || !v1alpha1.ShouldCleanData(backup) ||
backup.Spec.CleanPolicy == v1alpha1.CleanPolicyTypeOnFailure && !v1alpha1.IsBackupFailed(backup) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
backup.Spec.CleanPolicy == v1alpha1.CleanPolicyTypeOnFailure && !v1alpha1.IsBackupFailed(backup) {
(backup.Spec.CleanPolicy == v1alpha1.CleanPolicyTypeOnFailure && !v1alpha1.IsBackupFailed(backup)) {

@@ -1148,8 +1161,8 @@ type BackupSpec struct {
UseKMS bool `json:"useKMS,omitempty"`
// Specify service account of backup
ServiceAccount string `json:"serviceAccount,omitempty"`
// CleanData denotes whether to clean backup data before the object is deleted from the cluster
CleanData bool `json:"cleanData,omitempty"`
// CleanPolicy denotes whether to clean backup data when the object is deleted from the cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CleanPolicy denotes whether to clean backup data when the object is deleted from the cluster
// CleanPolicy denotes whether to clean backup data when the object is deleted from the cluster, if not set, the backup data will be retained

@@ -86,7 +86,8 @@ func (bc *defaultBackupControl) removeProtectionFinalizer(backup *v1alpha1.Backu
ns := backup.GetNamespace()
name := backup.GetName()

if backup.Spec.CleanData && isDeletionCandidate(backup) && v1alpha1.IsBackupClean(backup) {
if v1alpha1.ShouldCleanData(backup) && isDeletionCandidate(backup) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can define a function for these conditions.

DanielZhangQD
DanielZhangQD previously approved these changes Jul 24, 2020
Copy link
Contributor

@DanielZhangQD DanielZhangQD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lichunzhu
Copy link
Contributor Author

@cofyc PTAL

}
return fmt.Errorf("cluster %s, execute rclone deletefile command failed, output: %s, err: %v", bo, string(output), err)
if err != nil && !strings.Contains(string(output), "doesn't exist") {
return fmt.Errorf("cluster %s, execute rclone deletefile command to delete archive failed, output: %s, err: %v", bo, string(output), err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why changing this back?

Copy link
Contributor Author

@lichunzhu lichunzhu Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because according to test, this error code is not accurate. When rclone returns dir doesn't exist error the exit code is 1. May the last time I forgot to use hack/local-up-operator.sh to update operator.

Copy link
Contributor

@cofyc cofyc Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rclone delete is a better command in this scenario

https://rclone.org/commands/rclone_delete/

in my testing, it succeeds if no file found at the specified path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It won't even return error. Should we use this command?
C.C. @DanielZhangQD

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK if the file can be removed if exist.

Copy link
Contributor Author

@lichunzhu lichunzhu Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revise to rclone delete in 4e1d38f

@@ -288,7 +288,7 @@ func newBackup() *v1alpha1.Backup {
},
StorageClassName: pointer.StringPtr("local-storage"),
StorageSize: "1Gi",
CleanData: true,
CleanPolicy: v1alpha1.CleanPolicyTypeDelete,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add tests for other values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add tests later

return fmt.Errorf("cluster %s, execute rclone deletefile command to delete archive failed, output: %s, err: %v", bo, string(output), err)
}

args = util.ConstructArgs(constants.RcloneConfigArg, opts, "deletefile", fmt.Sprintf("%s.tmp", destBucket), "")
Copy link
Contributor

@cofyc cofyc Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when will this *.tmp file be created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After file is archived and uploaded to S3.

  1. upload archive file to *.tgz.tmp
  2. move *.tgz.tmp to *.tgz

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is for backup with dumpling only.

@DanielZhangQD
Copy link
Contributor

/test pull-e2e-kind

@lichunzhu
Copy link
Contributor Author

@cofyc addressed, PTAL again

DanielZhangQD
DanielZhangQD previously approved these changes Jul 24, 2020
Copy link
Contributor

@DanielZhangQD DanielZhangQD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return nil
}
}
return fmt.Errorf("cluster %s, execute rclone deletefile command failed, output: %s, err: %v", bo, string(output), err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/deletefile/delete


args = util.ConstructArgs(constants.RcloneConfigArg, opts, "delete", fmt.Sprintf("%s.tmp", destBucket), "")
output, err = exec.Command("rclone", args...).CombinedOutput()
if err != nil {
return fmt.Errorf("cluster %s, execute rclone deletefile command failed, output: %s, err: %v", bo, string(output), err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/deletefile/delete

Status: corev1.ConditionTrue,
Reason: "UpdateBackupPathFailed",
Message: err.Error(),
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why setting this condition if StatusUpdater.Update( fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's meaningless to call StatusUpdater.Update again if the previous StatusUpdater.Update fails as the second call can fail too

our current architecture relies on our backup job to update Backup CR's status, the situation that the status may not be reported to the apiserver is unavoidable

does it cause problems when the StatusUpdater.Update call fails?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we omit this error, we might be unable to clean files on the Cloud when we delete this CR because BackupPath is not recorded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the bm.backupDataToRemote will not be run if reporting the backup path to the apiserver fails

(if this is not true, I think we should fix it. one more API call does not help)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should we omit this error? What about BR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we didn't omit the error, the job fails

@lichunzhu
Copy link
Contributor Author

@DanielZhangQD PTAL again

Copy link
Contributor

@DanielZhangQD DanielZhangQD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DanielZhangQD DanielZhangQD merged commit fe8bf9f into pingcap:master Jul 24, 2020
ti-srebot pushed a commit to ti-srebot/tidb-operator that referenced this pull request Jul 24, 2020
@ti-srebot
Copy link
Contributor

cherry pick to release-1.1 in PR #3018

DanielZhangQD pushed a commit that referenced this pull request Jul 27, 2020
Signed-off-by: ti-srebot <[email protected]>

Co-authored-by: Chunzhu Li <[email protected]>
@lichunzhu lichunzhu mentioned this pull request Jul 27, 2020
@lichunzhu lichunzhu deleted the removeCloudTempData branch September 2, 2020 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When deleting a failed backup CR, the backup data on s3 is not cleaned up。
5 participants