No more backup is created after backup failure #350

Closed
bluven opened this issue Jun 12, 2019 · 4 comments

bluven commented Jun 12, 2019

Version: 0.3.0

I was testing the backup function with an incorrect secret access key. I noticed that once a backup job failed (after 6 tries), no more backups were created, even after I switched to the correct secret access key.

After I deleted the failed backup, new backups started again.

AMecea commented Jun 20, 2019

The backup is retried a few times, with exponential backoff. If the job still fails, the backup is marked as failed and completed, and it is not retried anymore.

Do you use the backup scheduler from the cluster, or is it only a single backup?

bluven commented Jun 20, 2019

I used the scheduler.

I know the job gets several tries, but when it fails, jobSyncer never sets the completed field of backup.status to true. In the current implementation that only happens when the job completes successfully.

What does the completed field of MysqlBackupStatus mean when it is true? Does it mean the job has finished, regardless of whether the backup succeeded or failed? Or does it mean the backup job succeeded?

func (s *jobSyncer) updateStatus(job *batch.Job) {
	// check for completion condition
	if cond := jobCondition(batch.JobComplete, job); cond != nil {
		s.backup.UpdateStatusCondition(api.BackupComplete, cond.Status, cond.Reason, cond.Message)

		if cond.Status == core.ConditionTrue {
			s.backup.Status.Completed = true
		}
	}

	// check for failed condition
	if cond := jobCondition(batch.JobFailed, job); cond != nil {
		s.backup.UpdateStatusCondition(api.BackupFailed, cond.Status, cond.Reason, cond.Message)
	}
}

AMecea commented Jun 20, 2019

Your observation is right: the completed field is meant to indicate that the backup is in a final state, either failed or successful.
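
To make that intended meaning concrete, here is a minimal sketch of a status update that marks the backup as completed for either final job condition. It is not the change that closed this issue. The types `condition` and `backupStatus` and the helper `findCondition` are hypothetical stand-ins for batch.Job conditions and MysqlBackupStatus, so the example runs without the Kubernetes libraries.

```go
package main

import "fmt"

// conditionType is a stand-in for batch.JobConditionType (assumption).
type conditionType string

const (
	jobComplete conditionType = "Complete"
	jobFailed   conditionType = "Failed"
)

// condition is a stand-in for a job condition; Status true plays the
// role of core.ConditionTrue.
type condition struct {
	Type   conditionType
	Status bool
}

// backupStatus is a stand-in for MysqlBackupStatus.
type backupStatus struct {
	Completed bool
}

// findCondition returns the condition of the given type, or nil if absent.
func findCondition(t conditionType, conds []condition) *condition {
	for i := range conds {
		if conds[i].Type == t {
			return &conds[i]
		}
	}
	return nil
}

// updateStatus sets Completed once the job is in a final state,
// whether that state is success or failure.
func updateStatus(status *backupStatus, jobConds []condition) {
	for _, t := range []conditionType{jobComplete, jobFailed} {
		if c := findCondition(t, jobConds); c != nil && c.Status {
			status.Completed = true
		}
	}
}

func main() {
	failed := &backupStatus{}
	updateStatus(failed, []condition{{Type: jobFailed, Status: true}})
	fmt.Println("failed job, completed:", failed.Completed)

	running := &backupStatus{}
	updateStatus(running, nil)
	fmt.Println("running job, completed:", running.Completed)
}
```

With this shape, a selector on status.completed=false would stop matching a backup whose job has exhausted its retries.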

bluven commented Jun 20, 2019

Well, I thought a bool field couldn't do this job. I think it would be better to have a string field like state (running, failed, success). What do you think?
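
A rough sketch of what such a string field could look like, assuming a hypothetical `BackupState` type that is not part of the operator's API:

```go
package main

import "fmt"

// BackupState is a hypothetical string-valued status field.
type BackupState string

const (
	StateRunning BackupState = "running"
	StateFailed  BackupState = "failed"
	StateSuccess BackupState = "success"
)

// deriveState maps the two job outcomes onto a single state.
// A failure takes precedence; with neither outcome set the backup
// is still considered running.
func deriveState(complete, failed bool) BackupState {
	switch {
	case failed:
		return StateFailed
	case complete:
		return StateSuccess
	default:
		return StateRunning
	}
}

// IsFinal reports whether the backup will not change state anymore.
func (s BackupState) IsFinal() bool {
	return s == StateFailed || s == StateSuccess
}

func main() {
	for _, s := range []BackupState{
		deriveState(false, false),
		deriveState(true, false),
		deriveState(false, true),
	} {
		fmt.Printf("state=%s final=%t\n", s, s.IsFinal())
	}
}
```

A single field like this would let the scheduler ask "is it final?" and "did it succeed?" separately, which one bool cannot express.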

I have a quick bug fix like this:

func (j *job) scheduledBackupsRunningCount() int {
	backupsList := &api.MysqlBackupList{}
	// select all backups of the cluster with label recurrent=true that are not completed
	selector := j.backupSelector()
	selector.MatchingField("status.completed", "false")

	if err := j.c.List(context.TODO(), selector, backupsList); err != nil {
		log.Error(err, "failed getting backups", "selector", selector)
		return 0
	}

	count := 0
	for _, b := range backupsList.Items {
		if len(b.Status.Conditions) == 0 {
			count += 1
		}
	}

	return count
}
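
As I read the quick fix above, a backup counts as running only if it is not completed and has no status conditions yet. A failed backup has a condition recorded, so it drops out of the count. The self-contained sketch below illustrates that filter. The `backup` struct and `runningCount` function are hypothetical stand-ins for api.MysqlBackup and the method above, and the Completed check stands in for the status.completed=false field selector.

```go
package main

import "fmt"

// backup is a stand-in for api.MysqlBackup with only the fields
// the filter needs (assumption).
type backup struct {
	Name       string
	Completed  bool
	Conditions []string
}

// runningCount counts backups that are not completed and have no
// conditions recorded yet.
func runningCount(backups []backup) int {
	count := 0
	for _, b := range backups {
		if b.Completed {
			continue // mirrors the status.completed=false selector
		}
		if len(b.Conditions) == 0 {
			count++
		}
	}
	return count
}

func main() {
	backups := []backup{
		{Name: "failed-backup", Completed: false, Conditions: []string{"Failed"}},
		{Name: "fresh-backup", Completed: false},
		{Name: "done-backup", Completed: true, Conditions: []string{"Complete"}},
	}
	fmt.Println("running backups:", runningCount(backups))
}
```

Only the fresh backup is counted here, so a previously failed backup would no longer block the scheduler from creating new ones.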

bluven added a commit to bluven/mysql-operator that referenced this issue Jun 21, 2019
bluven added a commit to bluven/mysql-operator that referenced this issue Jun 25, 2019
@AMecea AMecea closed this as completed in 24e7b81 Jun 25, 2019
chapsuk pushed a commit to chapsuk/mysql-operator that referenced this issue Oct 16, 2023