Improve the backup schedule cron job so that only one backup runs at a time. #85

Closed
calind opened this issue Jul 23, 2018 · 2 comments · Fixed by #255

Comments

@calind
Member

calind commented Jul 23, 2018

  1. Controller should not rely on an internal lock, but rather check if there's an active job
    https://github.com/presslabs/mysql-operator/blob/e0ef8bb900001a91db4aa0e78596a8f8a0bf9d12/pkg/controller/clustercontroller/backups.go#L101-L102
  2. In case of an active job, we should create an automated backup, but mark it as failed, with the reason that another backup is running
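A minimal Go sketch of the two points above. It assumes simplified, hypothetical `Job` and `Backup` types rather than the operator's real Kubernetes types. The controller decides whether a backup is running from the list of backup Jobs, where a Job with `Active > 0` still has pods running, instead of from an in-memory lock. A scheduled backup that collides with a running one is still recorded, but it is marked failed with a reason.

```go
package main

import (
	"fmt"
	"time"
)

// Job is a hypothetical, simplified view of a backup Job. Active mirrors the
// idea of Kubernetes' JobStatus.Active: the Job's pods that are still active.
type Job struct {
	Name   string
	Active int32
}

// Backup is a hypothetical stand-in for the operator's backup resource.
type Backup struct {
	Name   string
	Failed bool
	Reason string
}

// activeJob returns the first backup Job that still has active pods.
// The Job list itself is the source of truth; no in-memory lock is kept.
func activeJob(jobs []Job) (string, bool) {
	for _, j := range jobs {
		if j.Active > 0 {
			return j.Name, true
		}
	}
	return "", false
}

// scheduleBackup always records the scheduled backup. If another backup Job
// is still running, the record is marked failed with a reason instead of
// starting a second Job. Otherwise Failed stays false and the caller would
// go on to create the Job.
func scheduleBackup(name string, jobs []Job) Backup {
	b := Backup{Name: name}
	if running, ok := activeJob(jobs); ok {
		b.Failed = true
		b.Reason = fmt.Sprintf("another backup is running (job %s)", running)
	}
	return b
}

func main() {
	// Hypothetical naming scheme for automated backups.
	name := fmt.Sprintf("my-cluster-auto-%d", time.Now().Unix())
	jobs := []Job{{Name: "my-cluster-backup-1", Active: 1}}
	b := scheduleBackup(name, jobs)
	fmt.Printf("%s failed=%v reason=%q\n", b.Name, b.Failed, b.Reason)
}
```

One advantage of this design is that the check is re-derived from cluster state on every reconcile. It therefore survives an operator restart, which an in-process lock does not.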

calind added this to the 0.2.x milestone Jul 23, 2018
@cu12

cu12 commented Oct 9, 2018

@calind I concur. I recently saw some weird issues due to this.

I had a cluster down, but the operator kept starting new backup jobs, which resulted in two errors in our environment that I believe are connected to this:

  1. We ran into "sshd.socket stops working after a while" (coreos/bugs#2181) on the nodes where these pods were scheduled (a gazillion pods every second).
  2. Apps talking to php-fpm on these nodes got stuck after some time.

It's a long shot, but some back-off and/or a mechanism to defer the backup job while the cluster is not in a ready state would be nice.
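One way to sketch the back-off and readiness gate suggested above, in Go like the operator itself. The readiness flag, the consecutive-failure counter, and the 30s/30m delays are hypothetical inputs chosen for illustration, not fields or values from mysql-operator. While the cluster is not ready, no Job is created at all. After consecutive failures, the next attempt waits for an exponentially growing, capped delay.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical tuning values, not taken from mysql-operator.
const (
	baseDelay = 30 * time.Second
	maxDelay  = 30 * time.Minute
)

// nextBackupDelay returns the wait before the next attempt after `failures`
// consecutive failed backups: base, 2*base, 4*base, ... capped at ceiling.
func nextBackupDelay(failures int, base, ceiling time.Duration) time.Duration {
	d := base
	if d > ceiling {
		return ceiling
	}
	for i := 1; i < failures; i++ {
		d *= 2
		if d >= ceiling {
			return ceiling
		}
	}
	return d
}

// shouldStartBackup decides whether a backup Job may be created now. When it
// returns false, the second value is how long to wait before checking again.
func shouldStartBackup(clusterReady bool, failures int, sinceLastAttempt time.Duration) (bool, time.Duration) {
	if !clusterReady {
		// Defer: create no Job (and therefore no pod) while the cluster is down.
		return false, baseDelay
	}
	if failures > 0 {
		wait := nextBackupDelay(failures, baseDelay, maxDelay)
		if sinceLastAttempt < wait {
			// Still backing off; report the remaining wait.
			return false, wait - sinceLastAttempt
		}
	}
	return true, 0
}

func main() {
	for _, f := range []int{1, 2, 3, 10} {
		fmt.Printf("after %d failure(s): wait %v\n", f, nextBackupDelay(f, baseDelay, maxDelay))
	}
	ok, wait := shouldStartBackup(false, 0, 0)
	fmt.Println("cluster not ready: start =", ok, "recheck in", wait)
}
```

Returning a delay instead of sleeping suits a controller that requeues its reconcile request. A down cluster then produces a periodic re-check rather than a stream of new pods.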

@HBO2
Contributor

HBO2 commented Oct 18, 2018

@calind I totally agree with @cu12. It takes a while for the cluster to be up and running, and I had some big issues with a multitude of failing pods created by the backup jobs.

Also, when S3 is not configured correctly for some reason, you get an immense number of pods.

thx

AMecea modified the milestones: 0.2.x, 0.2.6 Feb 25, 2019
AMecea modified the milestones: 0.2.6, 0.2.7 Mar 4, 2019
Labels
None yet
Projects
None yet
4 participants