Find better solution for solving race conditions with MachineSet controller reconciling #245
Labels
priority/important-soon
Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Comments
k4leung4 added a commit to k4leung4/cluster-api that referenced this issue on May 30, 2018:
The race condition occurs when the MachineSet controller reconciles the same key twice in quick succession: the creation/deletion of machines from the first pass is not yet detected by the second reconciliation, causing it to create/delete additional machines. I attempted to use WaitForCacheSync, but that is also insufficient to prevent the race condition. The fix here is to add a 1-second sleep before releasing the mutex lock when reconciling, which gives the system a chance to recognize the changes made by the first reconciliation. Issue kubernetes-sigs#245 was created to improve this hacky fix.
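For illustration, a minimal Go sketch of the workaround this commit describes; the `reconciler` type and `syncMachineSet` helper are hypothetical stand-ins, not the actual cluster-api code:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// reconciler is an illustrative stand-in for the MachineSet controller;
// the real cluster-api types and sync logic differ.
type reconciler struct {
	mu sync.Mutex
}

// reconcile serializes work behind a mutex and sleeps briefly before
// releasing it, giving the informer cache time to observe the machines
// created or deleted by this pass. This mirrors the "hacky fix" the
// commit describes; it trades latency for correctness, which is what
// issue #245 asks to improve on.
func (r *reconciler) reconcile(key string) {
	r.mu.Lock()
	defer func() {
		// Let the cache catch up before the next reconciliation
		// of the same key can acquire the lock.
		time.Sleep(1 * time.Second)
		r.mu.Unlock()
	}()
	r.syncMachineSet(key)
}

// syncMachineSet stands in for the logic that creates or deletes
// machines to match the desired replica count.
func (r *reconciler) syncMachineSet(key string) {
	fmt.Println("syncing", key)
}

func main() {
	r := &reconciler{}
	// Back-to-back reconciliations of the same key; the second one
	// cannot start until the first has slept and released the lock.
	r.reconcile("default/my-machineset")
	r.reconcile("default/my-machineset")
}
```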
k4leung4 added a second commit to k4leung4/cluster-api that referenced this issue on May 30, 2018, with the same commit message.
k8s-ci-robot pushed a commit that referenced this issue on May 31, 2018, with the same commit message.
/priority important-soon
k8s-ci-robot added the priority/important-soon label on Jun 8, 2018.
/assign @k4leung4
chuckha pushed a commit to chuckha/cluster-api that referenced this issue on Oct 2, 2019:
…od-workaround 🐛 Copy over hack/tools go mod workaround from cluster-api
The current MachineSet controller watches Machines and decides whether to reconcile the MachineSet that owns each one.
This causes a race condition: when the MachineSet creates machines, that may trigger another reconciliation before the system has detected the modifications from the initial pass, which ends up creating or deleting additional machines.
A mutex lock was added initially to solve the issue, but back-to-back reconciliations happen quickly enough that the race condition still occurs.
A temporary fix, sleeping before releasing the lock, is currently proposed to prevent this race condition.
The ReplicaSet controller solves this by using expectations: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go#L113
This may be something we want to adopt as a better fix.
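For reference, a minimal hand-rolled sketch of that expectations pattern; the `expectations` type and its method names are simplified illustrations, not the actual ControllerExpectations API from controller_utils.go:

```go
package main

import (
	"fmt"
	"sync"
)

// expectations tracks, per MachineSet key, how many machine creations
// the controller has issued but not yet observed in its cache. This is
// a simplified sketch of the pattern behind ControllerExpectations, not
// the real Kubernetes API; deletions would be tracked symmetrically.
type expectations struct {
	mu      sync.Mutex
	pending map[string]int
}

func newExpectations() *expectations {
	return &expectations{pending: make(map[string]int)}
}

// expectCreations is called just before issuing create requests.
func (e *expectations) expectCreations(key string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending[key] += n
}

// creationObserved is called from the Machine informer's Add handler.
func (e *expectations) creationObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.pending[key] > 0 {
		e.pending[key]--
	}
}

// satisfied reports whether every issued operation has been observed.
// The reconciler only computes a new scale decision when this is true.
func (e *expectations) satisfied(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.pending[key] == 0
}

func main() {
	exp := newExpectations()
	key := "default/my-machineset"

	exp.expectCreations(key, 2)     // a reconcile issues two creates
	fmt.Println(exp.satisfied(key)) // false: skip scale decisions for now

	exp.creationObserved(key) // informer sees the first machine
	exp.creationObserved(key) // informer sees the second machine
	fmt.Println(exp.satisfied(key)) // true: safe to reconcile again
}
```

Compared to the sleep, this only defers decisions for the affected MachineSet, and it unblocks as soon as the cache observes the issued operations rather than after a fixed delay.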