
Find better solution for solving race conditions with MachineSet controller reconciling #245

Closed
k4leung4 opened this issue May 30, 2018 · 2 comments · Fixed by #316
Assignees: k4leung4
Labels: priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.)

Comments

@k4leung4 (Contributor)

The current MachineSet controller watches Machines and determines whether to reconcile the MachineSet that owns the machine.
This causes a race condition: creating machines during one reconciliation can trigger another reconciliation before the system has detected the changes from the first pass, so the second pass ends up creating or deleting additional machines.
A mutex lock was added initially to solve the issue, but back-to-back reconciliations happen quickly enough that the race condition still occurs.
A temporary fix that sleeps before releasing the lock is currently proposed to prevent this race condition.

The ReplicaSet controller solves this using expectations: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go#L113
This may be something we will want to adopt as a better fix.
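
For context, here is a minimal sketch of the expectations pattern the ReplicaSet controller uses, written as a standalone type rather than the real `ControllerExpectations` from `controller_utils.go`. The names below (`expectations`, `reconcile`, `createMachine`, `deleteMachine`) are hypothetical and only illustrate the idea: record how many creates/deletes a reconcile pass issued, and refuse to act on that key again until the watch handlers have observed them.

```go
// Hypothetical sketch; not the actual cluster-api or controller_utils code.
package controllers

import "sync"

// expectations tracks creations/deletions a reconcile pass has issued but
// whose watch events have not yet been observed, keyed by MachineSet.
type expectations struct {
	mu      sync.Mutex
	pending map[string]int
}

func newExpectations() *expectations {
	return &expectations{pending: map[string]int{}}
}

// Expect records that n machine creations or deletions are in flight for key.
func (e *expectations) Expect(key string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending[key] += n
}

// Observed is called from the Machine watch handler whenever a create or
// delete event for a machine owned by key arrives.
func (e *expectations) Observed(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.pending[key] > 0 {
		e.pending[key]--
	}
}

// Satisfied reports whether every recorded operation has been observed,
// i.e. the cache reflects the previous pass and a fresh diff is safe.
func (e *expectations) Satisfied(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.pending[key] == 0
}

// reconcile shows how the MachineSet controller could consult expectations
// before creating or deleting machines based on a possibly stale cache.
func reconcile(exp *expectations, key string, desired, observed int, createMachine, deleteMachine func()) {
	if !exp.Satisfied(key) {
		return // previous pass not yet visible in the cache; requeue instead
	}
	switch diff := desired - observed; {
	case diff > 0:
		exp.Expect(key, diff)
		for i := 0; i < diff; i++ {
			createMachine()
		}
	case diff < 0:
		exp.Expect(key, -diff)
		for i := 0; i < -diff; i++ {
			deleteMachine()
		}
	}
}
```

The linked `controller_utils.go` implementation additionally adds an expectation timeout and UID tracking for deletions, which this sketch omits.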

k4leung4 added a commit to k4leung4/cluster-api that referenced this issue May 30, 2018
The race condition occurs when the machineset reconciles the same key
too quickly: the creation/deletion of machines is not yet detected by
the second reconciliation, causing it to create/delete additional machines.

I attempted to use WaitForCacheSync, but that is also insufficient in
preventing the race condition.

The fix here is to add a 1 second sleep before releasing the mutex lock
when reconciling, which gives the system a chance to recognize the
changes made by the first reconciliation.

Issue kubernetes-sigs#245 was created to improve this hacky fix.
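
For reference, roughly what this stop-gap amounts to (an assumed shape, not the exact PR diff; the controller and method names here are hypothetical): hold the mutex across the sync and sleep for a second before releasing it, so the informer cache has a chance to observe the machines this pass created or deleted.

```go
// Hypothetical sketch of the temporary workaround; not the actual PR code.
package controllers

import (
	"sync"
	"time"
)

type machineSetController struct {
	mu sync.Mutex
}

// syncMachineSet computes the replica diff from the (possibly stale) cache
// and creates or deletes machines to match spec.replicas.
func (c *machineSetController) syncMachineSet(key string) {
	// ... create/delete machines ...
}

func (c *machineSetController) reconcile(key string) {
	c.mu.Lock()
	defer func() {
		// Hack: give the informer cache a second to observe this pass's
		// creates/deletes before the next reconcile of the same key can run.
		time.Sleep(1 * time.Second)
		c.mu.Unlock()
	}()
	c.syncMachineSet(key)
}
```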
k8s-ci-robot pushed a commit that referenced this issue May 31, 2018
@karan (Contributor) commented Jun 8, 2018

/priority important-soon

@k8s-ci-robot added the priority/important-soon label Jun 8, 2018
@karan (Contributor) commented Jun 8, 2018

/assign @k4leung4

chuckha pushed a commit to chuckha/cluster-api that referenced this issue Oct 2, 2019
…od-workaround

🐛 Copy over hack/tools go mod workaround from cluster-api