-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the race condition by confirming creation/deletion of machine objects #316
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: k4leung4 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/assign @karan @medinatiger |
/uncc @kris-nova @timothysc |
/hold |
…ects By waiting for the machine objects to be either created or deleted before leaving Reconcile loop will prevent the race condition of stale cache not reporting correctly the change to machine objects.
/assign @mkjelland @karan @medinatiger |
|
||
// stateConfirmationInterval is the amount of time between polling for the desired state. | ||
// The polling is against a local memory cache. | ||
var stateConfirmationInterval = 100 * time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this too fast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is against an local internal cache, it shouldnt be an issue.
i can raise the interval if it is a concern
if errors.IsNotFound(err) { | ||
return false, nil | ||
} | ||
return false, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please log all errors explicitly for debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error from the retry loop goes to pollErr which is always logged.
lgtm |
@@ -342,3 +332,41 @@ func getMachinesToDelete(filteredMachines []*v1alpha1.Machine, diff int) []*v1al | |||
// see: https://github.com/kubernetes/kube-deploy/issues/625 | |||
return filteredMachines[:diff] | |||
} | |||
|
|||
func (c *MachineSetControllerImpl) waitForMachineCreation(machineList []*v1alpha1.Machine) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waitForMachineCreation and Deletion probably have some common code with deployer/clusterctl. Maybe have a separate PR in the future to abstract out to put it into util.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree.
will clean up in future PR.
lgtm |
/lgtm |
/hold cancel |
…mplatize-k8s-version clusterctl: templatize kubernetes version
By waiting for the machine objects to be either created or deleted
before leaving Reconcile loop will prevent the race condition of stale
cache not reporting correctly the change to machine objects.
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #245
Special notes for your reviewer:
Release note:
@kubernetes/kube-deploy-reviewers