Fix the race condition by confirming creation/deletion of machine objects #316

Merged
merged 1 commit into from
Jun 11, 2018

Conversation

k4leung4
Contributor

@k4leung4 k4leung4 commented Jun 8, 2018

Waiting for the machine objects to be created or deleted before leaving
the Reconcile loop prevents the race condition in which a stale cache
does not yet reflect the change to the machine objects.

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #245

Special notes for your reviewer:

Release note:


@kubernetes/kube-deploy-reviewers

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 8, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: k4leung4

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 8, 2018
@k4leung4
Contributor Author

k4leung4 commented Jun 8, 2018

/assign @karan @medinatiger

@k4leung4
Contributor Author

k4leung4 commented Jun 8, 2018

/uncc @kris-nova @timothysc

@k4leung4
Contributor Author

k4leung4 commented Jun 8, 2018

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2018
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 8, 2018
@k4leung4
Contributor Author

/assign @mkjelland @karan @medinatiger


// stateConfirmationInterval is the amount of time between polling for the desired state.
// The polling is against a local memory cache.
var stateConfirmationInterval = 100 * time.Millisecond
Contributor

Is this too fast?

Contributor Author

This polls a local in-memory cache, so it shouldn't be an issue.
I can raise the interval if it is a concern.

	if errors.IsNotFound(err) {
		return false, nil
	}
	return false, err
Contributor

Please log all errors explicitly for debugging

Contributor Author

The error from the retry loop goes to pollErr, which is always logged.

@karan
Contributor

karan commented Jun 11, 2018

lgtm

@@ -342,3 +332,41 @@ func getMachinesToDelete(filteredMachines []*v1alpha1.Machine, diff int) []*v1al
// see: https://github.com/kubernetes/kube-deploy/issues/625
return filteredMachines[:diff]
}

func (c *MachineSetControllerImpl) waitForMachineCreation(machineList []*v1alpha1.Machine) error {
Contributor

waitForMachineCreation and waitForMachineDeletion probably share some code with deployer/clusterctl. Maybe a separate future PR could abstract it out into util.

Contributor Author

Agreed.
Will clean up in a future PR.

@medinatiger
Contributor

lgtm

@karan
Contributor

karan commented Jun 11, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2018
k4leung4 added a commit to k4leung4/cluster-api that referenced this pull request Jun 11, 2018
@k4leung4
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2018
@k8s-ci-robot k8s-ci-robot merged commit 367b383 into kubernetes-sigs:master Jun 11, 2018
k8s-ci-robot pushed a commit that referenced this pull request Jun 11, 2018
jayunit100 pushed a commit to jayunit100/cluster-api that referenced this pull request Jan 31, 2020
…mplatize-k8s-version

clusterctl: templatize kubernetes version
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Find better solution for solving race conditions with MachineSet controller reconciling
5 participants