Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Compute ControlPlane version for a managed topology #5059

Closed

Conversation

sbueringer
Copy link
Member

@sbueringer sbueringer commented Aug 9, 2021

Signed-off-by: Stefan Büringer [email protected]

What this PR does / why we need it:

I intentionally did not implement the more complicated cases when a previous rollout is not completely done and the next upgrade is triggered. The idea behind this is to get the base case stable and tested (including e2e) before we expand the implementation to more advanced use cases.

Notes:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Partially implements #5016

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 9, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign fabriziopandini after the PR has been reviewed.
You can assign the PR to them by writing /assign @fabriziopandini in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 9, 2021
@sbueringer sbueringer force-pushed the pr-upgrade-comp-by-comp branch from 5a8000d to b19e4c5 Compare August 9, 2021 09:13
@sbueringer
Copy link
Member Author

/retest
(the usual flakes)

1 similar comment
@sbueringer
Copy link
Member Author

/retest
(the usual flakes)

Copy link
Contributor

@killianmuldoon killianmuldoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - just a couple of questions

controllers/clustertopology_controller_compute.go Outdated Show resolved Hide resolved
controllers/clustertopology_controller_compute.go Outdated Show resolved Hide resolved
controllers/clustertopology_controller_compute.go Outdated Show resolved Hide resolved
@sbueringer sbueringer force-pushed the pr-upgrade-comp-by-comp branch from b19e4c5 to c8caf3d Compare August 10, 2021 06:48
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 10, 2021
@sbueringer sbueringer force-pushed the pr-upgrade-comp-by-comp branch from c8caf3d to 81e57c6 Compare August 11, 2021 12:14
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 11, 2021
@sbueringer
Copy link
Member Author

Rebased on top of current main.

@sbueringer
Copy link
Member Author

/retest

@sbueringer
Copy link
Member Author

/area topology

return currentTopologyVersion, nil
}

currentControlPlaneVersion, err := getControlPlaneVersion(current.controlPlane.object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
currentControlPlaneVersion, err := getControlPlaneVersion(current.controlPlane.object)
version, err := getControlPlaneVersion(current.controlPlane.object)

Copy link
Member Author

@sbueringer sbueringer Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped the current prefix everywhere. Now I have:

  • topologyVersion
  • controlPlaneVersion
  • controlPlaneStatusVersion

I think in this func the prefixes are useful to keep the logic as easily readable as possible.

If you prefer the following seems like a good alternative to me:

  • topologyVersion
  • version
  • statusVersion

I guess the missing prefix in combination with the func name should provide enough context to infer version is the controlPlaneVersion.

WDYT?

controllers/topology/desired_state.go Outdated Show resolved Hide resolved
controllers/topology/desired_state.go Outdated Show resolved Hide resolved
controllers/topology/desired_state.go Outdated Show resolved Hide resolved
Comment on lines 172 to 175
// ControlPlane already has the currentTopologyVersion.
if currentControlPlaneVersion == currentTopologyVersion {
return currentTopologyVersion, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return is the exact same as the last one, can we simplify and merge them?

Copy link
Member Author

@sbueringer sbueringer Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that. I thought it might be easier to read and understand if I spell out the different cases explicitly (especially as there will be a few more).

I combined those two and added a comment which contains both cases.

return nil, errors.Wrap(err, "failed to set spec.version in the ControlPlane object")
}

return controlPlane, nil
}

// computeControlPlaneVersion computes the ControlPlane version based on the current clusterTopologyState.
// TODO: we also have to handle the following cases:
// * ControlPlane.spec.version != ControlPlane.status.version, i.e. ControlPLane rollout is already in progress.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably handle this now within this function, is there any reason we're holding?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked to @fabriziopandini and we thought it might be good idea to keep it easy for now as there are all kinds of things that can go wrong when the ClusterClass version is changed again during a rollout (either when ControlPlane is not yet rolled out completely and also if at least a subset of MachineDeployments are not rolled out yet).

Only adding the case in l.156 should be relatively simple.

Copy link
Member Author

@sbueringer sbueringer Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the case from l.156.

If we also want to add l.157 I would prefer waiting for the MachineDeployment compute and reconcile PRs to be merged first. Then I would also implement computeMachineDeploymentVersions in this PR. This should make it easier to reason about the overall cluster state during upgrades and all edge cases.

Comment on lines +179 to +184
// Do not trigger another rollout, while a rollout is already in progress. A rollout is still in progress if:
// * .status.version is not set
// * .status.version is set but not equal to .spec.version
if !statusVersionSet || controlPlaneVersion != controlPlaneStatusVersion {
return controlPlaneVersion, nil
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obviously based on the assumption that we don't want to trigger another rollout while one is in progress.
I'm not sure if that's what we want.

@sbueringer
Copy link
Member Author

sbueringer commented Aug 19, 2021

/hold for now. No further review required, will be reopened with a slightly different scope.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 19, 2021
@k8s-ci-robot
Copy link
Contributor

@sbueringer: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Copy link
Member Author

Superseded by #5178

@sbueringer sbueringer closed this Aug 31, 2021
@killianmuldoon killianmuldoon added the area/clusterclass Issues or PRs related to clusterclass label May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clusterclass Issues or PRs related to clusterclass cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants