
✨ Add rolloutAfter for RollingUpdate strategy #4596

Closed
wants to merge 1 commit

Conversation

enxebre
Member

@enxebre enxebre commented May 11, 2021

What this PR does / why we need it:
If the reconciliation time is after spec.rolloutAfter then a rollout should happen or has already happened.
A new MachineSet is needed the first time the reconciliation time is after spec.rolloutAfter.
Otherwise we pick the oldest with creation timestamp > lastRolloutAfter.
If a new MachineSet is needed, we include the rolloutAfter hash in the MachineSet name, so the new MachineSet's name does not clash with the existing MachineSet that has the same template, and the rollout can be orchestrated as usual.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4536

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 11, 2021
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 11, 2021
@enxebre enxebre changed the title WIP: Add rolloutAfter for RollingUpdate strategy WIP: Add rolloutAfter/rolloutOlderThan for RollingUpdate strategy May 11, 2021
@enxebre
Member Author

enxebre commented May 11, 2021

@vincepri
Member

@enxebre Do you think we could tackle rolloutOlderThan separately?

@enxebre
Member Author

enxebre commented May 11, 2021

/retitle WIP: Add rolloutAfter for RollingUpdate strategy

@k8s-ci-robot k8s-ci-robot changed the title WIP: Add rolloutAfter/rolloutOlderThan for RollingUpdate strategy WIP: Add rolloutAfter for RollingUpdate strategy May 11, 2021
Member

@sbueringer sbueringer left a comment


I could be totally wrong with most of these comments. I don't have a lot of experience with the up and downscale logic.

controllers/machinedeployment_controller.go — 7 outdated review threads (resolved)
@enxebre
Member Author

enxebre commented May 14, 2021

I could be totally wrong with most of these comments. I don't have a lot of experience with the up and downscale logic.

Thanks @sbueringer, this is still kind of pseudocode; I wanted to hear others' feedback to see if the algorithm makes sense.
I'll give it another pass to get it to a mergeable state.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2021
@enxebre
Member Author

enxebre commented Sep 9, 2021

Back to this. We can probably simplify by using a label/annotation approach similar to what kubectl rollout restart does. I'll give it a go.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 9, 2021
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 9, 2021
@enxebre
Member Author

enxebre commented Sep 9, 2021

@sbueringer @vincepri what do you think? If we want to go this path, I'll get the PR to a mergeable state.

@enxebre enxebre force-pushed the rollout-after branch 3 times, most recently from 750c9cd to c5d806c Compare September 9, 2021 11:21
Comment on lines 38 to 40
// RolledoutAfterAnnotation annotates the MachineTemplateSpec to trigger a
// MachineDeployment rollout when the RolloutAfter criteria is met.
RolledoutAfterAnnotation = "machinedeployment.clusters.x-k8s.io/RolledoutAfter"
Member

Instead of relying on a label, could we add additional logic like we have in KCP to take into account the Spec.RolloutAfter field?

Member Author

@enxebre enxebre Sep 9, 2021


MachineDeployment.Spec.RolloutAfter is the source of truth and the annotation is an impl detail https://github.com/kubernetes-sigs/cluster-api/pull/4596/files#diff-34f64847a48c9fc1b3222c9bec6c93f30767a93262215eb4305d78c3b9097c63R225.

Orchestrating this "manually" as KCP does doesn't seem optimal here, since Machines are owned by MachineSets and subject to their reconciliation logic and to MachineDeployment replicas, so leveraging the MachineDeployment's ability to roll out might be the most viable path.

Not sure if you were suggesting something different?

Member

The rollout is determined in this function https://github.com/kubernetes-sigs/cluster-api/blob/master/controllers/mdutil/util.go#L416-L431

Today we're only looking at the MachineTemplate, we could add some logic there to compare CreationTimestamp with RolloutAfter?

Member Author

I thought about that, but I couldn't find a working way in that direction so far because mdutil.FindNewMachineSet(d, msList) prefers the oldest MachineSet in case of equal templates; see my reasoning here #4596 (comment)

Member

The issue with the above is that mdutil.FindNewMachineSet(d, msList) seems to prefer oldest MachineSet in case of equal templates.

From the text in there, the assumption made is that usually there should be only one MachineSet with the same template, otherwise we wouldn't need to roll out changes; there is an edge case where there could be more than one MachineSet with the same template, and the code always picks the oldest one (which might already be scaled).

The above shouldn't impact adding new logic that looks at the creation timestamp and compares with a rolloutAfter field?

Member Author

@enxebre enxebre Sep 10, 2021

Wouldn't we also check the CreationTimestamp now at every iteration though? If there is a RolloutAfter, the MachineSet CreationTimestamp must be after the RolloutAfter's value. Even if templates are the same, we shouldn't pick the oldest one in that case.

Yes, we could do that, but then we'd be changing FindNewMachineSet to always pick the newest even in the duplicated-template edge case. We have no means to differentiate between the duplicated-template edge case and the rolloutAfter use case, so for both we'd be picking the newest one. If we don't care about the edge case, then it might work. I tried to summarise that reasoning here #4596 (comment).

Adding an annotation or label to a MachineDeployment to rollout a template is truthfully something that might change in the future. I'd rather find a better way to rollout changes without using any behaviors, but rather be explicit about it when we need to calculate changes. In the future we might want to add more logic to determine if a MachineSet needs to be deployed, and relying on annotations is probably not going to work out long term.

I'm approaching the rolling upgrade logic here as a working black-box building block. We can do any calculation we want outside it to decide we want a rolling upgrade; the annotation is just how we signal it from a level above. I see the decision to do a rolling upgrade and the rollout logic as two different levels. Though I can also see your angle.

Member

@vincepri vincepri Sep 10, 2021

Yes we could do that but then we'll be changing FindNewMachineSet to always pick the newest even in the duplicated template edge case.

The duplicate one is truthfully an edge case that should be audited, but that's probably something for a different issue/PR. The code is the same as the ReplicaSet code upstream, which probably has this issue because api-server and controller-manager could be restarting and duplicating ReplicaSet(s). I don't know if this can happen for MachineSets.

To further clarify though, we wouldn't be changing the code to pick the newest one, but the oldest one with these conditions:

  • ms.metadata.creationTimestamp > md.spec.rolloutAfter
  • EqualMachineTemplate(md, ms) == true

We have no means to differentiate between the duplicated template edge use case and the rolloutAfter use case, so for both we'll be picking the newest one.

This shouldn't apply in case of RolloutAfter. If a rollout time is in effect, even if we have duplicates, we've already rolled out a new MachineSet at that point.

If there are duplicates (depending on whatever RevisionHistory is set to), cleanupDeployment function takes care of deleting outdated MachineSets.

Member Author

@enxebre enxebre Sep 10, 2021

The edge case issue was because different kube versions resulted in different hashes for ReplicaSet names resulting in duplicated ReplicaSets.

To further clarify though, we wouldn't be changing the code to pick the newest one, but the oldest one with these conditions:

  • ms.metadata.creationTimestamp > md.spec.rolloutAfter
  • EqualMachineTemplate(md, ms) == true

Ok this makes sense let me implement it and keep discussing through code.

Member Author

@vincepri @sbueringer updated PTAL.

Member Author

Updated to test and support the following possible cases:

  • RolloutAfter == nil
    • Pick oldest
  • RolloutAfter < Now
    • Oldest
    • Created > last RolloutAfter
  • RolloutAfter > Now
    • Oldest
    • Created > last RolloutAfter

To keep track of the last rolloutAfter I'm using an annotation, LastRolloutAfterAnnotation. We could use a status field, but I'd rather wait for the implementation to mature and see how/if it combines with things like maxAge.

@vincepri @sbueringer PTAL.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 13, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 22, 2021
@enxebre
Member Author

enxebre commented Sep 22, 2021

Let's wait for #5297 to rebase and proceed with this

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 23, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 28, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign timothysc after the PR has been reviewed.
You can assign the PR to them by writing /assign @timothysc in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@enxebre
Member Author

enxebre commented Sep 28, 2021

@vincepri @sbueringer I rebased this. Given the timelines and the lack of feedback/testing in real use I think we should probably punt on it until we release v1beta1.

@enxebre enxebre force-pushed the rollout-after branch 2 times, most recently from 5d0f71e to 9a2964d Compare September 28, 2021 15:16
Comment on lines +437 to +441
Name: "When rolloutAfter is not required but another one took place in the past it should return oldest created after the last rolloutAfter",
deployment: deploymentRolloutAfterHappened,
msList: []*clusterv1.MachineSet{&machineSetBeforeLastRolloutAfter, &machineSetAfterLastRolloutAfter, &oldestMachineSetAfterLastRolloutAfter},
expected: &oldestMachineSetAfterLastRolloutAfter,
},
Member

I'm a bit confused about when this unit test case would happen. If I understood the code correctly, this use case adds the requirement for the annotation. Wouldn't machineSetBeforeLastRolloutAfter be gone, though, after the rollout takes place?

Member Author

@enxebre enxebre Sep 29, 2021

It won't necessarily be gone; it might just be scaled down to zero, right? Whether it's deleted depends only on .spec.RevisionHistoryLimit, right?

Comment on lines +61 to +62
// LastRolloutAfterAnnotation is the last MachineDeployment.Spec.RolloutAfter that was met.
LastRolloutAfterAnnotation = "machinedeployment.clusters.x-k8s.io/lastRollout"
Member

Is the annotation applied and stored only on MachineSets in this case?

Comment on lines +396 to 397
// Deprecated: this does not work for rolloutAfter. Use FindNewMachineSetWithRolloutAfter.
func FindNewMachineSet(deployment *clusterv1.MachineDeployment, msList []*clusterv1.MachineSet) *clusterv1.MachineSet {
Member

We can probably change these functions' signatures now instead?

Member Author

Wouldn't that be a breaking change? Don't we need to target this to v1beta2 now, since this is an API change?

Member

@sbueringer sbueringer Oct 12, 2021

Not 100% sure, but I think the idea is that we can have code-breaking changes between v1.0 and v1.1, i.e. only the API types and their behavior must not have breaking changes.

Or from another point of view: v1beta1 guards our API types and the corresponding implementation, and the release version guards the code (+ we're doing breaking changes there on minor versions as we're not incrementing the major version, similar to Kubernetes).

Member Author

Makes sense, though I'm not sure a new field can be added without creating a subsequent API version.

Member

I think the idea was to allow new fields if they are non-breaking, but let's see if somebody else knows :)

Member

Internal packages are not exposed APIs, so we can break them even within patch releases, given that only internal code might use them.

Member

@sbueringer sbueringer Oct 14, 2021

I'm not sure, but I was assuming we started talking at some point about adding the RolloutAfter field to the v1beta1 API type.

@k8s-ci-robot
Contributor

@enxebre: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 25, 2021
@randomvariable
Member

Are we still trying to get this one in?

@k8s-ci-robot
Contributor

@enxebre: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-cluster-api-e2e-main
Commit: 98600a6
Required: true
Rerun command: /test pull-cluster-api-e2e-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 8, 2022
@vincepri
Member

/close

Closing due to age, @enxebre feel free to reopen after rebase if this is still a change we want to pursue

@k8s-ci-robot
Contributor

@vincepri: Closed this PR.

In response to this:

/close

Closing due to age, @enxebre feel free to reopen after rebase if this is still a change we want to pursue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
  • cncf-cla: yes — Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/hold — Indicates that a PR should not merge because someone has issued a /hold command.
  • lifecycle/stale — Denotes an issue or PR has remained open with no activity and has become stale.
  • needs-rebase — Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • size/L — Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for RolloutAfter to MachineDeployments
7 participants