✨ Clusterctl alpha rollout undo for MachineDeployments #4098

Merged

Conversation

Arvinderpal
Contributor

What this PR does / why we need it:
Adds command and client for the clusterctl alpha rollout undo command.

Example Usage: Roll back to an earlier Kubernetes version after an upgrade.

  1. Upgrade the Kubernetes version so that a new MachineSet (MS) is created:
     kubectl get md -n default test-md-0 -o json | jq '.spec.template.spec.version="v1.19.3"' | kubectl apply -f-
  2. You should now see a new MS and an old MS. The MachineDeployment (MD) will have its revision annotation pointing to the new MS.
  3. Perform the undo, replacing XXX with the revision to return to:
     clusterctl alpha rollout undo machinedeployment/test-md-0 --to-revision=XXX

Another example is rolling back to an earlier MachineTemplate. See the CAPI book for how to change MachineTemplates.

Tracking Issue:
#3439

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 21, 2021
@k8s-ci-robot
Contributor

Hi @Arvinderpal. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 21, 2021
@Arvinderpal
Contributor Author

@wfernandes Here it is. 😄

@Arvinderpal Arvinderpal mentioned this pull request Jan 21, 2021
9 tasks
Member

@vincepri vincepri left a comment

Seems this is building on top of #4054

@Arvinderpal
Contributor Author

> Seems this is building on top of #4054

Yes, it is. @wfernandes suggested it to get the review process started. Once #4054 merges, we can rebase.

Contributor

@wfernandes wfernandes left a comment

I haven't had the chance to actually use these features yet. But this is a first pass review. Thanks for putting in the work.

@@ -62,7 +139,7 @@ func (c *clusterctlClient) RolloutRestart(options RolloutRestartOptions) error {
 	}

 	for _, t := range tuples {
-		if err := c.alphaClient.Rollout().ObjectRestarter(clusterClient.Proxy(), t, options.Namespace); err != nil {
+		if err := c.alphaClient.Rollout().ObjectRollbacker(clusterClient.Proxy(), t, options.Namespace, options.ToRevision); err != nil {
Contributor

Is there a reason we are calling this behavior "Rollback" and not "Undo" within the Rollout client? Just curious.

Contributor Author

The only reason is that kubectl itself has this naming convention -- the command is undo but underlying implementation refers to rollback. I kept it consistent with kubectl, even though it's annoying.

Contributor

Thanks for clarifying @Arvinderpal!


selector, err := metav1.LabelSelectorAsSelector(&d.Spec.Selector)
if err != nil {
log.V(5).Info("Skipping MachineSet, failed to get label selector from spec selector", "machineset", ms.Name)
Contributor

Since this is getting the label selector from the deployment, I'm assuming that the error is meant to reflect just that.

Suggested change
- log.V(5).Info("Skipping MachineSet, failed to get label selector from spec selector", "machineset", ms.Name)
+ log.V(5).Info("Skipping MachineSet, failed to get label selector from MachineDeployment.Spec.Selector", "MachineDeployment", d.Name)

Contributor

On second thought, can this and the following selector.Empty() checks be moved outside the loop? The MachineDeployment is fixed, so should we iterate over the MachineSet items only if we have a non-empty selector?

WDYT?

Contributor Author

I actually leveraged existing code here:

selector, err := metav1.LabelSelectorAsSelector(&d.Spec.Selector)

But I see your point, I don't see why the selector should be in the loop. I'll change it unless you see any reason otherwise.

Contributor

I don't see any reason TBH. Maybe someone else may know why.
@vincepri Any thoughts?
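
The refactor discussed in this thread can be sketched roughly as follows. This is an illustration only, not the code in the PR: machineSet, machineDeployment, matches, and ownedMachineSets are simplified, hypothetical stand-ins built on plain maps instead of the Cluster API types and metav1.LabelSelectorAsSelector. The point it shows is that the selector depends only on the MachineDeployment, so it can be checked once before iterating over the MachineSets. Returning no MachineSets for an empty selector is an assumption of this sketch.

```go
package main

import "fmt"

// machineSet and machineDeployment are simplified, hypothetical stand-ins
// for the Cluster API types.
type machineSet struct {
	Name   string
	Labels map[string]string
}

type machineDeployment struct {
	Name     string
	Selector map[string]string // stands in for Spec.Selector match labels
}

// matches reports whether every selector key/value pair is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// ownedMachineSets checks the selector once, outside the loop, and only
// iterates over the MachineSets when the selector is non-empty.
func ownedMachineSets(d machineDeployment, all []machineSet) []machineSet {
	if len(d.Selector) == 0 {
		// Assumption: an empty selector selects nothing in this sketch.
		return nil
	}
	var owned []machineSet
	for _, ms := range all {
		if matches(d.Selector, ms.Labels) {
			owned = append(owned, ms)
		}
	}
	return owned
}

func main() {
	d := machineDeployment{Name: "test-md-0", Selector: map[string]string{"cluster": "c1"}}
	all := []machineSet{
		{Name: "ms-a", Labels: map[string]string{"cluster": "c1"}},
		{Name: "ms-b", Labels: map[string]string{"cluster": "c2"}},
	}
	for _, ms := range ownedMachineSets(d, all) {
		fmt.Println(ms.Name)
	}
}
```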

@Arvinderpal Arvinderpal force-pushed the clusterctl-alpha-rollout-rollback branch from f3a2143 to 2a25c68 Compare January 24, 2021 02:11
@wfernandes
Contributor

@Arvinderpal Since #4054 has been merged we can rebase this and remove WIP. 🙂

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2021
@Arvinderpal Arvinderpal force-pushed the clusterctl-alpha-rollout-rollback branch from 2a25c68 to db1abdc Compare February 4, 2021 21:57
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 4, 2021
@Arvinderpal Arvinderpal changed the title WIP: ✨ Clusterctl alpha rollout undo for MachineDeployments ✨ Clusterctl alpha rollout undo for MachineDeployments Feb 4, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2021
@Arvinderpal
Contributor Author

> @Arvinderpal Since #4054 has been merged we can rebase this and remove WIP.

@wfernandes Done. PTAL.

@Arvinderpal Arvinderpal force-pushed the clusterctl-alpha-rollout-rollback branch 2 times, most recently from 00d4d4b to e99d62e Compare February 4, 2021 22:08
@wfernandes
Contributor

@Arvinderpal Thanks! I'll take a look tomorrow asap 🙂

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2021
Contributor

@wfernandes wfernandes left a comment

Wasn't able to manually test this because of yak-shaving issues. 😄
But apart from the MachineDeployment labelselector being within the MachineSet loop, it lgtm

Comment on lines +116 to +117
if toRevision > 0 {
return nil, errors.Errorf("unable to find specified revision: %v", toRevision)
}
Contributor

I'm guessing this is added to provide a better error msg because I see that the next check would also result in returning an error.

Contributor Author

Yes, the specific revision desired was not found. It's different from the next error which is that there is no previous MS associated with this deployment.
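
To make the two error cases in this thread concrete, here is a minimal sketch of how a rollback target might be chosen. It is not the PR's implementation: machineSet and findRollbackTarget are hypothetical, the convention that toRevision == 0 means "the previous revision" is an assumption borrowed from kubectl, and the wording of the second error message is invented for illustration. Only the "unable to find specified revision" message is taken from the snippet under review.

```go
package main

import (
	"errors"
	"fmt"
)

// machineSet is a simplified, hypothetical stand-in that carries only a
// name and its revision number.
type machineSet struct {
	Name     string
	Revision int64
}

// findRollbackTarget returns the MachineSet to roll back to.
// Assumption: toRevision == 0 means "the highest revision below current".
func findRollbackTarget(current int64, all []machineSet, toRevision int64) (*machineSet, error) {
	var previous *machineSet
	for i := range all {
		ms := &all[i]
		if toRevision > 0 {
			if ms.Revision == toRevision {
				return ms, nil
			}
			continue
		}
		if ms.Revision < current && (previous == nil || ms.Revision > previous.Revision) {
			previous = ms
		}
	}
	if toRevision > 0 {
		// A specific revision was requested but no MachineSet carries it.
		return nil, fmt.Errorf("unable to find specified revision: %v", toRevision)
	}
	if previous == nil {
		// No revision was requested and there is nothing older to return to.
		return nil, errors.New("no previous MachineSet found for this deployment")
	}
	return previous, nil
}

func main() {
	sets := []machineSet{{Name: "ms-1", Revision: 1}, {Name: "ms-2", Revision: 2}, {Name: "ms-3", Revision: 3}}
	ms, err := findRollbackTarget(3, sets, 0)
	fmt.Println(ms.Name, err)
	_, err = findRollbackTarget(3, sets, 7)
	fmt.Println(err)
}
```

Keeping the two errors separate lets the user tell a mistyped --to-revision value apart from a deployment that simply has no rollout history.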

@Arvinderpal
Contributor Author

> Wasn't able to manually test this because of yak-shaving issues.
> But apart from the MachineDeployment labelselector being within the MachineSet loop, it lgtm

@wfernandes Thanks for the final review!
@vincepri Do you have time to review this or delegate to someone else?

@wfernandes
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 11, 2021
@vincepri
Member

vincepri commented Mar 2, 2021

/assign @fabriziopandini

for final review

Member

@fabriziopandini fabriziopandini left a comment

First pass, I still have to take a look at the tests

// ObjectRollbacker will issue a rollback on the specified cluster-api resource.
func (r *rollout) ObjectRollbacker(proxy cluster.Proxy, tuple util.ResourceTuple, namespace string, toRevision int64) error {
switch tuple.Resource {
case machineDeployment:
Member

Looking at the code base, in some places we use machineDeployment (const) and in others "machinedeployment" (string).
What about getting rid of the lowercase version of Kind and using Kind instead?

Contributor Author

Again, just followed kubectl conventions with the long-term goal to put this functionality in kubectl.
I should be using the const everywhere including tests. I will do that.
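
As an illustration of using one constant everywhere instead of repeating the string literal, a dispatch along these lines could look like the sketch below. The names machineDeploymentResource and rollbackResource, the constant's value, and the error text are assumptions made for this example; the PR's own constant is called machineDeployment and its real handler talks to the cluster.

```go
package main

import "fmt"

// machineDeploymentResource is a hypothetical name for the shared constant.
// Assumption: its value is the lowercase resource name used on the CLI.
const machineDeploymentResource = "machinedeployment"

// rollbackResource dispatches on the resource type by comparing against the
// constant, so callers and tests never repeat the raw string.
func rollbackResource(resource, name string, toRevision int64) error {
	switch resource {
	case machineDeploymentResource:
		// A real implementation would roll back the MachineDeployment here.
		fmt.Printf("rolling back %s/%s to revision %d\n", resource, name, toRevision)
		return nil
	default:
		return fmt.Errorf("invalid resource type %q, valid values are [%s]", resource, machineDeploymentResource)
	}
}

func main() {
	fmt.Println(rollbackResource(machineDeploymentResource, "test-md-0", 2))
	fmt.Println(rollbackResource("machineset", "test-ms-0", 2))
}
```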

for idx := range machineSets.Items {
ms := &machineSets.Items[idx]

selector, err := metav1.LabelSelectorAsSelector(&d.Spec.Selector)
Member

Move out of the for loop?

Contributor Author

See discussion with Warren below. :)

cmd := &cobra.Command{
Use: "undo RESOURCE",
DisableFlagsInUseLine: true,
Short: "Undo a cluster-api resource",
Member

Should we be a little bit more specific, given that we are supporting only machine deployments?

Contributor Author

I can add this to the cleanup PR since all commands will need updating.

@@ -30,6 +30,7 @@ type Rollout interface {
ObjectRestarter(cluster.Proxy, util.ResourceTuple, string) error
ObjectPauser(cluster.Proxy, util.ResourceTuple, string) error
ObjectResumer(cluster.Proxy, util.ResourceTuple, string) error
ObjectRollbacker(cluster.Proxy, util.ResourceTuple, string, int64) error
Member

is it required to use int64?

Contributor Author

Yes, the underlying mdutil.Revision() func returns an int64.
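
For context on why int64 falls out naturally, here is a minimal sketch of reading a revision from an object's annotations. It is written from scratch for illustration and is not the mdutil.Revision() source: revisionOf is a hypothetical helper, and both the annotation key and the "missing annotation means revision 0" behaviour are assumptions. The relevant point is that strconv.ParseInt returns an int64, so carrying that type through the interface avoids conversions.

```go
package main

import (
	"fmt"
	"strconv"
)

// revisionAnnotation is the annotation key assumed for this sketch.
const revisionAnnotation = "machinedeployment.clusters.x-k8s.io/revision"

// revisionOf parses the revision annotation as a base-10 int64.
// Assumption: a missing annotation is treated as revision 0.
func revisionOf(annotations map[string]string) (int64, error) {
	v, ok := annotations[revisionAnnotation]
	if !ok {
		return 0, nil
	}
	return strconv.ParseInt(v, 10, 64)
}

func main() {
	rev, err := revisionOf(map[string]string{revisionAnnotation: "3"})
	fmt.Println(rev, err)
}
```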

@vincepri
Member

vincepri commented Mar 3, 2021

/milestone v0.4.0

@Arvinderpal Please rebase as well, given the changes for Go 1.16 that merged yesterday

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Mar 3, 2021
Contributor Author

@Arvinderpal Arvinderpal left a comment

@fabriziopandini Thank you for the feedback. PTAL again.

@Arvinderpal Arvinderpal force-pushed the clusterctl-alpha-rollout-rollback branch 3 times, most recently from 3f347d2 to 9dbfb4f Compare March 6, 2021 22:54
@Arvinderpal Arvinderpal force-pushed the clusterctl-alpha-rollout-rollback branch from 9dbfb4f to 3cd6fa1 Compare March 6, 2021 23:07
@Arvinderpal Arvinderpal mentioned this pull request Mar 8, 2021
3 tasks
@Arvinderpal
Contributor Author

@fabriziopandini Thanks again for the follow up. If it's okay with you, we can merge this and address the remaining issues in: #4266

See tracking issue as well: #3439

@fabriziopandini
Member

/lgtm
considering some cleanup/refactor is going to land in follow up PRs.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2021
@Arvinderpal
Contributor Author

@vincepri @fabriziopandini Considering there are two LGTMs, can we approve and merge this? Thank you.

@vincepri
Member

@Arvinderpal Could we open an issue for the follow-up PRs?

Member

@vincepri vincepri left a comment

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 12, 2021
@Arvinderpal
Contributor Author

Arvinderpal commented Mar 12, 2021

> @Arvinderpal Could we open an issue for the follow-up PRs?

@vincepri #4266

@Arvinderpal
Contributor Author

/retest

@k8s-ci-robot
Contributor

k8s-ci-robot commented Mar 12, 2021

@Arvinderpal: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-cluster-api-apidiff-main 3cd6fa1 link /test pull-cluster-api-apidiff-main

@k8s-ci-robot k8s-ci-robot merged commit abba4c9 into kubernetes-sigs:master Mar 12, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.