
First pass for machine deployment controller #143

Merged
1 commit merged into kubernetes-sigs:master on May 31, 2018

Conversation

Contributor

@k4leung4 commented May 8, 2018

What this PR does / why we need it:
This is the first pass for the machine deployment controller.

Copies/borrows heavily from deployment controller.

Things not implemented in this PR:

  • deployment progress
  • orphan adoption

Caveat

  • rolling update does not work because MachineSet status is not implemented

@kubernetes/kube-deploy-reviewers

@k8s-ci-robot added the size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) label on May 8, 2018
@k8s-ci-robot
Contributor

Hi @k4leung4. Thanks for your PR.

I'm waiting for a kubernetes or kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) and needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) labels on May 8, 2018
@roberthbailey
Contributor

/ok-to-test

@k8s-ci-robot removed the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) label on May 9, 2018
@k4leung4 force-pushed the machinedeployment branch 5 times, most recently from 547e422 to c3206b8 on May 9, 2018 19:50
@k4leung4
Contributor Author

/assign @krousey

@k4leung4 force-pushed the machinedeployment branch 2 times, most recently from bdbcf1d to 8ee2058 on May 15, 2018 17:15
@krousey
Contributor

krousey commented May 15, 2018

/assign @maisem

@k8s-ci-robot
Contributor

@krousey: GitHub didn't allow me to assign the following users: maisem.

Note that only kubernetes-sigs members and repo collaborators can be assigned.

In response to this:

/assign @maisem

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@dgoodwin left a comment

Attempted a review, though I am mostly just learning how this works. Added some questions and suggestions for readability as a result.

This looks awesome and we're really looking forward to trying it out. I'd love to test it with our code, but we aren't quite ready yet; we will try to do so ASAP.

Thanks!

return nil, err
}

// TODO: flesh out machine set adoption.
Contributor

Out of curiosity, is MachineSet adoption a common use case? Curious how this would typically be used.

Contributor Author

For pod Deployments and ReplicaSets, the use case is that a user created a ReplicaSet by hand and wants to upgrade it, so they create a Deployment with matching labels; the Deployment can then adopt the ReplicaSet and upgrade it using deployment strategies.

For machine deployments, it would be the equivalent: someone creates a MachineSet by hand and wants to upgrade it using machine deployment strategies.
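
For illustration, a minimal sketch of that adoption check (not part of this PR; `couldAdopt` and the example values are hypothetical, only the apimachinery helpers are real): a deployment would only be able to adopt an orphaned MachineSet whose labels match its selector.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// couldAdopt is a hypothetical helper: a MachineDeployment may adopt a
// MachineSet only if the MachineSet is an orphan (it has no controller
// reference) and the deployment's selector matches the MachineSet's labels.
func couldAdopt(selectorSpec *metav1.LabelSelector, ms metav1.Object) bool {
	if metav1.GetControllerOf(ms) != nil {
		return false // already owned by some controller
	}
	selector, err := metav1.LabelSelectorAsSelector(selectorSpec)
	if err != nil || selector.Empty() {
		return false // an invalid or empty selector should match nothing
	}
	return selector.Matches(labels.Set(ms.GetLabels()))
}

func main() {
	// A hand-created MachineSet with no owner, as in the scenario above.
	ms := &metav1.ObjectMeta{Name: "hand-made", Labels: map[string]string{"set": "node"}}
	sel := &metav1.LabelSelector{MatchLabels: map[string]string{"set": "node"}}
	fmt.Println(couldAdopt(sel, ms)) // true: orphaned, and the labels match
}
```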

}

// updateMachineSet figures out what deployment(s) manage a MachineSet when the MachineSet
// is updated and wake them up. If the anything of the MachineSets have changed, we need to
Contributor

A couple of suggestions for godoc clarity; would this be accurate?

"If anything on the MachineSet has changed, we need to reconcile its current MachineDeployment. If the MachineSet's controller reference has changed, we must also reconcile its old MachineDeployment."

Contributor Author

That is correct. Updated the godoc.
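
As a sketch of the flow that godoc describes, modeled on the upstream deployment controller (the `enqueue` callback and the stand-in types are assumptions): the current owner is always reconciled, and if the controller reference changed, the previous owner is woken up too.

```go
package main

import (
	"fmt"
	"reflect"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// onMachineSetUpdate sketches the godoc's contract. enqueue stands in for
// whatever queues a MachineDeployment for reconciliation.
func onMachineSetUpdate(old, cur metav1.Object, enqueue func(deploymentName string)) {
	oldRef := metav1.GetControllerOf(old)
	curRef := metav1.GetControllerOf(cur)
	if !reflect.DeepEqual(oldRef, curRef) && oldRef != nil {
		// The controller reference changed: reconcile the previous owner so
		// it can react to losing this MachineSet.
		enqueue(oldRef.Name)
	}
	if curRef != nil {
		// Wake up the deployment that currently owns the MachineSet.
		enqueue(curRef.Name)
	}
}

func main() {
	isController := true
	ref := func(name string) []metav1.OwnerReference {
		return []metav1.OwnerReference{{Kind: "MachineDeployment", Name: name, Controller: &isController}}
	}
	old := &metav1.ObjectMeta{Name: "ms", OwnerReferences: ref("md-old")}
	cur := &metav1.ObjectMeta{Name: "ms", OwnerReferences: ref("md-new")}
	onMachineSetUpdate(old, cur, func(name string) { fmt.Println("enqueue", name) }) // md-old, then md-new
}
```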

}

// FindNewMachineSet returns the new MS this given deployment targets (the one with the same machine template).
func FindNewMachineSet(deployment *v1alpha1.MachineDeployment, msList []*v1alpha1.MachineSet) *v1alpha1.MachineSet {
Contributor

Would "FindCurrentMachineSet" be a better name for this function?

Contributor Author

Not sure if "current" is really better. IMHO, it all comes down to terminology.
"new" refers to the new machine set that we are upgrading to, and "old" refers to the old machine set that we are upgrading from.
In a sense, "current" runs into the same issue: both the machine set we are upgrading from and the one we are upgrading to are current.
I am all for better naming, as I too had difficulty following the deployment controller code this code is based on, though I am not sure renaming it to "current" helps much.

Contributor

That sounds OK; "new" is pervasive and used consistently here, so this works for me now.

// In rare cases, such as after cluster upgrades, Deployment may end up with
// having more than one new MachineSets that have the same template as its template,
// see https://github.com/kubernetes/kubernetes/issues/40415
// We deterministically choose the oldest new MachineSet.
Contributor

Suggest "oldest MachineSet with matching template hash."

Contributor Author

done

for i := range msList {
if EqualIgnoreHash(&msList[i].Spec.Template, &deployment.Spec.Template) {
// In rare cases, such as after cluster upgrades, Deployment may end up with
// having more than one new MachineSets that have the same template as its template,
Contributor

Suggest dropping "as its template".

Contributor Author

done

// having more than one new MachineSets that have the same template as its template,
// see https://github.com/kubernetes/kubernetes/issues/40415
// We deterministically choose the oldest new MachineSet.
return msList[i]
Contributor

Could you clarify why it's correct to choose the oldest matching MachineSet? I'm not familiar with the specifics of how we end up with two (or how they differ), but it strikes me that in that situation you might want the newest MachineSet with a matching template.

Contributor Author

I don't think choosing the oldest matching MachineSet is always the correct choice; I too believe there are scenarios where it is correct to choose the newest one.
The oldest is chosen here so the behavior is deterministic, instead of a random behavior that depends on the ordering of the list.
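
A toy illustration of that determinism (stand-in types; the real code compares machine templates with EqualIgnoreHash): sorting candidates by creation timestamp before taking the first match makes the result independent of list order.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// machineSet is a pared-down stand-in for v1alpha1.MachineSet.
type machineSet struct {
	Name     string
	Created  time.Time
	Template string // stand-in for the machine template with hash labels stripped
}

// oldestMatching returns the oldest MachineSet whose template equals the
// deployment's, so ties between template-identical MachineSets resolve
// deterministically rather than by input ordering.
func oldestMatching(deploymentTemplate string, msList []*machineSet) *machineSet {
	sort.Slice(msList, func(i, j int) bool { return msList[i].Created.Before(msList[j].Created) })
	for _, ms := range msList {
		if ms.Template == deploymentTemplate {
			return ms
		}
	}
	return nil
}

func main() {
	now := time.Now()
	msList := []*machineSet{
		{Name: "ms-b", Created: now, Template: "t1"},
		{Name: "ms-a", Created: now.Add(-time.Hour), Template: "t1"}, // an older duplicate
	}
	fmt.Println(oldestMatching("t1", msList).Name) // always ms-a, whatever the input order
}
```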

)

// syncStatusOnly only updates Deployments Status and doesn't take any mutating actions.
func (dc *MachineDeploymentControllerImpl) syncStatusOnly(d *v1alpha1.MachineDeployment, msList []*v1alpha1.MachineSet, machineMap map[types.UID]*v1alpha1.MachineList) error {
Contributor

Possible unused function here.

Contributor Author

You are correct; removing the unused code.

}

// FindOldMachineSets returns the old machine sets targeted by the given Deployment, with the given slice of MSes.
// Note that the first set of old machine sets doesn't include the ones with no machines, and the second set of old machine sets include all old machine sets.
Contributor

Suggest rewording to say this returns a list of all old machine sets with replicas scaled to 0, and a second list that includes all old machine sets regardless of replicas.

Contributor Author

done.
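
A small sketch of the contract as the original godoc states it (stand-in types; the real function works on v1alpha1.MachineSet): everything other than the new MachineSet is old, and the function returns the old sets that still have machines plus a second list of all old sets.

```go
package main

import "fmt"

// machineSet is a minimal stand-in for v1alpha1.MachineSet.
type machineSet struct {
	Name     string
	Replicas int32
}

// findOldMachineSets returns the old MachineSets that still have machines,
// and a second list of all old MachineSets regardless of replica count.
func findOldMachineSets(newMS *machineSet, all []*machineSet) (withMachines, allOld []*machineSet) {
	for _, ms := range all {
		if ms.Name == newMS.Name {
			continue // skip the new MachineSet itself
		}
		allOld = append(allOld, ms)
		if ms.Replicas > 0 {
			withMachines = append(withMachines, ms)
		}
	}
	return withMachines, allOld
}

func main() {
	newMS := &machineSet{Name: "ms-new", Replicas: 3}
	all := []*machineSet{newMS, {Name: "ms-old-1", Replicas: 2}, {Name: "ms-old-0", Replicas: 0}}
	withMachines, allOld := findOldMachineSets(newMS, all)
	fmt.Println(len(withMachines), len(allOld)) // 1 2
}
```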

// rolling deployment.
if dutil.IsRollingUpdate(deployment) {
allMSs := dutil.FilterActiveMachineSets(append(oldMSs, newMS))
allMSsReplicas := dutil.GetReplicaCountForMachineSets(allMSs)
Contributor

Suggest a rename to "totalMSReplicas".

Contributor Author

done


// calculateStatus calculates the latest status for the provided deployment by looking into the provided machine sets.
func calculateStatus(allMSs []*v1alpha1.MachineSet, newMS *v1alpha1.MachineSet, deployment *v1alpha1.MachineDeployment) v1alpha1.MachineDeploymentStatus {
availableReplicas := dutil.GetAvailableReplicaCountForMachineSets(allMSs)
Contributor

The PR mentioned MachineSet Status is not implemented yet. Is this still accurate, with the code just prepared for when AvailableReplicas is properly set?

Contributor Author

This is still accurate, and you are correct.
The MachineSet Status has the field for AvailableReplicas, but the MachineSet controller currently does not populate it.
The method pulls the value out of the MachineSet Status, but since it isn't populated, it always returns 0, the default.
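
To make that dependency concrete, a stand-in sketch of the aggregation (simplified types): the deployment's available count is just a sum over MachineSet statuses, so it stays at the zero default until the MachineSet controller fills the field in.

```go
package main

import "fmt"

// machineSetStatus is a stand-in for v1alpha1.MachineSetStatus.
type machineSetStatus struct {
	AvailableReplicas int32
}

// availableReplicaCount only aggregates status; unpopulated statuses sum to 0.
func availableReplicaCount(statuses []machineSetStatus) int32 {
	var total int32
	for _, s := range statuses {
		total += s.AvailableReplicas // zero until the MachineSet controller sets it
	}
	return total
}

func main() {
	// Three MachineSets whose controller hasn't populated AvailableReplicas:
	fmt.Println(availableReplicaCount(make([]machineSetStatus, 3))) // 0
}
```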

@k4leung4 force-pushed the machinedeployment branch 2 times, most recently from a161b75 to 4ff2e87 on May 16, 2018 16:55
return
}
glog.V(4).Infof("MachineSet %s added for deployment %v.", ms.Name, d.Name)
c.Reconcile(d)
Contributor

Instead of calling Reconcile directly, you should queue the MachineDeployment. Otherwise, you could have Reconcile called simultaneously on the same MachineDeployment. (Same applies to all calls to Reconcile from event handlers)


nvm, this is for the case when you receive a MachineSet event and not for a MachineDeployment event. Maybe we should use AdditionalInformers.

Contributor

You can get access to the queue in your Init function with:

queue := arguments.GetSharedInformers().WorkerQueues["MachineDeployment"].Queue

Contributor Author

Thanks Cesar, I found the same thing.
I updated the PR to add to the queue rather than calling Reconcile directly.
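
A rough sketch of the resulting pattern, using client-go's workqueue and key function (the queue would come from the WorkerQueues lookup shown above; the helper name is hypothetical): event handlers compute the owning MachineDeployment's namespace/name key and add it to the queue, and since a workqueue never hands the same key to two workers at once, Reconcile is serialized per deployment.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// enqueueMachineDeployment adds the deployment's namespace/name key to the
// controller's queue instead of calling Reconcile inline.
func enqueueMachineDeployment(queue workqueue.Interface, obj interface{}) {
	key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
	if err != nil {
		fmt.Printf("could not get key for object %+v: %v\n", obj, err)
		return
	}
	queue.Add(key)
}

func main() {
	queue := workqueue.New()
	d := &metav1.ObjectMeta{Namespace: "default", Name: "my-machinedeployment"}
	enqueueMachineDeployment(queue, d)
	key, _ := queue.Get() // a worker would Reconcile this key, then call queue.Done(key)
	fmt.Println(key)      // default/my-machinedeployment
}
```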

@k4leung4 force-pushed the machinedeployment branch from 4ff2e87 to 0b69744 on May 17, 2018 15:22
@k4leung4
Contributor Author

/assign @justinsb

@k4leung4
Contributor Author

/cc @justinsb

@k8s-ci-robot requested a review from justinsb on May 21, 2018 22:28

var filteredMS []*v1alpha1.MachineSet
for _, ms := range msList {
if metav1.GetControllerOf(ms) != nil && !metav1.IsControlledBy(ms, d) {
Contributor

If a MS doesn't have a controller, we want to ignore it, I think? So: if metav1.GetControllerOf(ms) == nil || ...

Contributor Author

Nice catch. I was originally thinking of handling it when I added the adoption code, but it makes sense to ignore it for now.
Fixed.

}
selector, err := metav1.LabelSelectorAsSelector(&d.Spec.Selector)
if err != nil {
continue
Contributor

Does this happen? If so, maybe log here to save our future selves.

Contributor Author

This should never happen, as validation of the object should have ensured a valid label selector, but logging an error makes sense.
Added.

}
// If a deployment with a nil or empty selector creeps in, it should match nothing, not everything.
if selector.Empty() || !selector.Matches(labels.Set(ms.Labels)) {
continue
Contributor

Maybe a warning in the empty case also?

Contributor Author

Added better logging whenever a MachineSet is skipped.
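
Pulling the three threads above together, a consolidated sketch of the filter loop after those fixes (a sketch only; the v1alpha1 import path is assumed from this PR's tree, and the exact log lines are illustrative): orphans are skipped until adoption lands, an invalid selector is logged even though validation should prevent it, and an empty selector matches nothing.

```go
package machinedeployment

import (
	"github.com/golang/glog"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
)

// filterMachineSets keeps only the MachineSets owned by d whose labels match
// d's selector, logging whenever a MachineSet is skipped.
func filterMachineSets(d *v1alpha1.MachineDeployment, msList []*v1alpha1.MachineSet) []*v1alpha1.MachineSet {
	var filtered []*v1alpha1.MachineSet
	for _, ms := range msList {
		// Skip MachineSets owned by another controller, and (until adoption is
		// implemented) orphans with no controller reference at all.
		if metav1.GetControllerOf(ms) == nil || !metav1.IsControlledBy(ms, d) {
			glog.V(4).Infof("Skipping MachineSet %v: not controlled by deployment %v.", ms.Name, d.Name)
			continue
		}
		selector, err := metav1.LabelSelectorAsSelector(&d.Spec.Selector)
		if err != nil {
			// Should never happen: object validation guarantees a parseable selector.
			glog.Errorf("Skipping MachineSet %v: invalid selector on deployment %v: %v", ms.Name, d.Name, err)
			continue
		}
		// A nil or empty selector should match nothing, not everything.
		if selector.Empty() || !selector.Matches(labels.Set(ms.Labels)) {
			glog.V(4).Infof("Skipping MachineSet %v: labels do not match deployment %v's selector.", ms.Name, d.Name)
			continue
		}
		filtered = append(filtered, ms)
	}
	return filtered
}
```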

if reflect.DeepEqual(d.Spec.Selector, &everything) {
if d.Status.ObservedGeneration < d.Generation {
d.Status.ObservedGeneration = d.Generation
c.machineClient.ClusterV1alpha1().MachineDeployments(d.Namespace).UpdateStatus(d)
Contributor

Are we ignoring an error here? Maybe log if so...

Contributor Author

Return on err, added logging.

}

// Otherwise, it's an orphan. If anything changed, sync matching controllers
// to see if anyone wants to adopt it now.
Contributor

I think this means that if we remove the owner from the MachineSet we'll likely just re-adopt it, but I think that's OK.

Contributor Author

I think that is what we would want in the adoption case.

// We can't look up by UID, so look up by Name and then verify UID.
// Don't even try to look up by Name if it's the wrong Kind.
if controllerRef.Kind != controllerKind.Kind {
return nil
Contributor

I feel like some glog.Warningf in the 3 return nil cases would be helpful

Contributor Author

Agreed, added.
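
A sketch of the lookup-and-verify pattern under discussion, with a warning at each of the three early returns (the `getter` callback stands in for the MachineDeployment lister; names are illustrative):

```go
package machinedeployment

import (
	"github.com/golang/glog"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// resolveControllerRef returns the MachineDeployment a controller reference
// points at, or nil (with a warning) when the reference cannot be honored.
func resolveControllerRef(namespace string, ref *metav1.OwnerReference,
	getter func(namespace, name string) (metav1.Object, error)) metav1.Object {
	// Don't even try to look up by Name if it's the wrong Kind.
	if ref.Kind != "MachineDeployment" {
		glog.Warningf("Ignoring controller ref %v/%v: unexpected kind %q.", namespace, ref.Name, ref.Kind)
		return nil
	}
	d, err := getter(namespace, ref.Name)
	if err != nil {
		glog.Warningf("Could not look up MachineDeployment %v/%v: %v", namespace, ref.Name, err)
		return nil
	}
	// We can't look up by UID, so verify it after the Name lookup: an object
	// recreated with the same name must not be mistaken for the original owner.
	if d.GetUID() != ref.UID {
		glog.Warningf("MachineDeployment %v/%v has UID %v, want %v; treating the MachineSet as orphaned.",
			namespace, ref.Name, d.GetUID(), ref.UID)
		return nil
	}
	return d
}
```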

{
name: "scenario 1: 10 desired, oldMS at 10, 10 ready, 0 max unavailable => oldMS at 9, scale down by 1.",
deploymentReplicas: 10,
maxUnavailable: intstr.FromInt(0),
Contributor

Is this 0 unavailable & 0 max surge? I thought that was not allowed because we're stuck...

Contributor Author

Yes, that is an invalid configuration; it would normally have been safeguarded against at a higher level.
Fixed the test.

readyMachines: 10,
oldReplicas: 10,
scaleExpected: true,
expectedOldReplicas: 9,
Contributor

I think we violate maxUnavailable=0 as well here (?)

Contributor Author

Updated the test.
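
A worked check of why the original scenario was stuck (values from the test table above): with maxUnavailable=0 the deployment must keep all 10 machines available, so the old MachineSet cannot be scaled down, and with maxSurge also 0 it cannot scale up either.

```go
package main

import "fmt"

func main() {
	desired, maxUnavailable, ready := 10, 0, 10
	minAvailable := desired - maxUnavailable // 10
	// Scaling the old MachineSet from 10 to 9 would leave only 9 ready
	// machines, violating minAvailable; the rollout could make no progress.
	fmt.Println(ready-1 >= minAvailable) // false
}
```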

return dc.machineClient.ClusterV1alpha1().MachineSets(msCopy.ObjectMeta.Namespace).Update(msCopy)
}

// Should use the revision in existingNewMS's annotation, since it set by before
Contributor

I don't understand this comment

Contributor Author

Updated the comment.

We attempt to copy the revision annotation over to the deployment if it doesn't already have it.

}

needsUpdate := dutil.SetDeploymentRevision(d, newRevision)
if !alreadyExists && d.Spec.ProgressDeadlineSeconds != nil {
Contributor

Not sure why we're checking ProgressDeadlineSeconds here?

Contributor Author

No reason; removed.

case 1:
return allMSs[0]
default:
return nil
Contributor

Is this behaviour intended? It doesn't really match the comment, I think; allMSs[0] is the latest, I believe.

Contributor Author

The naming was off; it really wanted the one active (or latest) machine set. If there are multiple machine sets that are active, it needs to start scaling down some of the older machine sets first, to avoid having too many active machine sets.

Renamed the method and updated the comment.

@k4leung4 force-pushed the machinedeployment branch from 0b69744 to 69d3de3 on May 22, 2018 17:55
@k4leung4
Contributor Author

/assign @justinsb

@k4leung4
Contributor Author

/unassign @krousey

@k4leung4
Contributor Author

/hold

@k8s-ci-robot added the do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.) label on May 29, 2018
@k4leung4 force-pushed the machinedeployment branch 2 times, most recently from 7f4027e to c461beb on May 30, 2018 01:25
Copies/borrows heavily from deployment controller.

Things not implemented in this PR:

deployment progress
orphan adoption
@k4leung4 force-pushed the machinedeployment branch from c461beb to 7b1f291 on May 30, 2018 15:51
@maisem

maisem commented May 30, 2018

/lgtm

@k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on May 30, 2018
@k4leung4
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot removed the do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.) label on May 31, 2018
@k8s-ci-robot merged commit e929a2f into kubernetes-sigs:master on May 31, 2018
k4leung4 added a commit to k4leung4/cluster-api that referenced this pull request May 31, 2018
…roller

This is to pick up PR kubernetes-sigs#143, which allows for the creation of machine
deployments. Rolling update should now work.
k4leung4 added a commit to k4leung4/cluster-api that referenced this pull request May 31, 2018
…roller

This is to pick up PR kubernetes-sigs#143, which allows for the creation of machine
deployments. Rolling update should now work.
k4leung4 added a commit to k4leung4/cluster-api that referenced this pull request May 31, 2018
…roller

This is to pick up PR kubernetes-sigs#143, which allows for the creation of machine
deployments. Rolling update should now work.

Update apiserver for vsphere as that is what is being used for gce and
terraform.
k8s-ci-robot pushed a commit that referenced this pull request May 31, 2018
…roller (#260)

This is to pick up PR #143, which allows for the creation of machine
deployments. Rolling update should now work.

Update apiserver for vsphere as that is what is being used for gce and
terraform.
jayunit100 pushed a commit to jayunit100/cluster-api that referenced this pull request Jan 31, 2020
…roller (kubernetes-sigs#260)

This is to pick up PR kubernetes-sigs#143, which allows for the creation of machine
deployments. Rolling update should now work.

Update apiserver for vsphere as that is what is being used for gce and
terraform.
jayunit100 pushed a commit to jayunit100/cluster-api that referenced this pull request Jan 31, 2020
With the patch, clusterctl will be compiled, packaged into a container, and uploaded to GCR, then run as an initContainer along with the bootstrap_job container; the bootstrap_job container will share a volume with the initContainer so that clusterctl can be called from the bootstrap_job container.
enxebre added a commit to enxebre/cluster-api that referenced this pull request Sep 15, 2021
 This was first added here kubernetes-sigs#143 and it's never been used.
enxebre added a commit to enxebre/cluster-api that referenced this pull request Sep 15, 2021
 This was first added here kubernetes-sigs#143 and it's never been used.
enxebre added a commit to enxebre/cluster-api that referenced this pull request Sep 15, 2021
 This was first added here kubernetes-sigs#143 and it's never been used.
enxebre added a commit to enxebre/cluster-api that referenced this pull request Sep 15, 2021
 This was first added here kubernetes-sigs#143 and it's never been used.
sbueringer pushed a commit to sbueringer/cluster-api that referenced this pull request Sep 20, 2021
 This was first added here kubernetes-sigs#143 and it's never been used.
sbueringer pushed a commit to sbueringer/cluster-api that referenced this pull request Sep 21, 2021
 This was first added here kubernetes-sigs#143 and it's never been used.
Jont828 pushed a commit to Jont828/cluster-api that referenced this pull request Oct 26, 2021
 This was first added here kubernetes-sigs#143 and it's never been used.
Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)
8 participants