First pass for machine deployment controller #143
Conversation
Hi @k4leung4. Thanks for your PR. I'm waiting for a kubernetes or kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Force-pushed 547e422 to c3206b8
/assign @krousey
Force-pushed bdbcf1d to 8ee2058
/assign @maisem
@krousey: GitHub didn't allow me to assign the following users: maisem. Note that only kubernetes-sigs members and repo collaborators can be assigned. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Attempted a review, though I am mostly just learning how this works. Added some questions and suggestions for readability as a result.
This looks awesome and we're really looking forward to trying it out. I'd love to test it with our code, but we aren't quite ready yet; we will try to do so ASAP.
Thanks!
	return nil, err
}

// TODO: flush out machine set adoption.
For my own curiosity is MachineSet adoption a common use case? Curious how this would typically be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For pod deployments, replicasets, the use case is that the user created a replicaset by hand, and wanted to upgrade it, and so they would create a pod deployment with matching labels, so it could adopt it and then upgrade using deployment strategies.
For machine deployments, it would be the equivalent of someone creating a machineset by hand, and wanting to upgrade it using machine deployment strategies.
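To make the adoption flow above concrete, here is a minimal sketch of the label-matching check that adoption relies on. The type and field names (`MachineSet.Owner`, `MachineDeployment.Selector`) are simplified stand-ins for illustration, not the real cluster-api v1alpha1 API: a deployment adopts an orphaned set whose labels match its selector, then can roll it using deployment strategies.

```go
package main

import "fmt"

// Simplified stand-ins for the real cluster-api types; field names are
// illustrative only.
type MachineSet struct {
	Name   string
	Labels map[string]string
	Owner  string // empty means no controller reference, i.e. adoptable
}

type MachineDeployment struct {
	Name     string
	Selector map[string]string // label selector, reduced to exact key/value match
}

// selectorMatches reports whether every selector key/value appears in labels.
// An empty selector matches nothing, not everything.
func selectorMatches(selector, labels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// adopt takes ownership of an orphaned MachineSet whose labels match the
// deployment's selector, mirroring how a Deployment adopts a hand-made
// ReplicaSet before upgrading it.
func adopt(d *MachineDeployment, ms *MachineSet) bool {
	if ms.Owner != "" || !selectorMatches(d.Selector, ms.Labels) {
		return false
	}
	ms.Owner = d.Name
	return true
}

func main() {
	ms := &MachineSet{Name: "hand-made", Labels: map[string]string{"app": "web"}}
	d := &MachineDeployment{Name: "web-deploy", Selector: map[string]string{"app": "web"}}
	fmt.Println(adopt(d, ms), ms.Owner)
}
```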
}

// updateMachineSet figures out what deployment(s) manage a MachineSet when the MachineSet
// is updated and wake them up. If the anything of the MachineSets have changed, we need to
Couple of suggestions for godoc clarity; would this be accurate?
"If anything on the MachineSet has changed, we need to reconcile its current MachineDeployment. If the MachineSet's controller reference has changed, we must also reconcile its old MachineDeployment."
That is correct.
updated godoc
}

// FindNewMachineSet returns the new MS this given deployment targets (the one with the same machine template).
func FindNewMachineSet(deployment *v1alpha1.MachineDeployment, msList []*v1alpha1.MachineSet) *v1alpha1.MachineSet {
Would "FindCurrentMachineSet" be a better name for this function?
Not sure if "current" is really better. IMHO, it all comes down to terminology.
"new" refers to the new machine set that we are upgrading to, and "old" refers to the old machine set that we are upgrading from.
In a sense, "current" runs into the same issue that both the machineset that we are upgrading from and upgrading to are current.
I am all for better naming, as I too had difficulty following the deployment controller code, which this code is based on. Though I am not sure renaming it to "current" helps much.
That sounds OK; "new" is pervasive and consistent here, so it works for me now.
// In rare cases, such as after cluster upgrades, Deployment may end up with
// having more than one new MachineSets that have the same template as its template,
// see https://github.com/kubernetes/kubernetes/issues/40415
// We deterministically choose the oldest new MachineSet.
Suggest "oldest MachineSet with matching template hash."
done
for i := range msList {
	if EqualIgnoreHash(&msList[i].Spec.Template, &deployment.Spec.Template) {
		// In rare cases, such as after cluster upgrades, Deployment may end up with
		// having more than one new MachineSets that have the same template as its template,
Suggest dropping "as its template".
done
		// having more than one new MachineSets that have the same template as its template,
		// see https://github.com/kubernetes/kubernetes/issues/40415
		// We deterministically choose the oldest new MachineSet.
		return msList[i]
Could you clarify why it's correct to choose the oldest matching MachineSet? I'm not familiar with the specifics around how we end up with two (how they differ), but it strikes me in that situation you could want the newest MachineSet with matching template.
I don't think choosing the oldest matching MachineSet is always the correct choice; I too believe there are scenarios where it is correct to choose the newest one.
The oldest is chosen here so the behavior is deterministic, instead of a random behavior that depends on the ordering of the list.
)

// syncStatusOnly only updates Deployments Status and doesn't take any mutating actions.
func (dc *MachineDeploymentControllerImpl) syncStatusOnly(d *v1alpha1.MachineDeployment, msList []*v1alpha1.MachineSet, machineMap map[types.UID]*v1alpha1.MachineList) error {
Possible unused function here.
You are correct, removing unused code.
}

// FindOldMachineSets returns the old machine sets targeted by the given Deployment, with the given slice of MSes.
// Note that the first set of old machine sets doesn't include the ones with no machines, and the second set of old machine sets include all old machine sets.
Suggest rewording to say that this returns a list of all old machine sets with replicas scaled to 0, and a list that includes all old machine sets regardless of replicas.
done.
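The two-list contract discussed above can be sketched as follows. The types and the `IsNew` flag are simplified stand-ins for illustration (the real code compares templates and controller references); per the godoc under review, the first list holds old sets that still have machines, the second holds all old sets regardless of replica count.

```go
package main

import "fmt"

// Simplified stand-in for the real MachineSet type.
type MachineSet struct {
	Name     string
	Replicas int32
	IsNew    bool // true for the set the deployment currently targets
}

// findOldMachineSets returns two slices: the old MachineSets that still have
// machines (replicas > 0), and all old MachineSets regardless of replicas.
func findOldMachineSets(msList []*MachineSet) (withMachines, all []*MachineSet) {
	for _, ms := range msList {
		if ms.IsNew {
			continue // the new set is never "old"
		}
		all = append(all, ms)
		if ms.Replicas > 0 {
			withMachines = append(withMachines, ms)
		}
	}
	return withMachines, all
}

func main() {
	msList := []*MachineSet{
		{Name: "new", Replicas: 3, IsNew: true},
		{Name: "old-live", Replicas: 2},
		{Name: "old-empty", Replicas: 0},
	}
	withMachines, all := findOldMachineSets(msList)
	fmt.Println(len(withMachines), len(all))
}
```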
// rolling deployment.
if dutil.IsRollingUpdate(deployment) {
	allMSs := dutil.FilterActiveMachineSets(append(oldMSs, newMS))
	allMSsReplicas := dutil.GetReplicaCountForMachineSets(allMSs)
Suggest a rename to "totalMSReplicas".
done
// calculateStatus calculates the latest status for the provided deployment by looking into the provided machine sets.
func calculateStatus(allMSs []*v1alpha1.MachineSet, newMS *v1alpha1.MachineSet, deployment *v1alpha1.MachineDeployment) v1alpha1.MachineDeploymentStatus {
	availableReplicas := dutil.GetAvailableReplicaCountForMachineSets(allMSs)
The PR mentioned MachineSet Status is not implemented yet; is that still accurate, and is this code just prepared for when AvailableReplicas is properly set?
This is still accurate, and you are correct.
The MachineSet Status has the field for AvailableReplicas, but the MachineSet controller currently does not populate it.
The method pulls the value out of the MachineSet Status, but since it isn't populated, it always returns 0, the default.
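A minimal sketch of that behavior, with simplified stand-in types (the real helper is `dutil.GetAvailableReplicaCountForMachineSets` over v1alpha1 types): summing an unpopulated status field yields the zero default, which is exactly what the answer above describes.

```go
package main

import "fmt"

// Simplified stand-ins; in the real API this lives under Status.
type MachineSetStatus struct {
	AvailableReplicas int32
}

type MachineSet struct {
	Name   string
	Status MachineSetStatus
}

// getAvailableReplicaCount sums AvailableReplicas across the given sets.
// While the MachineSet controller leaves Status unpopulated, every field is
// its zero value, so the sum comes out as 0.
func getAvailableReplicaCount(msList []*MachineSet) int32 {
	var total int32
	for _, ms := range msList {
		total += ms.Status.AvailableReplicas
	}
	return total
}

func main() {
	unpopulated := []*MachineSet{{Name: "a"}, {Name: "b"}}
	fmt.Println(getAvailableReplicaCount(unpopulated)) // status never set: 0
	populated := []*MachineSet{
		{Name: "a", Status: MachineSetStatus{AvailableReplicas: 2}},
		{Name: "b", Status: MachineSetStatus{AvailableReplicas: 3}},
	}
	fmt.Println(getAvailableReplicaCount(populated))
}
```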
Force-pushed a161b75 to 4ff2e87
	return
}
glog.V(4).Infof("MachineSet %s added for deployment %v.", ms.Name, d.Name)
c.Reconcile(d)
Instead of calling Reconcile directly, you should queue the MachineDeployment. Otherwise, you could have Reconcile called simultaneously on the same MachineDeployment. (Same applies to all calls to Reconcile from event handlers)
IIUC queuing is handled in the generated code.
https://github.com/kubernetes-sigs/cluster-api/blob/master/pkg/controller/machinedeployment/zz_generated.api.register.go#L33
nvm, this is for the case when you receive a MachineSet event and not for a MachineDeployment event. Maybe we should use AdditionalInformers.
You can get access to the queue in your Init function with:
queue := arguments.GetSharedInformers().WorkerQueues["MachineDeployment"].Queue
Thanks Cesar, I found the same thing.
I updated the PR to add to the queue rather than calling Reconcile directly.
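The fix agreed on above (enqueue a key instead of calling Reconcile inline) can be sketched with a toy queue. This is a stand-in for the controller's shared work queue, not the real client-go workqueue API: event handlers push the owning deployment's key, and a single worker drains it, so Reconcile never runs concurrently for the same MachineDeployment.

```go
package main

import "fmt"

// A toy stand-in for the controller's shared work queue.
type workQueue struct {
	ch chan string
}

func newWorkQueue(size int) *workQueue {
	return &workQueue{ch: make(chan string, size)}
}

// Add enqueues a deployment key; this is what a MachineSet event handler
// should call instead of invoking Reconcile directly.
func (q *workQueue) Add(key string) { q.ch <- key }

// processNextItem is the worker side: pop one key and reconcile it. With a
// single worker, reconciles for the same key are serialized by construction.
func (q *workQueue) processNextItem(reconcile func(string)) {
	reconcile(<-q.ch)
}

func main() {
	q := newWorkQueue(10)
	// A MachineSet changed: enqueue its owning deployment.
	q.Add("default/my-deployment")
	q.processNextItem(func(key string) { fmt.Println("reconciling", key) })
}
```

The real workqueue also deduplicates keys that are added while still pending, which this toy version omits.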
Force-pushed 4ff2e87 to 0b69744
/assign @justinsb
/cc @justinsb
var filteredMS []*v1alpha1.MachineSet
for _, ms := range msList {
	if metav1.GetControllerOf(ms) != nil && !metav1.IsControlledBy(ms, d) {
If a MS doesn't have a controller, we want to ignore it, I think? So: if metav1.GetControllerOf(ms) == nil || ...
Nice catch. I was originally thinking of handling it when I added the adoption code, but it makes sense to ignore it for now.
Fixed.
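The corrected filter can be sketched with simplified stand-in types (`Owner` stands in for the controller reference; the real code uses `metav1.GetControllerOf` and `metav1.IsControlledBy`): both controllerless sets and sets owned by someone else are skipped, since adoption is not implemented yet.

```go
package main

import "fmt"

// Simplified stand-in: Owner is the controlling deployment's name, and ""
// means the MachineSet has no controller reference at all.
type MachineSet struct {
	Name  string
	Owner string
}

// filterOwned keeps only MachineSets controlled by this deployment. The
// explicit Owner == "" clause mirrors the GetControllerOf(ms) == nil check
// from the review: controllerless sets would be adoption candidates, and
// adoption is deferred for now.
func filterOwned(deployment string, msList []*MachineSet) []*MachineSet {
	var filtered []*MachineSet
	for _, ms := range msList {
		if ms.Owner == "" || ms.Owner != deployment {
			continue
		}
		filtered = append(filtered, ms)
	}
	return filtered
}

func main() {
	msList := []*MachineSet{
		{Name: "mine", Owner: "d1"},
		{Name: "theirs", Owner: "d2"},
		{Name: "orphan"}, // no controller: ignored for now
	}
	for _, ms := range filterOwned("d1", msList) {
		fmt.Println(ms.Name)
	}
}
```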
}
selector, err := metav1.LabelSelectorAsSelector(&d.Spec.Selector)
if err != nil {
	continue
Does this happen? If so, maybe log here to save our future selves?
This should never happen, as validation for the object should have ensured a valid label, but logging an error makes sense.
Added.
}
// If a deployment with a nil or empty selector creeps in, it should match nothing, not everything.
if selector.Empty() || !selector.Matches(labels.Set(ms.Labels)) {
	continue
Maybe a warning in the empty case also?
added better logging whenever a machineset is skipped.
if reflect.DeepEqual(d.Spec.Selector, &everything) {
	if d.Status.ObservedGeneration < d.Generation {
		d.Status.ObservedGeneration = d.Generation
		c.machineClient.ClusterV1alpha1().MachineDeployments(d.Namespace).UpdateStatus(d)
Are we ignoring an error here? Maybe log if so...
Return on err, added logging.
}

// Otherwise, it's an orphan. If anything changed, sync matching controllers
// to see if anyone wants to adopt it now.
I think this means that if we remove the owner from the machineset we'll likely just readopt it, but I think that's OK
I think that is what we would want in the adoption case.
// We can't look up by UID, so look up by Name and then verify UID.
// Don't even try to look up by Name if it's the wrong Kind.
if controllerRef.Kind != controllerKind.Kind {
	return nil
I feel like some glog.Warningf in the 3 return nil cases would be helpful
agree, added.
{
	name: "scenario 1: 10 desired, oldMS at 10, 10 ready, 0 max unavailable => oldMS at 9, scale down by 1.",
	deploymentReplicas: 10,
	maxUnavailable: intstr.FromInt(0),
Is this 0 unavailable & 0 max surge? I thought that was not allowed because we're stuck...
Yes, that is an invalid configuration; it would normally have been safeguarded at a higher level.
Fixed test.
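The "stuck" case being discussed can be captured by a tiny validation sketch. This is an illustrative check, not the real validation code (the actual API carries these values as `intstr.IntOrString`): with both maxUnavailable and maxSurge at 0, a rolling update can neither remove an old machine nor add a new one.

```go
package main

import (
	"errors"
	"fmt"
)

// validateRollingUpdate rejects the configuration the test originally
// encoded: with maxUnavailable == 0 the rollout may not take any machine
// down, and with maxSurge == 0 it may not create an extra one, so no
// progress is possible. Plain ints stand in for intstr.IntOrString here.
func validateRollingUpdate(maxUnavailable, maxSurge int) error {
	if maxUnavailable == 0 && maxSurge == 0 {
		return errors.New("maxUnavailable and maxSurge may not both be 0")
	}
	return nil
}

func main() {
	fmt.Println(validateRollingUpdate(0, 0)) // invalid: rollout would be stuck
	fmt.Println(validateRollingUpdate(0, 1)) // valid: surge by 1, then scale down
}
```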
	readyMachines: 10,
	oldReplicas: 10,
	scaleExpected: true,
	expectedOldReplicas: 9,
I think we violate maxUnavailable=0 as well here (?)
updated test.
	return dc.machineClient.ClusterV1alpha1().MachineSets(msCopy.ObjectMeta.Namespace).Update(msCopy)
}

// Should use the revision in existingNewMS's annotation, since it set by before
I don't understand this comment
Updated the comment.
We attempt to copy the revision annotation over to the deployment if it doesn't already have it.
}

needsUpdate := dutil.SetDeploymentRevision(d, newRevision)
if !alreadyExists && d.Spec.ProgressDeadlineSeconds != nil {
Not sure why we're checking ProgressDeadlineSeconds here?
no reason, removed.
case 1:
	return allMSs[0]
default:
	return nil
Is this behaviour intended? It doesn't really match the comment; allMSs[0] is the latest, I think.
The naming was off; it really wanted "one active, or the latest". If there are multiple active machine sets, it needs to start scaling down some of the older ones first to avoid having too many active machine sets.
Renamed the method and updated the comment.
Force-pushed 0b69744 to 69d3de3
/assign @justinsb
/unassign @krousey
/hold
Force-pushed 7f4027e to c461beb
Copies/borrows heavily from the deployment controller. Things not implemented in this PR: deployment progress, orphan adoption.
Force-pushed c461beb to 7b1f291
/lgtm
/hold cancel
…roller (kubernetes-sigs#260) This is to pick up PR kubernetes-sigs#143, which allows for the creation of machine deployments. Rolling update should now work. Update apiserver for vsphere as that is what is being used for gce and terraform.
With the patch, clusterctl will be compiled, packaged into a container, and uploaded to GCR, then run as an initContainer alongside the bootstrap_job container. The bootstrap_job container shares a volume with the initContainer so that clusterctl can be called from the bootstrap_job container.
This was first added here kubernetes-sigs#143 and it's never been used.
What this PR does / why we need it:
This is the first pass for the machine deployment controller.
Copies/borrows heavily from the deployment controller.
Things not implemented in this PR: deployment progress, orphan adoption.
Caveat
@kubernetes/kube-deploy-reviewers