
✨ Label interruptible nodes #3668

Merged: 1 commit, Dec 1, 2020

Conversation

alexander-demicev
Contributor

What this PR does / why we need it:

Transfer machine labels to the node. The use case is described in #3504.

The transfer of machine labels to the node is similar to setting the NodeRef. An important thing to mention is that only labels with the "cluster.x-k8s.io" prefix are transferred.

This PR should help with implementing the termination handler for spot instances - #3528

Happy to hear what others think about this change.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3504

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 21, 2020
@k8s-ci-robot
Contributor

Hi @alexander-demichev. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 21, 2020
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 21, 2020

@JoelSpeed (Contributor) left a comment

Looks good, couple of small comments but I have nothing major to add

@@ -0,0 +1,79 @@
/*
Copyright 2019 The Kubernetes Authors.
Contributor

Suggested change
Copyright 2019 The Kubernetes Authors.
Copyright 2020 The Kubernetes Authors.


for key, value := range machineLabels {
	// sync only labels with "cluster.x-k8s.io" prefix, so users can understand where labels come from
	if strings.HasPrefix(key, "cluster.x-k8s.io") {
Contributor

Is there a constant that can be used for this somewhere already?

@enxebre
Member

enxebre commented Sep 21, 2020

Is this intended to supersede kubernetes-sigs/cluster-api-provider-aws#1876?
Overall it seems reasonable to me.

@ncdc
Contributor

ncdc commented Sep 21, 2020

We have had previous requests about setting labels on nodes - see #458 and #493. The earlier consensus was that Cluster API will not reconcile node labels, but the kubeadm bootstrapper can set the initial values for a node's labels using KubeadmConfig.spec.{initConfiguration,joinConfiguration}.nodeRegistration.kubeletExtraArgs["node-labels"]. Continuing maintenance of a node's labels is better left to some other actor.
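
For context, a rough sketch (not from this PR) of what that existing mechanism looks like when building a KubeadmConfig in Go. The import paths, the helper name, and the example label key are assumptions for illustration; the field path is the one named above.

package example

import (
	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1alpha3"     // assumed import path
	kubeadmv1beta1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/types/v1beta1" // assumed import path
)

// newConfigWithInitialNodeLabels is illustrative only: it sets an initial node
// label through the kubelet's --node-labels flag, i.e.
// spec.joinConfiguration.nodeRegistration.kubeletExtraArgs["node-labels"].
func newConfigWithInitialNodeLabels() *bootstrapv1.KubeadmConfig {
	return &bootstrapv1.KubeadmConfig{
		Spec: bootstrapv1.KubeadmConfigSpec{
			JoinConfiguration: &kubeadmv1beta1.JoinConfiguration{
				NodeRegistration: kubeadmv1beta1.NodeRegistrationOptions{
					KubeletExtraArgs: map[string]string{
						// Comma-separated key=value pairs passed to --node-labels;
						// the key here is a placeholder, not a prescribed label.
						"node-labels": "example.com/interruptible=true",
					},
				},
			},
		},
	}
}

As noted below, this only sets the initial labels; nothing re-adds them if they are later removed from the Node.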

I do think it could be useful to adjust Machine.spec to add a field such as initialNodeLabels and then update the contract for bootstrap providers to use the values in that field to set the node's initial labels.

We will need to have further discussion on this before we can proceed, given the previous decision not to do this.

/hold

cc @kubernetes-sigs/cluster-api-maintainers

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 21, 2020
@alexander-demicev
Contributor Author

@ncdc Will the list of labels be additive if we use the kubeadm bootstrapper?

@ncdc
Contributor

ncdc commented Sep 21, 2020

@alexander-demichev they're the initial set of labels. See https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ (search for --node-labels) for more information.

@enxebre
Member

enxebre commented Sep 21, 2020

@ncdc @alexander-demichev is there a way to intrinsically infer that a given machine is preemptible, so the node can be labeled based on that as an implementation detail rather than by generically syncing labels?
E.g. couldn't this be done by each specific provider implementation when they see their preemptible API fields set?

@ncdc
Contributor

ncdc commented Sep 21, 2020

@enxebre ah ok now I see what the desired behavior is here around labels & preemptible/spot instances. If we're looking for a behavior that is consistent across infrastructure providers, I'd say it's most appropriate to implement that behavior in core Cluster API, or at the very least, define the contract portion in the core.

A behavior such as "if a Machine is preemptible, make sure its Node has the label cluster.x-k8s.io/interruptible-instance" could be implemented

  1. as one of the initial node labels set by the bootstrapper when registering the node
  2. via reconciliation in the core Cluster API machine controller
  3. via something outside of / on top of Cluster API

Option 1 (bootstrapper) requires every Cluster API bootstrap provider to support setting a node's initial labels. This seems reasonable to me. One downside is that if the interruptible label were somehow removed from the Node, nothing would add it back.

Option 2 (reconciliation) should probably only support syncing a very specific set of labels to a Node, namely this interruptible label for the time being. I wouldn't want to open this up to user-supplied labels, for the reasons previously stated in the other issues I linked above. But I do think it makes sense to sync labels defined by Cluster API that influence the behavior of the overall system. One downside to this option is that there would be a short period of time during which the interruptible label would be absent from the new Node. One upside is that the interruptible label would be re-added to the Node if it were somehow removed.

Option 3 (some external entity) I mentioned for completeness, but I don't think it's appropriate, because Cluster API is, or should be, aware that this is an interruptible machine.

For this specific use case -- ensuring a specific interruptible label is present on a Node -- I think it's appropriate to do both 1 and 2 together. 1 is already available in the kubeadm bootstrap provider and may be available in others. I do think a future API improvement could be what I mentioned in a previous comment - adding an initialNodeLabels spec field to Machine and having bootstrap providers use that. Option 2 would be new, and restricted solely to syncing this new interruptible label.

I also wonder if it would make sense to consider adding another new spec field to Machine - interruptible (name TBD) - to indicate the user understands this could be deleted by the cloud provider. This would simplify the burden on the user: they'd set spec.interruptible, and the machine reconciler would take care of setting the appropriate interruptible label (in other words, the user wouldn't have to spell out the full key name for the label and hope they don't have a typo). Adding a new interruptible field to the Machine spec has an added benefit for infrastructure providers - they could validate the corresponding InfraMachine (AWSMachine/AzureMachine/GCPMachine/etc) is configured with the interruptible field(s) appropriately. This idea could be implemented later, however - it's not required for the label syncing to function as desired.

@enxebre
Member

enxebre commented Sep 21, 2020

This all makes sense to me @ncdc. As a user, to define an interruptible machine you only need to set it in one place today: the infra template. I think we should aim to keep it that way so there's no room for partially supported scenarios. To that end, I wonder if it'd make sense to have interruptible live in the status rather than the spec (so we own it, not the user) and let option 2 reconcile the node label based on this field being populated.

@ncdc
Contributor

ncdc commented Sep 21, 2020

I wonder if it'd make sense to have interruptible live in the status rather than the spec (so we own it, not the user) and let option 2 reconcile the node label based on this field being populated.

I like this idea. As I was writing my spec.interruptible suggestion, I was wondering if it would be an unnecessary extra step. Brainstorming on flow:

  1. User creates InfraMachine with whatever spec field(s) are required for that provider to indicate it's interruptible
  2. Infra provider sets InfraMachine.status.interruptible=true
  3. Machine controller syncs InfraMachine.status.interruptible to Machine.status.interruptible
  4. (optional) Bootstrap provider somehow waits for the above sync so it can use Machine.status.interruptible when creating the bootstrap data secret (this needs more thinking)
  5. Machine controller ensures the interruptible label is always present on the Node if Machine.status.interruptible is true
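
For illustration, a condensed sketch of how steps 2 and 5 could look in the machine controller. The helper name, signature, and the exact label key are placeholders; the actual change in this PR reads status.interruptible directly from the unstructured InfraMachine, as shown in the review comments below.

package controllers // illustrative placement

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3" // assumed API version for this timeframe
	"sigs.k8s.io/cluster-api/util/patch"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Assumed label key for illustration; the final name was still under discussion here.
const interruptibleLabel = "cluster.x-k8s.io/interruptible"

// ensureInterruptibleNodeLabel is a sketch, not the PR's exact function: if the
// InfraMachine reports status.interruptible=true, make sure the Machine's Node
// carries the interruptible label.
func ensureInterruptibleNodeLabel(ctx context.Context, remoteClient client.Client, infra *unstructured.Unstructured, machine *clusterv1.Machine) error {
	interruptible, _, err := unstructured.NestedBool(infra.Object, "status", "interruptible")
	if err != nil {
		return err
	}
	if !interruptible || machine.Status.NodeRef == nil {
		// Nothing to do until the provider reports interruptible and the Node exists.
		return nil
	}

	node := &corev1.Node{}
	if err := remoteClient.Get(ctx, client.ObjectKey{Name: machine.Status.NodeRef.Name}, node); err != nil {
		return err
	}
	if _, ok := node.Labels[interruptibleLabel]; ok {
		// Label already present; re-adding it when missing is the upside of option 2.
		return nil
	}

	patchHelper, err := patch.NewHelper(node, remoteClient)
	if err != nil {
		return err
	}
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	node.Labels[interruptibleLabel] = ""
	return patchHelper.Patch(ctx, node)
}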

@CecileRobertMichon
Contributor

If we do move forward with this approach, can we also make sure that the same can be applied to MachinePools? I think the approach @ncdc describes above should work at first glance, but I want to make sure we explicitly include it in the design (and maybe even the implementation, if it's straightforward).

@vincepri
Member

I'd hold off on adding more status syncing between the infrastructure provider and the machine controller. From a high-level perspective, Cluster API itself doesn't know that a Machine is preemptible.

The approach described above could potentially cause a dependency loop between the controllers. One alternative approach, which should be discussed in a proposal, is to add a condition to the InfrastructureMachine that communicates "ready to receive bootstrap data". Bootstrap providers would have to wait for this condition to appear before generating the bootstrap configuration and continue execution.

This would be a breaking change, which definitely requires some more thinking / design proposal. I'd suggest to open an RFE issue and gather use cases and proposals.

How does that sound?

/milestone v0.4.0

@enxebre
Member

enxebre commented Sep 22, 2020

One alternative approach, which should be discussed in a proposal, is to add a condition to the InfrastructureMachine that communicates "ready to receive bootstrap data"...

I see this as something orthogonal and just a nice-to-have for the interruptible instances story. To cover the interruptible instances story end to end as a first iteration, I see no concerns with proceeding with 1, 2, 3 and 5.
In parallel we can have a separate, wider discussion for 4 and other use cases. wdyt?

@vincepri
Member

In parallel we can have a separate wider discussion for 4 and other use cases. wdyt?

If the only goal is to update node labels after the Node is up, I think that could be doable. That said, adding a new synced status field goes somewhat against our separate goal of moving completely to conditions and removing the current use of boolean fields.

@vincepri
Member

Is this something you'd like to do in v0.3.x, or can we wait until v0.4.0 and have an RFE / small proposal in an issue?

@ncdc
Contributor

ncdc commented Sep 22, 2020

I'd like to revise my previous suggestion and remove the part about adding interruptible to machine.status and syncing it from the infra machine to the machine. We can still achieve what we need by having the machine controller look at the infra machine's status (for interruptible), and avoid syncing that information to the machine.

remove the current use of boolean fields.

Have we stated as a goal to never use any more booleans in status? Something like status.interruptible=true/false feels different than a condition, as it's data that's derived from the spec and isn't something that will ever change for the life of the infra machine. It seems ok to have a status boolean for this?

re timing, I propose we do this once we open main for new features & breaking changes for v0.4.0. Do you want a CAEP for this (contract change for infra providers to add status.interruptible (name/location TBD), machine controller updated to sync just this one label to interruptible nodes), or can we repurpose #3504 for the details?

@alexander-demicev
Contributor Author

@ncdc I think changing the description of #3504 and giving more context should be fine.

@alexander-demicev
Contributor Author

@ncdc @vincepri PTAL #3504 and let's move the discussion there.

@alexander-demicev
Contributor Author

/test pull-cluster-api-test-main

@k8s-ci-robot
Contributor

@alexander-demichev: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test pull-cluster-api-test-main

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alexander-demicev
Contributor Author

@ncdc Hi, I tried to address all comments, PTAL

Comment on lines 36 to 37
// Check that the Machine hasn't been deleted and isn't in the process of being deleted.
if !machine.DeletionTimestamp.IsZero() {
	return ctrl.Result{}, nil
}

// Check that the Machine has a NodeRef.
if machine.Status.NodeRef == nil {
	return ctrl.Result{}, nil
}
Member

Can we merge these two together?
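
For reference, a minimal sketch of the merged guard being suggested (not the exact final code):

// Nothing to do if the Machine is being deleted or doesn't have a NodeRef yet.
if !machine.DeletionTimestamp.IsZero() || machine.Status.NodeRef == nil {
	return ctrl.Result{}, nil
}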

Comment on lines 67 to 68
err = r.setInterruptibleNodeLabel(ctx, remoteClient, machine.Status.NodeRef.Name)
if err != nil {
Member

Suggested change
err = r.setInterruptibleNodeLabel(ctx, remoteClient, machine.Status.NodeRef.Name)
if err != nil {
if err := r.setInterruptibleNodeLabel(ctx, remoteClient, machine.Status.NodeRef.Name); err != nil {

if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to get interruptible status from infrastructure provider for Machine %q in namespace %q: %w", machine.Name, machine.Namespace, err)
}

Member

Suggested change

return ctrl.Result{}, err
}

logger.Info("Set interruptible label to Machine's Node", "nodename", machine.Status.NodeRef.Name)
Member

Suggested change
logger.Info("Set interruptible label to Machine's Node", "nodename", machine.Status.NodeRef.Name)
logger.V(3).Info("Set interruptible label to Machine's Node", "nodename", machine.Status.NodeRef.Name)

Comment on lines 80 to 82

err := remoteClient.Get(ctx, client.ObjectKey{Name: nodeName}, node)
if err != nil {
Member

Suggested change
err := remoteClient.Get(ctx, client.ObjectKey{Name: nodeName}, node)
if err != nil {
if err := remoteClient.Get(ctx, client.ObjectKey{Name: nodeName}, node); err != nil {

Comment on lines 97 to 94
if err := patchHelper.Patch(ctx, node); err != nil {
return err
}

return nil
}
Member

Suggested change
if err := patchHelper.Patch(ctx, node); err != nil {
return err
}
return nil
}
return patchHelper.Patch(ctx, node)
}

controllers/machine_controller_node_labels.go (resolved review thread)
// Get interruptible instance status from the infrastructure provider.
interruptible, _, err := unstructured.NestedBool(infra.Object, "status", "interruptible")
if err != nil {
	return ctrl.Result{}, fmt.Errorf("failed to get interruptible status from infrastructure provider for Machine %q in namespace %q: %w", machine.Name, machine.Namespace, err)
Contributor

@vincepri what do you think about logging here and returning early instead of returning an error? If we return an error, it means we will start exponential backoff retries. I assume that unless the infra machine changes, continuing to retry the same object will yield the same failure each time. And, given we are already watching for infra machine changes, the machine will get requeued whenever the infra machine is updated.

Member

We definitely can. That said, these errors are uncommon code paths that we shouldn't get into unless something has gone really wrong. Most of the time, functions like NestedBool are going to return regardless of whether the value exists or not, and in this case we use the default value for the boolean if the JSON path (status.interruptible) doesn't exist. The only way for this call to fail is if infra.Object isn't well formed, which is unlikely in most cases, and it would probably hint at data corruption.

Contributor

Exactly, and if it's malformed data, exponential backoff isn't going to be useful here.

Contributor Author

That makes sense, thank you.
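
For reference, a small standalone sketch of the NestedBool semantics discussed above (illustrative values only):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func main() {
	obj := map[string]interface{}{
		"status": map[string]interface{}{"interruptible": true},
	}

	// Path present: (true, true, nil).
	v, found, err := unstructured.NestedBool(obj, "status", "interruptible")
	fmt.Println(v, found, err)

	// Path missing: (false, false, nil), i.e. the boolean zero value and no error.
	v, found, err = unstructured.NestedBool(obj, "status", "does-not-exist")
	fmt.Println(v, found, err)

	// Malformed shape ("status" holds a map, not a bool): err != nil.
	// This is the data-corruption case where exponential backoff wouldn't help.
	_, _, err = unstructured.NestedBool(obj, "status")
	fmt.Println(err)
}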

)

func (r *MachineReconciler) reconcileInterruptibleNodeLabel(ctx context.Context, cluster *clusterv1.Cluster, machine *clusterv1.Machine) (ctrl.Result, error) {
logger := ctrl.LoggerFrom(ctx)
Contributor

nit: consider moving this down to immediately before you first use the logger

return err
}

patchHelper, err := patch.NewHelper(node, r.Client)
Contributor

Can you move this down so we only instantiate if we know we need to patch?
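
A minimal sketch of that reordering (the label constant is a placeholder; the patch helper is only built once we know a patch is needed):

// Return early if the label is already present, so the patch helper is only
// created when a patch will actually be issued.
if _, ok := node.Labels[interruptibleLabel]; ok {
	return nil
}

patchHelper, err := patch.NewHelper(node, r.Client)
if err != nil {
	return err
}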

controllers/machine_controller_node_labels.go (resolved review thread)
// Check if node gets interruptible label
g.Eventually(func() bool {
updatedNode := &corev1.Node{}
err := testEnv.Get(ctx, client.ObjectKey{Name: node.Name, Namespace: ns.Name}, updatedNode)
Contributor

Suggested change
err := testEnv.Get(ctx, client.ObjectKey{Name: node.Name, Namespace: ns.Name}, updatedNode)
err := testEnv.Get(ctx, client.ObjectKey{Name: node.Name}, updatedNode)

@vincepri
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 16, 2020
g.Expect(testEnv.Cleanup(ctx, do...)).To(Succeed())
}(cluster, node, infraMachine, machine)

g.Expect(clusterv1.AddToScheme(scheme.Scheme)).To(Succeed())
Contributor

nit: this line isn't necessary because it's already done in:

// Calculate the scheme.
utilruntime.Must(apiextensionsv1.AddToScheme(scheme.Scheme))
utilruntime.Must(clusterv1.AddToScheme(scheme.Scheme))
utilruntime.Must(bootstrapv1.AddToScheme(scheme.Scheme))
utilruntime.Must(expv1.AddToScheme(scheme.Scheme))
utilruntime.Must(crs.AddToScheme(scheme.Scheme))
utilruntime.Must(addonv1.AddToScheme(scheme.Scheme))
utilruntime.Must(kcpv1.AddToScheme(scheme.Scheme))
utilruntime.Must(admissionv1.AddToScheme(scheme.Scheme))

@vincepri
Member

vincepri commented Nov 20, 2020

Can we rename this PR and change the description to better capture that it only supports the interruptible label?

@ncdc
Contributor

ncdc commented Nov 20, 2020

/retitle ✨ Label interruptible nodes
/lgtm
/hold cancel
/assign @vincepri

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 20, 2020
@k8s-ci-robot k8s-ci-robot changed the title ✨ Transfer machine labels to node ✨ Label interruptible nodes Nov 20, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 20, 2020
@ncdc
Contributor

ncdc commented Nov 20, 2020

/retest

@vincepri (Member) left a comment

One minor comment, otherwise LGTM

return ctrl.Result{}, err
}

logger := ctrl.LoggerFrom(ctx)
Member

Suggested change
logger := ctrl.LoggerFrom(ctx)
log := ctrl.LoggerFrom(ctx)

To be consistent with the rest of the codebase


func (r *MachineReconciler) setInterruptibleNodeLabel(ctx context.Context, remoteClient client.Client, nodeName string) error {
node := &apicorev1.Node{}

Member

Suggested change

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 23, 2020
@alexander-demicev
Contributor Author

@ncdc @vincepri All fixed, thanks for reviews!

@vincepri (Member) left a comment

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2020
@alexander-demicev
Contributor Author

/retest

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Set interruptible label to nodes
7 participants