✨ Label interruptible nodes #3668
Conversation
Hi @alexander-demichev. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looks good, a couple of small comments, but I have nothing major to add.
@@ -0,0 +1,79 @@
/*
Copyright 2019 The Kubernetes Authors.
Suggested change:
- Copyright 2019 The Kubernetes Authors.
+ Copyright 2020 The Kubernetes Authors.
for key, value := range machineLabels {
	// sync only labels with "cluster.x-k8s.io" prefix, so users can understand where labels come from
	if strings.HasPrefix(key, "cluster.x-k8s.io") {
Is there a constant that can be used for this somewhere already?
Is this intending to supersede kubernetes-sigs/cluster-api-provider-aws#1876?
We have had previous requests about setting labels on nodes - see #458 and #493. The earlier consensus was that Cluster API will not reconcile node labels, but the kubeadm bootstrapper can set the initial values for a node's labels using kubeletExtraArgs. I do think it could be useful to adjust this, but we will need to have further discussion before we can proceed, given the previous decision not to do this.

/hold

cc @kubernetes-sigs/cluster-api-maintainers
@ncdc Will the list of labels be additive if we use the kubeadm bootstrapper?
@alexander-demichev they're the initial set of labels. See https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ (search for node-labels).
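For context (not part of the thread above): the kubelet flag in question takes a comma-separated list of key=value pairs and only applies them when the node first registers. A minimal sketch, assuming the flag name --node-labels and a purely illustrative label key:

package main

import (
	"fmt"
	"sort"
	"strings"
)

// nodeLabelsFlag joins a label map into the kubelet's --node-labels argument.
// Labels passed this way are the node's initial labels; nothing reconciles them later.
func nodeLabelsFlag(labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		pairs = append(pairs, fmt.Sprintf("%s=%s", k, v))
	}
	sort.Strings(pairs) // deterministic ordering for readability
	return "--node-labels=" + strings.Join(pairs, ",")
}

func main() {
	// "cluster.x-k8s.io/interruptible" is used here only as an example key.
	fmt.Println(nodeLabelsFlag(map[string]string{"cluster.x-k8s.io/interruptible": ""}))
}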
@ncdc @alexander-demichev is there a way to intrinsically infer that a given machine is preemptible, so the node can be labeled based on that as an implementation detail rather than by generically syncing labels?
@enxebre ah ok, now I see what the desired behavior is here around labels & preemptible/spot instances. If we're looking for a behavior that is consistent across infrastructure providers, I'd say it's most appropriate to implement that behavior in core Cluster API, or at the very least, define the contract portion in the core. A behavior such as "if a Machine is preemptible, make sure its Node has the interruptible label" could be implemented in a few ways:
Option 1 (bootstrapper) requires every Cluster API bootstrap provider to support setting a node's initial labels. This seems reasonable to me. One downside is that if the interruptible label were somehow removed from the Node, nothing would add it back.

Option 2 (reconciliation) should probably only support syncing a very specific set of labels to a Node, namely this interruptible label for the time being. I wouldn't want to open this up to user-supplied labels, for the reasons previously stated in the other issues I linked above. But I do think it makes sense to sync labels defined by Cluster API that influence the behavior of the overall system. One downside to this option is that there would be a short period of time during which the interruptible label would be absent from the new Node. One upside is that the interruptible label would be re-added to the Node if it were somehow removed.

Option 3 (some external entity) I mentioned for completeness, but I don't think it's appropriate, because Cluster API should be or is knowledgeable that this is an interruptible machine.

For this specific use case -- ensuring a specific interruptible label is present on a Node -- I think it's appropriate to do both 1 and 2 together. 1 is already available in the kubeadm bootstrap provider and may be available in others. I do think a future API improvement could be what I mentioned in a previous comment. I also wonder if it would make sense to consider adding another new spec field to Machine - something like spec.interruptible.
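As an aside, a minimal sketch of what Option 2 (reconciliation) could look like, assuming a hypothetical label name and a generic controller-runtime client; this is an illustration, not the PR's actual implementation:

package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// interruptibleLabel is an assumed name; the final label is still being discussed above.
const interruptibleLabel = "cluster.x-k8s.io/interruptible"

// ensureInterruptibleLabel re-adds the label whenever it goes missing, which is the
// upside of reconciliation called out in the comment above.
func ensureInterruptibleLabel(ctx context.Context, c client.Client, nodeName string) error {
	node := &corev1.Node{}
	if err := c.Get(ctx, client.ObjectKey{Name: nodeName}, node); err != nil {
		return err
	}
	if _, ok := node.Labels[interruptibleLabel]; ok {
		return nil // already labeled, nothing to do
	}
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	node.Labels[interruptibleLabel] = ""
	return c.Update(ctx, node)
}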
This all makes sense to me @ncdc. As a user, to define an interruptible machine you just need to set it in one place today, the infra template. I think we should aim to keep it that way so there's no room for partially supported scenarios. To that end, I wonder if it'd make sense to let the infra machine surface this in its status and have core Cluster API pick it up from there.
I like this idea. As I was writing my spec.interruptible suggestion, I was wondering if it would be an unnecessary extra step. Brainstorming on flow:
If we do move forward with this approach, can we also make sure that the same can be applied to MachinePools? I think the approach @ncdc describes above should work at first glance, but I want to make sure we explicitly include it in the design (and maybe even in the implementation, if it's straightforward).
I'd hold off on adding more status syncing between the infrastructure provider and the machine controller. From a high-level perspective, Cluster API itself doesn't know that a Machine is preemptible. The approach described above could potentially cause a dependency loop between the controllers.

One alternative approach, which should be discussed in a proposal, is to add a condition to the InfrastructureMachine that communicates "ready to receive bootstrap data". Bootstrap providers would have to wait for this condition to appear before generating the bootstrap configuration and continuing execution. This would be a breaking change, which definitely requires some more thinking / a design proposal. I'd suggest opening an RFE issue and gathering use cases and proposals. How does that sound?

/milestone v0.4.0
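Purely as an illustration of the "ready to receive bootstrap data" idea (the condition name and the helper used here are assumptions, not an agreed contract), a bootstrap provider could gate config generation on a condition like this:

package sketch

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// readyForBootstrapData is a hypothetical condition type; it would need a proposal.
const readyForBootstrapData = "ReadyToReceiveBootstrapData"

// shouldGenerateBootstrapData returns true only once the InfrastructureMachine reports
// the (hypothetical) condition, letting providers defer bootstrap data generation.
func shouldGenerateBootstrapData(infraConditions []metav1.Condition) bool {
	return meta.IsStatusConditionTrue(infraConditions, readyForBootstrapData)
}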
I see this as something orthogonal and just a nice-to-have for the interruptible instances story. To cover the interruptible instances story end to end as a first iteration, I see no concerns with proceeding with 1, 2, 3 and 5.
If the only goal is to update node labels after the Node is up, I think that could be doable. That said, adding a new synced status field goes somewhat against our different goal to move completely to conditions, and remove the current use of boolean fields.
Is this something you'd like to do in v0.3.x, or can we wait until v0.4.0 and have an RFE / small proposal in an issue?
I'd like to revise my previous suggestion and remove the part about adding interruptible to machine.status and syncing it from the infra machine to the machine. We can still achieve what we need by having the machine controller look at the infra machine's status (for interruptible), and avoid syncing that information to the machine.
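A minimal sketch of "the machine controller looks at the infra machine's status", assuming the status.interruptible field path that the rest of this PR uses:

package sketch

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// isInterruptible reads status.interruptible from the (unstructured) infra machine.
// NestedBool returns the zero value when the path is absent, so a missing field simply
// reads as "not interruptible" rather than an error.
func isInterruptible(infra *unstructured.Unstructured) (bool, error) {
	interruptible, _, err := unstructured.NestedBool(infra.Object, "status", "interruptible")
	return interruptible, err
}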
Have we stated as a goal to never use any more booleans in status? Something like status.interruptible=true/false feels different than a condition, as it's data that's derived from the spec and isn't something that will ever change for the life of the infra machine. It seems ok to have a status boolean for this?

Re: timing, I propose we do this once we open main for new features & breaking changes for v0.4.0. Do you want a CAEP for this (a contract change for infra providers to add status.interruptible (name/location TBD), and the machine controller updated to sync just this one label to interruptible nodes), or can we repurpose #3504 for the details?
/test pull-cluster-api-test-main
@alexander-demichev: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test pull-cluster-api-test-main

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@ncdc Hi, I tried to address all comments, PTAL
// Check that the Machine hasn't been deleted and isn't in the process of being deleted.
if !machine.DeletionTimestamp.IsZero() {
	return ctrl.Result{}, nil
}

// Check that the Machine has a NodeRef.
if machine.Status.NodeRef == nil {
	return ctrl.Result{}, nil
}
Can we merge these two together?
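For illustration, merging the two early returns from the snippet quoted above could look like this (a sketch against that snippet's context):

// Return early if the Machine is being deleted or doesn't have a NodeRef yet.
if !machine.DeletionTimestamp.IsZero() || machine.Status.NodeRef == nil {
	return ctrl.Result{}, nil
}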
err = r.setInterruptibleNodeLabel(ctx, remoteClient, machine.Status.NodeRef.Name)
if err != nil {
Suggested change:
- err = r.setInterruptibleNodeLabel(ctx, remoteClient, machine.Status.NodeRef.Name)
- if err != nil {
+ if err := r.setInterruptibleNodeLabel(ctx, remoteClient, machine.Status.NodeRef.Name); err != nil {
if err != nil {
	return ctrl.Result{}, fmt.Errorf("failed to get interruptible status from infrastructure provider for Machine %q in namespace %q: %w", machine.Name, machine.Namespace, err)
}
	return ctrl.Result{}, err
}

logger.Info("Set interruptible label to Machine's Node", "nodename", machine.Status.NodeRef.Name)
logger.Info("Set interruptible label to Machine's Node", "nodename", machine.Status.NodeRef.Name) | |
logger.V(3).Info("Set interruptible label to Machine's Node", "nodename", machine.Status.NodeRef.Name) |
err := remoteClient.Get(ctx, client.ObjectKey{Name: nodeName}, node)
if err != nil {
Suggested change:
- err := remoteClient.Get(ctx, client.ObjectKey{Name: nodeName}, node)
- if err != nil {
+ if err := remoteClient.Get(ctx, client.ObjectKey{Name: nodeName}, node); err != nil {
if err := patchHelper.Patch(ctx, node); err != nil {
	return err
}

return nil
}
Suggested change:
- if err := patchHelper.Patch(ctx, node); err != nil {
- 	return err
- }
- return nil
- }
+ return patchHelper.Patch(ctx, node)
+ }
// Get interruptible instance status from the infrastructure provider.
interruptible, _, err := unstructured.NestedBool(infra.Object, "status", "interruptible")
if err != nil {
	return ctrl.Result{}, fmt.Errorf("failed to get interruptible status from infrastructure provider for Machine %q in namespace %q: %w", machine.Name, machine.Namespace, err)
@vincepri what do you think about logging here and returning early instead of returning an error? If we return an error, it means we will start exponential backoff retries. I assume that unless the infra machine changes, continuing to retry the same object will yield the same failure each time. And, given we are already watching for infra machine changes, the machine will get requeued whenever the infra machine is updated.
We definitely can. That said, these errors are uncommon code paths that we shouldn't get into unless something really wrong happened. Most of the time, functions like NestedBool are going to return regardless of whether the value exists or not, and in this case we use the default value for the boolean if the JSON path (status.interruptible) doesn't exist. The only way for this call to fail is if the infra.Object isn't well formed, which is unlikely in most cases, and it would probably hint at data corruption.
Exactly, and if it's malformed data, exponential backoff isn't going to be useful here.
That makes sense, thank you.
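For reference, the "log and return early" variant discussed above would look roughly like this; a sketch that reuses the logger, infra, and machine variables from the snippets quoted earlier, not what was merged:

interruptible, _, err := unstructured.NestedBool(infra.Object, "status", "interruptible")
if err != nil {
	// A malformed infra object won't fix itself on retry, so skip exponential backoff;
	// the machine is requeued anyway when the watched infra machine changes.
	logger.Error(err, "failed to read status.interruptible from infrastructure provider", "machine", machine.Name)
	return ctrl.Result{}, nil
}
if !interruptible {
	return ctrl.Result{}, nil
}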
)

func (r *MachineReconciler) reconcileInterruptibleNodeLabel(ctx context.Context, cluster *clusterv1.Cluster, machine *clusterv1.Machine) (ctrl.Result, error) {
	logger := ctrl.LoggerFrom(ctx)
nit: consider moving this down to immediately before you first use the logger
	return err
}

patchHelper, err := patch.NewHelper(node, r.Client)
Can you move this down so we only instantiate if we know we need to patch?
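The suggestion could look roughly like this: check the label first and only build the patch helper when a patch is actually needed. A sketch that reuses the PR's patch.NewHelper flow and the surrounding variables, and assumes a hypothetical interruptibleLabel constant:

// Nothing to do if the label is already present; avoid creating the patch helper.
if _, ok := node.Labels[interruptibleLabel]; ok {
	return nil
}

patchHelper, err := patch.NewHelper(node, r.Client)
if err != nil {
	return err
}

if node.Labels == nil {
	node.Labels = map[string]string{}
}
node.Labels[interruptibleLabel] = ""

return patchHelper.Patch(ctx, node)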
// Check if node gets interruptible label
g.Eventually(func() bool {
	updatedNode := &corev1.Node{}
	err := testEnv.Get(ctx, client.ObjectKey{Name: node.Name, Namespace: ns.Name}, updatedNode)
Suggested change:
- err := testEnv.Get(ctx, client.ObjectKey{Name: node.Name, Namespace: ns.Name}, updatedNode)
+ err := testEnv.Get(ctx, client.ObjectKey{Name: node.Name}, updatedNode)
/ok-to-test
	g.Expect(testEnv.Cleanup(ctx, do...)).To(Succeed())
}(cluster, node, infraMachine, machine)

g.Expect(clusterv1.AddToScheme(scheme.Scheme)).To(Succeed())
nit: this line isn't necessary because it's already done in cluster-api/test/helpers/envtest.go, lines 76 to 84 in ace4e24:

// Calculate the scheme.
utilruntime.Must(apiextensionsv1.AddToScheme(scheme.Scheme))
utilruntime.Must(clusterv1.AddToScheme(scheme.Scheme))
utilruntime.Must(bootstrapv1.AddToScheme(scheme.Scheme))
utilruntime.Must(expv1.AddToScheme(scheme.Scheme))
utilruntime.Must(crs.AddToScheme(scheme.Scheme))
utilruntime.Must(addonv1.AddToScheme(scheme.Scheme))
utilruntime.Must(kcpv1.AddToScheme(scheme.Scheme))
utilruntime.Must(admissionv1.AddToScheme(scheme.Scheme))
Can we rename this PR and change the description to better capture that it only supports the interruptible label?
/retitle ✨ Label interruptible nodes
/retest
One minor comment, otherwise LGTM
	return ctrl.Result{}, err
}

logger := ctrl.LoggerFrom(ctx)
Suggested change:
- logger := ctrl.LoggerFrom(ctx)
+ log := ctrl.LoggerFrom(ctx)
To be consistent with the rest of the codebase
func (r *MachineReconciler) setInterruptibleNodeLabel(ctx context.Context, remoteClient client.Client, nodeName string) error {
	node := &apicorev1.Node{}
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/retest
What this PR does / why we need it:

Transfer machine labels to the node. The use case is described in #3504. The transfer of machine labels to the node is similar to setting NodeRef. An important thing to mention is that only labels with the "cluster.x-k8s.io" prefix are transferred. This PR should help with implementing the termination handler for spot instances - #3528.

Happy to hear what others think about this change.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #3504
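For readers skimming the thread, a minimal sketch of the prefix-filtered transfer described in this description (an illustration of the original proposal, not the interruptible-only labeling that the PR was later narrowed to):

package sketch

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// syncClusterAPILabels copies only "cluster.x-k8s.io"-prefixed machine labels onto the
// node, so users can tell where the labels came from.
func syncClusterAPILabels(machineLabels map[string]string, node *corev1.Node) {
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	for key, value := range machineLabels {
		if strings.HasPrefix(key, "cluster.x-k8s.io") {
			node.Labels[key] = value
		}
	}
}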