Extend MachineStatus to add ProviderID #565
Conversation
@vikaschoudhary16: GitHub didn't allow me to request PR reviews from the following users: hardikdr, enxebre. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
```go
// ProviderID is the identification ID of the machine provided by the provider
// +optional
ProviderID *string `json:"providerID,omitempty"`
```
Please update this comment to make it clear that this field must match the provider ID as seen on the node object corresponding to this machine.
My 2C to this:
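The requirement above, that this field match the provider ID on the corresponding node object, is what lets consumers pair machines with nodes. A minimal Go sketch of that matching, using simplified stand-in types invented for illustration (the real definitions live in the cluster-api repository):

```go
package main

import "fmt"

// Machine and Node are simplified stand-ins for the real API types.
type Machine struct {
	Name       string
	ProviderID *string // must match the corresponding Node's spec.providerID
}

type Node struct {
	Name       string
	ProviderID string // e.g. "aws:///us-east-1a/i-0abcd1234"
}

// nodeForMachine pairs a machine with its backing node by comparing
// provider IDs, which is how a consumer such as the autoscaler can
// reconcile machines against nodes.
func nodeForMachine(m Machine, nodes []Node) (Node, bool) {
	if m.ProviderID == nil {
		return Node{}, false // field not yet populated by the actuator
	}
	for _, n := range nodes {
		if n.ProviderID == *m.ProviderID {
			return n, true
		}
	}
	return Node{}, false
}

func main() {
	id := "aws:///us-east-1a/i-0abcd1234"
	m := Machine{Name: "worker-0", ProviderID: &id}
	nodes := []Node{{Name: "node-a", ProviderID: id}}
	if n, ok := nodeForMachine(m, nodes); ok {
		fmt.Println("machine", m.Name, "is backed by node", n.Name)
	}
}
```

If the IDs can disagree, this lookup silently fails, which is why the comment on the field needs to state the matching requirement explicitly.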
I agree with @alvaroaleman. I support adding the ProviderID field in the MachineStatus, but only for the reason mentioned above.
I recall that the autoscaler integration requires it, probably for mapping the node object to the machine object, and in turn to the machine-deployment object. But this should rather be settled in the KEP opened for the same purpose.
As I mentioned during today's sig meeting as well, keeping the …
Force-pushed 7614808 to 64d4639
@sidharthsurana
@vikaschoudhary16 - Is it always observed? Or does that depend on the provider? I think the question comes down to whether we think it should be possible to use the provider ID to match a machine with an existing infrastructure resource via this field (which is how @sidharthsurana is using it), by saying that through some means the caller knows a priori what it should be set to. The precedent here is the service IP (for k8s services), where you can ask for a specific one by setting the value in the spec, but if you leave it blank the service controller chooses one and fills it in for you. It may not be the best precedent to follow, but it seems to work pretty well in that case.
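The service-IP precedent described above can be sketched roughly as follows. This is a simplified illustration, not actual controller code; the `MachineSpec` stand-in and the `reconcileProviderID` helper are invented for the example:

```go
package main

import "fmt"

// MachineSpec is a stand-in for the real spec type, holding only the
// field under discussion.
type MachineSpec struct {
	ProviderID *string
}

// reconcileProviderID mimics the service-IP pattern: if the user left
// the field blank, the controller fills it in (like the service
// controller choosing an IP); if the user set it, the controller adopts
// the existing infrastructure resource with that ID.
func reconcileProviderID(spec *MachineSpec, allocate func() string) string {
	if spec.ProviderID == nil || *spec.ProviderID == "" {
		id := allocate() // controller-chosen value, written back to the spec
		spec.ProviderID = &id
		return "allocated " + id
	}
	return "adopted " + *spec.ProviderID
}

func main() {
	blank := &MachineSpec{}
	fmt.Println(reconcileProviderID(blank, func() string { return "vsphere://vm-123" }))

	pre := "vsphere://vm-999"
	fmt.Println(reconcileProviderID(&MachineSpec{ProviderID: &pre}, nil))
}
```

The design choice mirrors `Service.spec.clusterIP`: the same spec field serves both "please use this specific resource" and "controller, pick one for me".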
@roberthbailey thanks! Sounds logical to me.
Force-pushed 9ffb881 to 4b38736
Are there any remaining blockers for merging this PR?
@hardikdr None to my knowledge, thanks!
@vikaschoudhary16 @hardikdr The point I was trying to make is as follows: say a user is using the … I hope the above explanation helps.
@sidharthsurana But wouldn't the better solution then be to copy over the …
I would second the idea of also migrating the Status while pivoting.
@alvaroaleman @sidharthsurana Thoughts? We could settle on Spec for now and move to Status in case we see a need, or vice versa? @vikaschoudhary16 seems to have already migrated the …
Yes, but if you are using kubectl to create the object, the …
Here is my take: if the …
In general I am okay with adding this field; my only concern is about where we add it. Since I am not 100% sure of the underlying use cases, hence the questions.
Yes, I think so.
Okay, but if that is the only reason to move that field to …
Hm. I would view it more as: if this is immutable and will never be changed once populated, it belongs in …
Makes sense. I see this as a consequence of an arbitrary process (pivoting) followed in some cases to bring up a cluster, so I think the tooling driving it should be responsible for ensuring its expectations. But I think we should not choose between object spec and object status based on that.
The difference I can see with the node object being in the spec is that a machine could be backing a different node over time. E.g. if the external state is changed, i.e. an instance is manually deleted, eventually the machine controller should try to reconcile desired state and create a new instance based on the existing machine object.
Now I think it makes more sense to add it in …
I am inclined to revert back to the initial change, i.e. ProviderID in … @sidharthsurana @alvaroaleman @enxebre @hardikdr @roberthbailey thoughts?
I agree that the tooling should handle the case of keeping the status and spec in sync. However, keep in mind that even if the tooling tries to handle that, it would not be an atomic operation; rather, it would be a two-step operation. That means the controller/actuator running on the target cluster will get notified of the changes to the object twice, and the actuator then needs to handle this case gracefully.
Although I get the motivation for making the spec immutable, IMHO that is somewhat subjective. I like the idea of the spec being immutable too, but in general there may be cases that demand the spec be treated as mutable as well. For instance, right now if you want to upgrade the kubelet version, what does one do? We update the spec to the new kubelet version and the actuator needs to handle that situation. Another case could be vertical scaling. What if I just want to increase, say, the CPU/memory of a worker? One way is to create a brand new machine with the new size and delete the old one. However, a better approach might be that if the underlying provider supports hot-add of CPU/memory, we simply reconfigure the instance. For the latter to happen, one might have to update the spec to reflect the new intent. Regarding the specific scenario @alvaroaleman you are referring to: in general this is the simple scenario of, given a machine object, how the actuator figures out that the underlying actual instance exists. In general there are three options to resolve this.
Here our case is scenario 1, so where do we put that metadata inside the Machine object; does it matter? For the case where someone accidentally deletes the backend instance, on the next reconciliation loop, when the actuator tries to look up the backend instance by that ID, it will not be able to find it, and that is a clear indication that it needs to re-create the instance and, once created, update the ID back in the same location (regardless of whether that is Spec or Status) with 100% confidence.
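The recovery path described above (look up the backend instance by its stored ID; if it is gone, re-create it and write the new ID back to the same field) can be sketched with a hypothetical in-memory cloud. All the names here are invented for illustration:

```go
package main

import "fmt"

// cloud is a hypothetical in-memory provider used only to illustrate
// the reconciliation flow; it is not part of cluster-api.
type cloud struct {
	instances map[string]bool
	next      int
}

func (c *cloud) exists(id string) bool { return c.instances[id] }

func (c *cloud) create() string {
	c.next++
	id := fmt.Sprintf("i-%04d", c.next)
	c.instances[id] = true
	return id
}

// reconcile returns the provider ID the machine should carry after one
// pass: unchanged when the backend instance still exists, fresh when it
// was deleted out from under us.
func reconcile(c *cloud, providerID string) string {
	if providerID != "" && c.exists(providerID) {
		return providerID // instance found, nothing to do
	}
	return c.create() // missing: re-create and store the new ID
}

func main() {
	c := &cloud{instances: map[string]bool{}}
	id := reconcile(c, "")  // first pass creates the instance
	id2 := reconcile(c, id) // steady state: same ID comes back
	delete(c.instances, id) // someone deletes the instance manually
	id3 := reconcile(c, id) // next pass detects the loss and re-creates
	fmt.Println(id, id2, id3)
}
```

As the comment above notes, this logic is the same regardless of whether the ID is stored in Spec or Status; only the write-back target changes.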
We discussed this quite a bit during the meeting today. Some takeaways: …
This is the route I chose for the autoscaler / cluster-api implementation I have been working on here: openshift/kubernetes-autoscaler#14
We can do almost all of these things with annotations instead of concrete APIs. In my opinion, annotations should be used for experimentation, where we are not sure whether an API will ever be needed. Here I think most of us agree that this information is required, therefore making it an API makes more sense to me. APIs provide greater confidence to build upon than annotations do: different implementations, different annotation names, more confusion.
Ping.
I think …
As alluded to by @dlipovetsky above, we discussed this during the meeting today. We agreed that we want the field and that it should go into the spec. We also agreed that when making this sort of change we need to update our documentation (both inline and in the gitbook). @hardikdr volunteered to pick this change up if you aren't able to make those changes as part of getting it merged.
Hey guys, thanks a lot for discussing this. I could not attend the cluster-api meeting because of a timezone problem. I am ready to take up whatever it takes to get this merged.
Thanks! |
Minor comment
```go
// ProviderID is the identification ID of the machine provided by the provider.
// This field must match the provider ID as seen on the node object corresponding to this machine.
// +optional
```
Could we add some use cases and a more detailed description for this field? We should also make sure to communicate to users when the field should be set and when it could be consumed.
done!
Force-pushed 4b38736 to f85676d
/retest
@vikaschoudhary16 Can you rebase on master? /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: vikaschoudhary16, vincepri. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing …
Force-pushed f85676d to b06a950
@vincepri rebased!
Minor nit on removal of a newline, otherwise lgtm
```diff
@@ -153,7 +166,6 @@ type MachineStatus struct {
 	// serialized/deserialized from this field.
 	// +optional
 	ProviderStatus *runtime.RawExtension `json:"providerStatus,omitempty"`
-
```
Since ProviderStatus and Addresses are not directly related, it would be nice to keep this newline here to more easily differentiate them.
Done, thanks!
Force-pushed b06a950 to b9ba5d0
Force-pushed b9ba5d0 to dd2d3e5
/lgtm
A new field, ProviderID, is proposed to be added to the machine object Spec. This field is required by higher-level consumers of cluster-api, such as the autoscaler, for the machines-vs-nodes reconciliation logic. With cluster-api as a generic out-of-tree provider for the autoscaler, this field is required for the autoscaler to have a provider view of the list of machines. The field will be set by actuators and consumed by higher-level entities that interface with cluster-api as a generic provider.
/cc @roberthbailey @hardikdr @enxebre @derekwaynecarr