Remove NodeConditions from Machine.Status #1081

vincepri · 2019-06-26T19:44:55Z

Signed-off-by: Vince Prignano [email protected]

What this PR does / why we need it:
This PR removes the Conditions field from Machine.Status. It only applies to v1alpha2 types.

k8s-ci-robot · 2019-06-26T19:45:03Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [vincepri]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

vincepri · 2019-06-26T19:52:42Z

/assign @detiber @ncdc

detiber · 2019-06-26T20:34:03Z

/lgtm
/hold

+1 from me considering the Node's Conditions and Addresses can be introspected by following the nodeRef.

holding to give time for additional feedback.

ncdc · 2019-06-26T20:36:20Z

+1. I know there was #950 (closed, unfixed) so we should probably discuss a solution for @jichenjc.

detiber · 2019-06-26T20:37:58Z

While #950 was closed/unresolved, the related PR stated that getting the Addresses from the node was sufficient: #951 (comment)

rudoi · 2019-06-28T14:14:02Z

+1 from me!

akutz · 2019-06-28T15:39:30Z

Is this because of the presence of NodeRef? And if so, will this be backported to 1.5? CAPV currently uses the Machine's addresses as part of the workflow to indicate whether or not a machine is ready. In fact, this is critical, since on-prem providers may not have a ControlPlaneEndpoint from an external load balancer, and thus the initial control plane machine's IP address(es) are used to determine the control plane endpoint.

I'm adding a hold that anyone can cancel at any time. I just want to make sure that we understand the effects this may have on providers relying on those addresses. If we shouldn't be, then tell us why not, and what we should use instead. Thanks!

p.s. @andrewsykim I've an inkling the answer will be to use the NodeRef to determine addresses. If this is the case, then we need to reconcile the discrepancy between the CAPV address resolution and the CCM address resolution sooner rather than later.

/hold

ncdc · 2019-06-28T15:41:52Z

This will not be backported.

If you need to know when a machine is "ready", we have machine.status.infrastructureReady in v1alpha2. This field flips to true only when the infrastructure provider decides this is appropriate.

If you need to know the apiserver's url... we'll need to continue to think about that.
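As a rough illustration of the readiness check described above, the sketch below gates on the boolean status field rather than on addresses or Node conditions. The `Machine` and `MachineStatus` structs are local stand-ins that model only the one field mentioned here, not the real v1alpha2 types, and `readyMachines` is a hypothetical helper.

```go
package main

import "fmt"

// MachineStatus is a local stand-in for the v1alpha2 Machine status.
// Only the field discussed in this thread is modeled (assumption).
type MachineStatus struct {
	// InfrastructureReady is flipped to true by the infrastructure
	// provider when it decides the machine is ready.
	InfrastructureReady bool
}

// Machine is a simplified stand-in for the Machine resource.
type Machine struct {
	Name   string
	Status MachineStatus
}

// readyMachines returns the names of the machines whose infrastructure
// provider has reported readiness. It is a hypothetical helper.
func readyMachines(machines []Machine) []string {
	var ready []string
	for _, m := range machines {
		if m.Status.InfrastructureReady {
			ready = append(ready, m.Name)
		}
	}
	return ready
}

func main() {
	machines := []Machine{
		{Name: "cp-0", Status: MachineStatus{InfrastructureReady: true}},
		{Name: "worker-0"}, // zero value: not ready yet
	}
	fmt.Println(readyMachines(machines))
}
```

A consumer written this way does not need to read addresses to decide readiness, although it still leaves open the question of how to discover the apiserver URL.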

vincepri · 2019-06-28T16:02:48Z

/test pull-cluster-api-test

vincepri · 2019-06-28T16:08:30Z

/test pull-cluster-api-test

vincepri · 2019-06-28T16:14:14Z

/test pull-cluster-api-test

davidewatson · 2019-06-28T18:17:06Z

I am not sure about this change and would like feedback from others who participated in the debate which introduced it (e.g. @hardikdr, @alvaroaleman, et al.).

Accessing Node health is harder when there is indirection. Even with the NodeRef controller (Use remote NodeRef in Machine and MachineSet controllers #1052), this is non-trivial for a kubectl user.
Historically we have said that the MachineSet controller should not depend on Nodes directly. Instead, it should only depend on Machines. This PR would change that assumption.

Notes

Conditions were added here: #483

At the time I remember there was a long discussion. Two reasons from the PR are #47 and #253. Looking at last year's meeting notes from August 15th, another reason was given: to avoid requiring the MachineSet controller to access remote Node resources. The general principle was that controllers should be able to be layered (i.e. MachineDeployment -> MachineSet -> Machine -> Node) and controllers at different layers should only depend on the layer immediately below.

andrewsykim · 2019-06-28T19:07:37Z

If the intent is to rely on Kubernetes Node addresses, there are some issues there worth considering:

There can be multiple addresses of the same type, and deciding which address to use for a given machine may not be trivial. Most consumers of node addresses will choose the first one, but the ordering of addresses is also subject to change. There are multiple bug reports of Kubernetes clusters on AWS breaking because the kubelet started to report addresses from ENIs that were not meant to be consumed by the kubelet (see "Kubelet reports secondary InternalIP in AWS with multiple ENIs", kubernetes/kubernetes#61921, for one case).
The update mechanism for node addresses by the kubelet has some issues. The biggest one is that if it sees more than one address of the same type it can unintentionally merge the two, so the address initially chosen for the machine may change over time and in some cases may not exist at all (see "Don't use strategic merge patch on Node.Status.Addresses", kubernetes/kubernetes#79391, for more details).
The machine address is now highly coupled to the cloud provider implementation.
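To make the ordering concern concrete, here is a minimal sketch of a consumer that avoids "take the first address of a type". It prefers a previously chosen address when it is still present and otherwise falls back to a stable sort. The `NodeAddress` struct is a local stand-in for the Kubernetes type, and `pickAddress` and its fallback rule are hypothetical, not something any provider is known to implement.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeAddress is a local stand-in mirroring the shape of a node address.
type NodeAddress struct {
	Type    string
	Address string
}

// pickAddress returns one address of wantType in a way that does not depend
// on the order of addrs. If preferred is non-empty and still present, it is
// returned so the choice stays stable over time. Otherwise the
// lexicographically smallest candidate is used: an arbitrary rule, but a
// deterministic one.
func pickAddress(addrs []NodeAddress, wantType, preferred string) (string, bool) {
	var candidates []string
	for _, a := range addrs {
		if a.Type != wantType {
			continue
		}
		if preferred != "" && a.Address == preferred {
			return preferred, true
		}
		candidates = append(candidates, a.Address)
	}
	if len(candidates) == 0 {
		return "", false
	}
	sort.Strings(candidates)
	return candidates[0], true
}

func main() {
	// The same two InternalIP entries, reported in different orders.
	first := []NodeAddress{
		{Type: "InternalIP", Address: "10.0.0.9"},
		{Type: "InternalIP", Address: "10.0.0.5"},
	}
	second := []NodeAddress{first[1], first[0]}

	a, _ := pickAddress(first, "InternalIP", "")
	b, _ := pickAddress(second, "InternalIP", "")
	fmt.Println(a == b) // the choice does not depend on ordering
}
```

This only addresses ordering. It does not help when the reported list itself is wrong, such as when the kubelet reports an address from an ENI it should not use.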

vincepri · 2019-06-28T19:10:37Z

Let's add this item to next week's agenda and discuss on Zoom, if you all agree.

One thing I'd like to point out is that both of these fields are unused in v1alpha1.

ncdc · 2019-06-28T19:20:40Z

Addresses may be used by providers (e.g. CAPV uses them).

akutz · 2019-06-28T19:25:09Z

> Addresses may be used by providers (e.g. CAPV uses them).

Currently refactoring that out, but it just opens up a broader issue related to how CAPI providers handle addressing versus the cloud providers for the same platform (as @andrewsykim noted).

akutz · 2019-07-03T17:40:54Z

Hi @dhellmann,

If we don't place addresses on the Machine, providers that need the addresses (bootstrapping, post-boot config, rebooting, etc.)

CAPV maintains the addresses in both locations for this reason. Placing them solely on the NodeRef fails to address the issue of IP addresses that may not be relevant to Kubernetes, but relevant to machine infrastructure. This is an issue that I've discussed with @andrewsykim quite a bit.

Today there are only five choices for a Kubernetes NodeAddressType:

type NodeAddressType string

const (
    NodeHostName    NodeAddressType = "Hostname"
    NodeExternalIP  NodeAddressType = "ExternalIP"
    NodeInternalIP  NodeAddressType = "InternalIP"
    NodeExternalDNS NodeAddressType = "ExternalDNS"
    NodeInternalDNS NodeAddressType = "InternalDNS"
)

I think the node address type should be a bit mask:

type NodeAddressType uint32

const (
	NodeAddrUnknown NodeAddressType = 0
	NodeAddrDNS NodeAddressType = 1 << iota
	NodeAddrIP4
	NodeAddrIP6
	NodeAddrInternal
	NodeAddrExternal
	NodeAddrAPIServer
	NodeAddrKubelet
	// and others
)

This way it will be possible to easily ascertain the purpose of an address associated with a node, whether it's meant for the API server, the Kubelet, some non Kubernetes service, the address family, etc.
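A small sketch of how such a bit mask might be consumed is shown here. The names (`AddrFlags`, `Has`, `filter`) are hypothetical and not part of any Kubernetes API. The flags start at `1 << 0`, which differs slightly from the snippet above, where `iota` is already 1 on the first flag line, so the lowest bit there goes unused.

```go
package main

import "fmt"

// AddrFlags is a hypothetical bit-mask describing what an address is for.
type AddrFlags uint32

const (
	AddrDNS AddrFlags = 1 << iota // 1
	AddrIP4                       // 2
	AddrIP6                       // 4
	AddrInternal                  // 8
	AddrExternal                  // 16
	AddrAPIServer                 // 32
	AddrKubelet                   // 64
)

// Has reports whether every bit in want is set in f.
// An empty want (0) is trivially satisfied.
func (f AddrFlags) Has(want AddrFlags) bool { return f&want == want }

// Address pairs a value with the flags that describe its purpose.
type Address struct {
	Value string
	Flags AddrFlags
}

// filter returns the values of all addresses carrying every flag in want.
func filter(addrs []Address, want AddrFlags) []string {
	var out []string
	for _, a := range addrs {
		if a.Flags.Has(want) {
			out = append(out, a.Value)
		}
	}
	return out
}

func main() {
	addrs := []Address{
		{Value: "10.0.0.5", Flags: AddrIP4 | AddrInternal | AddrKubelet},
		{Value: "203.0.113.7", Flags: AddrIP4 | AddrExternal | AddrAPIServer},
		{Value: "node-1.example.internal", Flags: AddrDNS | AddrInternal},
	}
	// Ask for IPv4 addresses intended for the API server.
	fmt.Println(filter(addrs, AddrIP4|AddrAPIServer))
}
```

Combining flags with bitwise OR lets a single address carry several roles at once, which the current string-valued type cannot express without adding more enum values.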

dlipovetsky · 2019-07-03T17:42:50Z

> Please note, this PR is suggesting we remove the NodeConditions

Thanks for clarifying @ncdc. I overlooked that.

Since we would have to change the Conditions type from []corev1.NodeCondition to something defined by CAPI itself, and that replacement type is not proposed yet, I am ok with removing Conditions in the interim, then adding Conditions back when needed.

moshloop · 2019-07-03T17:45:13Z

My concern is that forcing a remote call for static or slow-changing fields could introduce undefined or unwanted behavior when there are intermittent network issues, especially if these addresses and/or conditions are used for load balancers/endpoints.

michaelgugino · 2019-07-03T18:15:27Z

To chime in, we're using the addresses today in OpenShift.

Generally speaking, components external to cluster-api should be able to rely on machine-object as the source of truth. We might need these addresses for any number of things (today we use internal hostname for CSR approval), and having to account for a proliferation of infra-providers seems unhelpful long term.

detiber · 2019-07-03T18:48:11Z

I'm +1 for removing the existing Conditions as documented (mirroring Node Conditions).

I'm now a -1 on removing the Addresses. I think there have been sufficient cases made where the existing Node Addresses may not be sufficient.

vincepri · 2019-07-03T18:54:58Z

Would having Addresses in the Infrastructure Provider status field and some helper methods in CAPI to quickly parse unstructured objects be a possible alternative?

The upside is that we don't have to copy the status and deal with two sources of truth.
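For a sense of what such a helper could look like, here is a minimal sketch that reads addresses out of a decoded, schemaless object using only the standard library. The `status.addresses` path, the `type`/`address` keys, and the function name are assumptions for illustration. Nothing in this thread defines where an infrastructure provider would put them.

```go
package main

import "fmt"

// Address is a local stand-in for a machine address entry.
type Address struct {
	Type    string
	Address string
}

// addressesFromUnstructured extracts status.addresses from a decoded object
// (for example the result of unmarshalling JSON into a map). The field path
// is an assumption, not an established contract.
func addressesFromUnstructured(obj map[string]interface{}) ([]Address, error) {
	status, ok := obj["status"].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("status is missing or not an object")
	}
	raw, ok := status["addresses"].([]interface{})
	if !ok {
		return nil, fmt.Errorf("status.addresses is missing or not a list")
	}
	out := make([]Address, 0, len(raw))
	for i, item := range raw {
		m, ok := item.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("status.addresses[%d] is not an object", i)
		}
		t, _ := m["type"].(string)
		a, _ := m["address"].(string)
		if a == "" {
			return nil, fmt.Errorf("status.addresses[%d] has no address", i)
		}
		out = append(out, Address{Type: t, Address: a})
	}
	return out, nil
}

func main() {
	obj := map[string]interface{}{
		"status": map[string]interface{}{
			"addresses": []interface{}{
				map[string]interface{}{"type": "InternalIP", "address": "10.0.0.5"},
			},
		},
	}
	addrs, err := addressesFromUnstructured(obj)
	fmt.Println(addrs, err)
}
```

Every type assertion here is a place where a provider-specific layout can break the caller at runtime.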

detiber · 2019-07-03T18:59:48Z

> Would having Addresses in the Infrastructure Provider status field and some helper methods in CAPI to quickly parse unstructured objects be a possible alternative?
>
> The upside is that we don't have to copy the status and deal with two sources of truth.

I would expect these addresses to only be set on create, especially since we have said that a failed instance should not be replaced by a Machine controller.

In general, if we expect an outside system to interact with it, then it should be a field on a known resource and not have to deal with unstructured.

timothysc · 2019-07-03T19:05:14Z

+0 on Addresses

IMO I had thoughts of using conditions for the future states of the state machine.

detiber · 2019-07-03T19:07:09Z

> IMO I had thoughts of using conditions for the future states of the state machine.

No objections from me here, but the comment would need updating and we are still lacking a backing implementation to populate the field.

vincepri · 2019-07-08T17:56:01Z

Hey folks, given the feedback, I went ahead and scoped down the PR to just remove NodeConditions and keep Addresses.

hardikdr · 2019-07-09T08:58:38Z

I am inclined towards keeping something like Conditions in the API. From a UX point of view it helps in quickly understanding the status of the machine, and it could probably also be used to extend machine phases.

Although we never really implemented the logic to populate this field so far, we might want to consider having it in the future. I think it does not have to be NodeConditions; it could be something generic that we define ourselves. For now, I am happy with the current changes, thanks for the PR.

detiber · 2019-07-10T20:45:55Z

@hardikdr fully agree longer term that having Machine Conditions would be beneficial, but I'd also like to get away from having fields for unimplemented functionality.

ncdc · 2019-07-10T20:51:48Z

And again, to clarify, we do still have some "conditions" in the machine status: bootstrapReady and infrastructureReady. These are actively used in v1alpha2.

vincepri · 2019-07-11T14:24:21Z

Given that it has been 2 weeks and comments have slowed down on this PR, should we merge this change?

Signed-off-by: Vince Prignano <[email protected]>

ncdc · 2019-07-11T14:26:04Z

I haven't seen any major disagreements in the comments for removing NodeConditions. Lazy consensus ftw.

/hold cancel
/lgtm

ncdc · 2019-07-11T14:26:13Z

Oh but you need to rebase 😄

vincepri · 2019-07-11T14:28:04Z

@ncdc Just rebased :)

detiber · 2019-07-11T14:30:38Z

/lgtm

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 26, 2019

k8s-ci-robot requested review from detiber and justinsb June 26, 2019 19:45

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 26, 2019

vincepri force-pushed the remove-unused-fields branch from 6fb6f2e to 5c2e21b Compare June 26, 2019 19:45

k8s-ci-robot assigned detiber and ncdc Jun 26, 2019

k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 26, 2019

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 28, 2019

vincepri force-pushed the remove-unused-fields branch from 5c2e21b to eabb4a7 Compare June 28, 2019 15:57

k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 28, 2019

vincepri mentioned this pull request Jul 5, 2019

Embed NodeRef logic into Machine controller #1128

Merged

vincepri changed the title ~~Remove NodeConditions and Addresses from Machine.Status~~ Remove NodeConditions from Machine.Status Jul 8, 2019

vincepri force-pushed the remove-unused-fields branch from eabb4a7 to 2e0bd1c Compare July 8, 2019 17:56

Remove NodeConditions from Machine.Status

dee2b39

Signed-off-by: Vince Prignano <[email protected]>

k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 11, 2019

vincepri force-pushed the remove-unused-fields branch from 2e0bd1c to dee2b39 Compare July 11, 2019 14:27

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 11, 2019

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 11, 2019

k8s-ci-robot merged commit 68d2eba into kubernetes-sigs:master Jul 11, 2019

rthallisey mentioned this pull request Jul 17, 2019

Standarize conditions across component operators kubevirt/hyperconverged-cluster-operator#163

Merged

dhellmann mentioned this pull request Sep 9, 2019

REQUEST: New membership for dhellmann kubernetes/org#1171

Closed

6 tasks
