Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove NodeConditions from Machine.Status #1081

Merged
merged 1 commit into from
Jul 11, 2019

Conversation

vincepri
Copy link
Member

@vincepri vincepri commented Jun 26, 2019

Signed-off-by: Vince Prignano [email protected]

What this PR does / why we need it:
This PR removes Conditions fields from Machine.Status. Only applies to v1alpha2 types.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 26, 2019
@k8s-ci-robot k8s-ci-robot requested review from detiber and justinsb June 26, 2019 19:45
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 26, 2019
@vincepri vincepri force-pushed the remove-unused-fields branch from 6fb6f2e to 5c2e21b Compare June 26, 2019 19:45
@vincepri
Copy link
Member Author

/assign @detiber @ncdc

@detiber
Copy link
Member

detiber commented Jun 26, 2019

/lgtm
/hold

+1 from me considering the Node's Conditions and Addresses can be introspected by following the nodeRef.

holding to give time for additional feedback.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 26, 2019
@ncdc
Copy link
Contributor

ncdc commented Jun 26, 2019

+1. I know there was #950 (closed, unfixed) so we should probably discuss a solution for @jichenjc.

@detiber
Copy link
Member

detiber commented Jun 26, 2019

While #950 was closed/unresolved, The related PR stated that the getting the Addresses from the node was sufficient: #951 (comment)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 28, 2019
@rudoi
Copy link
Contributor

rudoi commented Jun 28, 2019

+1 from me!

@akutz
Copy link
Contributor

akutz commented Jun 28, 2019

Is this because of the presence of NodeRef? And if so, will this be back ported to 1.5? CAPV does currently use the Machine's addresses as part of the workflow to indicate whether or not a machine is ready. In fact, this is critical since on-prem providers may not have a ControlPlaneEndpoint from an external load balancer, and thus the initial control plane machine's IP address(es) are used to determine the control plane endpoint.

I'm adding a hold that anyone can cancel at any time. I just want to make sure that we understand the effects this may have on providers relying on those addresses. If we shouldn't be, then tell us why not, and what we should use instead. Thanks!

p.s. @andrewsykim I've an inkling the answer will be to use the NodeRef to determine addresses. If this is the case, then we need to reconcile the discrepancy between the CAPV address resolution and the CCM address resolution sooner rather than later.

/hold

@ncdc
Copy link
Contributor

ncdc commented Jun 28, 2019

This will not be backported.

If you need to know when a machine is "ready", we have machine.status.infrastructureReady in v1alpha2. This field flips to true only when the infrastructure provider decides this is appropriate.

If you need to know the apiserver's url... we'll need to continue to think about that.

@vincepri vincepri force-pushed the remove-unused-fields branch from 5c2e21b to eabb4a7 Compare June 28, 2019 15:57
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 28, 2019
@vincepri
Copy link
Member Author

/test pull-cluster-api-test

2 similar comments
@vincepri
Copy link
Member Author

/test pull-cluster-api-test

@vincepri
Copy link
Member Author

/test pull-cluster-api-test

@davidewatson
Copy link
Contributor

davidewatson commented Jun 28, 2019

I am not sure about this change and would like feedback from others who participated in the debate which introduced it (e.g. @hardikdr, @alvaroaleman, et al.).

  1. Accessing Node heath is harder when there is indirection. Even with the NodeRef controller, Use remote NodeRef in Machine and MachineSet controllers #1052, this is non-trivial for a kubectl user.
  2. Historically we have said that the MachineSet controller should not depend on Nodes directly. Instead, it should only depend on Machines. This PR would change that assumption.

Notes

Conditions were added here: #483

At the time I remember there was a long discussion. Two reasons from the PR are #47 and #253 Looking at last years meeting notes on August 15th another reason was given: to avoid requiring the MachineSet controller to access remote Node resources. The general principle was that controllers should be able to be layered (i.e. MachineDeployment -> MachineSet -> Machine -> Node) and controllers at different layers should only depend on the layer immediately below.

@andrewsykim
Copy link
Member

andrewsykim commented Jun 28, 2019

If the intent is to rely on Kubernetes Node addresses, there are some issues there worth considering:

  • There can be multiple addresses of the same type and deciding which address to use for a given machine may not be trivial. Most consumers of node addresses will choose the first one, but the ordering of addresses is also subject to change. There are multiple bug reports of Kubernetes clusters on AWS breaking because the kubelet started to report addresses from ENIs that were not meant to be consumed by the kubelet. (see Kubelet reports secondary InternalIP in AWS with multiple ENIs kubernetes/kubernetes#61921 for one case)
  • The update mechanism for node addresses by the kubelet has some issues, the biggest one being that if it sees more than 1 address of the same type it can unintentionally merge the two, so the initial address chosen for the machine may not be the same over time and in some cases not exist at all. (see Don't use strategic merge patch on Node.Status.Addresses kubernetes/kubernetes#79391 for more details)
  • The machine address is now highly coupled to the cloud provider implementation.

@vincepri
Copy link
Member Author

Let's add this item to next week's agenda and discuss on Zoom, if you all agree.

One thing I'd like to point out is that both of these fields are unused in v1alpha1.

@ncdc
Copy link
Contributor

ncdc commented Jun 28, 2019

Addresses may be used by providers (e.g. CAPV uses them).

@akutz
Copy link
Contributor

akutz commented Jun 28, 2019

Addresses may be used by providers (e.g. CAPV uses them).

Currently refactoring that out, but it just opens up a broader issue related to how CAPI providers handle addressing versus the cloud providers for the same platform (as @andrewsykim noted).

@akutz
Copy link
Contributor

akutz commented Jul 3, 2019

Hi @dhellmann,

If we don't place addresses on the Machine, providers that need the addresses (bootstrapping, post-boot config, rebooting, etc.)

CAPV maintains the addresses in both locations for this reason. Placing them solely on the NodeRef fails to address the issue of IP addresses that may not be relevant to Kubernetes, but relevant to machine infrastructure. This is an issue that I've discussed with @andrewsykim quite a bit.

Today there are only five choices for a Kubernetes NodeAddressType:

type NodeAddressType string

const (
    NodeHostName    NodeAddressType = "Hostname"
    NodeExternalIP  NodeAddressType = "ExternalIP"
    NodeInternalIP  NodeAddressType = "InternalIP"
    NodeExternalDNS NodeAddressType = "ExternalDNS"
    NodeInternalDNS NodeAddressType = "InternalDNS"
)

I think the node address type should be a bit mask:

type NodeAddressType uint32

const (
	NodeAddrUnknown NodeAddressType = 0
	NodeAddrDNS NodeAddressType = 1 << iota
	NodeAddrIP4
	NodeAddrIP6
	NodeAddrInternal
	NodeAddrExternal
	NodeAddrAPIServer
	NodeAddrKubelet
	// and others
)

This way it will be possible to easily ascertain the purpose of an address associated with a node, whether it's meant for the API server, the Kubelet, some non Kubernetes service, the address family, etc.

@dlipovetsky
Copy link
Contributor

Please note, this PR is suggesting we remove the NodeConditions

Thanks for clarifying @ncdc. I overlooked that.

Since we would have to change the Conditions type from corev1.NodeConditions to something defined by CAPI itself, and that replacement type is not proposed yet, I am ok with removing Conditions in the interim, then adding Conditions back when needed.

@moshloop
Copy link
Contributor

moshloop commented Jul 3, 2019

My concern would be around forcing a remote call for static or slow changing fields could introduce undefined or unwanted behavior when there are intermittent network issues, especially if these addresses and/or conditions are used for load balancers/endpoints.

@michaelgugino
Copy link
Contributor

To chime in, we're using the addresses today in OpenShift.

Generally speaking, components external to cluster-api should be able to rely on machine-object as the source of truth. We might need these addresses for any number of things (today we use internal hostname for CSR approval), and having to account for a proliferation of infra-providers seems unhelpful long term.

@detiber
Copy link
Member

detiber commented Jul 3, 2019

I'm +1 for removing the existing Conditions as documented (mirroring Node Conditions).

I'm now a -1 on Removing the Addresses. I think there has been sufficient cases made where the existing Node Addresses may not be sufficient.

@vincepri
Copy link
Member Author

vincepri commented Jul 3, 2019

Would having Addresses in the Infrastructure Provider status field and some helper methods in CAPI to quickly parse unstructured objects be a possible alternative?

The upside is that we don't have to copy the status and deal with 2 source of truth.

@detiber
Copy link
Member

detiber commented Jul 3, 2019

Would having Addresses in the Infrastructure Provider status field and some helper methods in CAPI to quickly parse unstructured objects be a possible alternative?

The upside is that we don't have to copy the status and deal with 2 source of truth.

I would expect these addresses to only be set on create, especially since we have said that a failed instance should not be replaced by a Machine controller.

In general, if we expect an outside system to interact with it, then it should be a field on a known resource and not have to deal with unstructured.

@timothysc
Copy link
Member

+0 on Addresses

IMO I had thoughts of using conditions for the future states of the state machine.

@detiber
Copy link
Member

detiber commented Jul 3, 2019

IMO I had thoughts of using conditions for the future states of the state machine.

No objections from me here, but the comment would need updating and we are still lacking a backing implementation to populate the field.

@vincepri vincepri changed the title Remove NodeConditions and Addresses from Machine.Status Remove NodeConditions from Machine.Status Jul 8, 2019
@vincepri
Copy link
Member Author

vincepri commented Jul 8, 2019

Hey folks, given the feedback, I went ahead and scoped down the PR to just remove NodeConditions and keep Addresses.

@vincepri vincepri force-pushed the remove-unused-fields branch from eabb4a7 to 2e0bd1c Compare July 8, 2019 17:56
@hardikdr
Copy link
Member

hardikdr commented Jul 9, 2019

I am inclined towards keeping something-like Conditions in API. From UX PoV it helps in realizing the status of the machine quickly and probably could also be used for extension of machine-phases.

Although we never really implemented the logic to populate this field so far, we might want to consider having it in the future. I think it does not have to be NodeConditions, it could be something generic, what we could also define. For now, I am happy with the current changes, thanks for the PR.

@detiber
Copy link
Member

detiber commented Jul 10, 2019

@hardikdr fully agree longer term that having Machine Conditions would be beneficial, but I'd also like to get away from having fields for unimplemented functionality.

@ncdc
Copy link
Contributor

ncdc commented Jul 10, 2019

And again, to clarify, we do still have some "conditions" in the machine status: bootstrapReady and infrastructureReady. These are actively used in v1alpha2.

@vincepri
Copy link
Member Author

Given that it has been 2 weeks and comments have slowed down on this PR, should we merge this change?

@ncdc
Copy link
Contributor

ncdc commented Jul 11, 2019

I haven't seen any major disagreements in the comments for removing NodeConditions. Lazy consensus ftw.

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 11, 2019
@ncdc
Copy link
Contributor

ncdc commented Jul 11, 2019

Oh but you need to rebase 😄

@vincepri vincepri force-pushed the remove-unused-fields branch from 2e0bd1c to dee2b39 Compare July 11, 2019 14:27
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 11, 2019
@vincepri
Copy link
Member Author

@ncdc Just rebased :)

@detiber
Copy link
Member

detiber commented Jul 11, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 11, 2019
@k8s-ci-robot k8s-ci-robot merged commit 68d2eba into kubernetes-sigs:master Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.