
Create KEP for instance host labelling #997

Closed
wants to merge 1 commit into from

Conversation

misterikkit

This KEP proposes a new well-defined label for VM instance host as a
topology type. In this commit I have only written the summary,
motivation, and alternatives section so that we can get agreement on the
overall approach before discussing design.

/sig cloud-provider
/sig scheduling

/ref kubernetes/kubernetes#75274

@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 24, 2019
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Apr 24, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: misterikkit
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: hogepodge

If they are not already assigned, you can assign the PR to them by writing /assign @hogepodge in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


<!-- tocstop -->

## Release Signoff Checklist
Member

I don't think you need to keep this section in the doc.

host
- Any decision-making that is based on the instance host label

## Proposal
Member

looks like the rest of this KEP is not written yet.

Member

It's better to only have the context of this KEP :)

Author

Yep, per the KEP template, merging in this state means agreement on the goals and summary, with design details to follow.

@bsalamat
Member

I should add that the proposed approach is better than the alternative IMO.

Member

@andrewsykim left a comment

Would the "instance host" topology be optional for a cloud provider? If not, what is the expected outcome for bare metal deployments or a cloud provider that does not provide "instance host" information via its APIs?

### Goals

- Introduce a well known label for instance host
- Introduce a controller that keeps these labels updated
Member

Would this warrant a new controller or can we add this functionality to the existing node controller that applies labels for zones and other existing well known labels?

Member

I think we need a separate component (admission webhook, etc) for this feature. The component will be different for various platforms and cloud providers.

Author

This should be part of cloud-controller-manager, as that is the official interface where we get provider-specific info, including zone and region. As far as I can tell, this does not handle the case of a VM migrating to a different failure domain, but that seems like something that should be fixed in cloud-controller-manager.
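
For concreteness, here is a minimal sketch (mine, not part of the KEP) of what that labeling step inside cloud-controller-manager might look like. The helper name, the way the provider reports the host, and the final label key are all assumptions; reconciling on every sync is what would also pick up a VM that has migrated to a different host.

```go
// Hypothetical sketch of a physical-host label sync step in a cloud
// controller. Nothing here is an agreed API; the label key may end up under
// topology.kubernetes.io/ depending on #839.
package hostlabel

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

const physicalHostLabel = "failure-domain.kubernetes.io/physical-host" // proposed key

// syncPhysicalHostLabel patches the node when the cloud-reported host differs
// from the current label value, covering both initial labeling and the case
// where a VM has moved to a different physical host.
func syncPhysicalHostLabel(ctx context.Context, cs kubernetes.Interface, nodeName, reportedHost string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if node.Labels[physicalHostLabel] == reportedHost {
		return nil // already up to date
	}
	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{
			"labels": map[string]string{physicalHostLabel: reportedHost},
		},
	})
	if err != nil {
		return err
	}
	if _, err := cs.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
		return fmt.Errorf("labeling node %s: %w", nodeName, err)
	}
	return nil
}
```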

@bsalamat
Member

> Would the "instance host" topology be optional for a cloud provider? If not, what is the expected outcome for bare metal deployments or a cloud provider that does not provide "instance host" information via its APIs?

Good question. Yes, the topology label will be optional and the scheduler works as before (spreads among nodes) if the physical host label does not exist.

@misterikkit
Author

> Would the "instance host" topology be optional for a cloud provider? If not, what is the expected outcome for bare metal deployments or a cloud provider that does not provide "instance host" information via its APIs?

> Good question. Yes, the topology label will be optional and the scheduler works as before (spreads among nodes) if the physical host label does not exist.

I think omitting the label if the cloud provider does not support it is reasonable*. There is no plan to hard-code this label into scheduler logic. Rather, the "even pod spreading" KEP would have users put the topologyKey into a new struct of the pod spec.

Also, if you used hostname as physical hostname for bare metal, that would still be semantically correct. 😉

  \* We currently have inter-pod affinity in the scheduler, where users are warned not to use a topology key that isn't supported on the nodes they are using. I see this as an incentive for providers to label their nodes well.
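
To make the intended usage concrete, here is a sketch of how the proposed key would plug into the existing inter-pod anti-affinity API; the only assumption beyond standard core/v1 types is the label key itself, which is still subject to #839.

```go
// Illustrative only: spreading replicas of an "app=db" workload across
// physical hosts using the standard core/v1 anti-affinity types and the
// label key proposed in this KEP.
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// spreadAcrossPhysicalHosts keeps new "db" replicas off nodes that share a
// physical host with an existing "db" replica.
func spreadAcrossPhysicalHosts() *corev1.Affinity {
	return &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				TopologyKey: "failure-domain.kubernetes.io/physical-host", // proposed label
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"app": "db"},
				},
			}},
		},
	}
}
```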

[zone-struct]: https://github.com/kubernetes/cloud-provider/blob/0a4f4cbb5a664deb4639d7d9bf5bbde3bb3603c1/cloud.go#L208-L211

When the `PhysicalMachineID` is supplied, cloud-controller-manager will use it
as the value for the `failure-domain.kubernetes.io/physical-host` label.
Member

Why "failure-domain" when the others have moved to "topology" ?

Member

To be fair, we haven't moved to failure-domain yet :P but we should use topology.kubernetes.io assuming #839 is 👍

Author

Intention is to match the others. It's a moving target. (:
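
To make the mapping in the excerpt above concrete, here is a minimal sketch assuming a hypothetical `PhysicalMachineID` field on the cloud provider's `Zone` struct; the zone/region keys shown are today's beta labels, and the physical-host key may change depending on #839.

```go
// Sketch of the shape the KEP text implies. The PhysicalMachineID field is
// hypothetical; the real cloud-provider Zone struct currently carries only
// FailureDomain and Region.
package example

// Zone mirrors the existing cloud-provider Zone struct, extended with the
// proposed optional field.
type Zone struct {
	FailureDomain     string
	Region            string
	PhysicalMachineID string // optional; empty when the provider cannot report it
}

// topologyLabels converts a Zone into the node labels that
// cloud-controller-manager would apply; the physical-host label is simply
// omitted when the provider does not supply a value.
func topologyLabels(z Zone) map[string]string {
	labels := map[string]string{
		"failure-domain.beta.kubernetes.io/zone":   z.FailureDomain,
		"failure-domain.beta.kubernetes.io/region": z.Region,
	}
	if z.PhysicalMachineID != "" {
		labels["failure-domain.kubernetes.io/physical-host"] = z.PhysicalMachineID
	}
	return labels
}
```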

Member

@thockin left a comment

Given what we know today this seems pretty unambiguous, and I am fine to codify this label. I deeply dislike the CloudProvider lib at this point, but that's not your fault and this doesn't make it much worse. :)

I expect one day we'll find physical hosts that don't fit this mold, and we'll cross that bridge then.

@andrewsykim
Member

Related discussion in the mailing list, in case other folks missed it: https://groups.google.com/forum/#!topic/kubernetes-sig-cloud-provider/32N59IYXogY

@misterikkit
Author

I've cleaned up the wording a bit. If there is agreement on the overall approach, then we can merge this KEP and start iterating on the design/implementation. (I'll flatten commits now)

@andrewsykim
Member

/hold

I think this warrants a discussion in the SIG meeting to gauge what other providers think. Can you add it to this week's agenda please?

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 29, 2019
@andrewsykim
Member

andrewsykim commented Apr 29, 2019

I might have mentioned this in the SIG Cloud Provider mailing list thread, but I feel that physical host topology is not any more common than something like rack topology. Should we codify both while we're at it, then? If not, can we just go with provider-specific labels?

Echoing Tim's sentiment around the cloud provider interface: it has organically become a mess. If we can avoid adding methods like `GetPhysicalHostByProviderID` I would prefer that; the alternative would be folding the existing `Get*ByProviderID` methods into a generic method like `GetTopologyLabelsForNode` that only allows a pre-defined set of well-known label keys (zone, region, physical-host, rack, etc.). Maybe this is an implementation detail at this point, but given the current state of the interface I would like to put more thought into how the interface will change before moving forward.
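
For illustration, a rough sketch (not an agreed design) of what the generic method described above could look like; the interface name, the signature, and the allowed key set are all assumptions.

```go
// Sketch of folding the per-key Get*ByProviderID methods into one call that
// only returns a pre-defined set of well-known topology keys.
package example

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// allowedTopologyKeys is the pre-defined set a provider would be allowed to
// return; the rack key is purely hypothetical.
var allowedTopologyKeys = map[string]bool{
	"topology.kubernetes.io/zone":          true,
	"topology.kubernetes.io/region":        true,
	"topology.kubernetes.io/physical-host": true, // proposed in this KEP
	"topology.kubernetes.io/rack":          true, // hypothetical
}

// TopologyProvider is the hypothetical generic interface.
type TopologyProvider interface {
	// GetTopologyLabelsForNode returns topology labels for the node, limited
	// to keys in allowedTopologyKeys.
	GetTopologyLabelsForNode(ctx context.Context, node *v1.Node) (map[string]string, error)
}

// filterTopologyLabels drops anything a provider returns outside the
// well-known set, so arbitrary labels cannot sneak in through this path.
func filterTopologyLabels(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		if allowedTopologyKeys[k] {
			out[k] = v
		}
	}
	return out
}
```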


### Implementation Details/Notes/Constraints [optional]

What are the caveats to the implementation?
Member

Here we should list the challenges of what I used to call "label drift" (see [1] for details), i.e. what happens in clusters where VM placement is dynamic.

Take for example VMware vSphere (this applies to other hypervisors as well, incl. Google Cloud with Live Migration, I suppose), where various events (manual intervention, host maintenance, failure) can lead to a VM being relocated. In the case of HA (reboot), the node would possibly re-label itself after boot, but administrators (or cluster rebalancing) usually strive to maintain the desired state (e.g. vSphere DRS affinity/anti-affinity rules), leading to live migration (vMotion) of a VM back to a compliant host. When or whether this happens is non-deterministic, which complicates the matter.

tl;dr: advanced hypervisor features like vSphere DRS/vMotion et al., used in almost every on-prem customer environment for improved cluster utilization, fairness, and availability, lead to `topology.kubernetes.io/physical-host` label drift. According to a discussion in SIG Scheduling, the scheduler code might work fine here (cache invalidation/update with new labels; some kind of relabeler would be needed to change this node label), but the drift could invalidate earlier scheduling decisions (e.g. affinity/anti-affinity).

[1]: vSphere Cloud Provider should support implement Zones() interface #64021

Author

Agreed - this area needs careful consideration. Do you think that should be mentioned in the summary/motivation section, or in the follow up PR where we start fleshing out the design?

Member

I think this warrants discussion now because it may determine the validity of the topology label in the first place

Member

+1 on what @andrewsykim said

Member

@bsalamat May 2, 2019

Generally speaking, Kubernetes scheduling decisions are valid at the moment they are made. There is no guarantee that they remain valid in the future. For example, Pod B is scheduled on the same node that runs Pod A when Pod B has inter-pod affinity to Pod A, but a second later Pod A may terminate. Similarly, node labels may be updated/deleted. These changes may invalidate NodeAffinity and NodeSelector constraints of pods.
We have accepted the fact that clusters are dynamic in nature and scheduling decisions may be invalidated in the future. This is an assumption that other new features should make too.
We may one day add "Descheduler" as a standard component to find such invalidations and deschedule pods. Whether we add a descheduler or not, IMO the fact that dynamic cluster changes may invalidate a scheduling decision should not be a blocker for new features.

@andrewsykim
Member

/close

Closing in favor of #1127

@k8s-ci-robot
Contributor

@andrewsykim: Closed this PR.

In response to this:

> /close
>
> Closing in favor of #1127

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
