remove nodeAffinity from typha #9826

ozdanborne · 2020-08-27T18:06:40Z

This PR removes the unnecessary nodeAffinity from the Typha deployment in calico and canal manifests.

Some background / history on how we got to this point:

nodeSelector: {kubernetes.io/role: master} is added to Canal in Canal v3.10 manifest for k8s v1.15+ #7917
This nodeSelector change is "copied" into the Calico manifest in Update Calico to v3.10.2 #8104 .
@seh points out that the node selector "makes it much harder to run more than three Typha replicas. It would be better to have node affinity that expresses a preference, but not a requirement for Typha pods to run on the masters." Update Calico to v3.10.2 #8104 (comment), and raises Calico's Typha pods should prefer to run on masters, but tolerate running elsewhere #9608 to track.
Nodeselector is converted to an Affinity as part of Prefer nodes with "master" role for Calico Typha pods #9609

Why it was added in the first place

I believe that the nodeSelector should never have been added in the first place. @tmjd has been asking questions on those threads and as best as we can tell, the node affinity / selectors were added to mitigate an issue where Typha was not being evicted from a node when the node group was being scaled down to zero. As per @seh:

I've seen cases where the cluster autoscaler can't drop a worker node because a Typha pod is running on it. That Typha pod could run on one of the master nodes that are not subject to the autoscaler's control. The safe-to-evict annotation didn't work as expected. We wound up paying to run that worker node unnecessarily. [Note that] I wasn't using kops when I saw that problem with the cluster autoscaler failing to evict Typha's pods.
Source: #9608 (comment)

As per @KashifSaadat

I vaguely recall this was more specific to how we performed Kops node rolling upgrades and manifest updates. I believe as part of a Kops upgrade from v1.14 to v1.15 (including both Kubernetes and the Canal manifest update), after rolling the first master nodes the new Canal manifest was applied and Typha enabled (if specified in the cni spec), but Typha couldn't be scheduled on the old v1.14 nodes. At this point the canal pods have also begun rolling (as the new manifest was applied) and would fail to start because the Typha service is unavailable.
I'd have to re-run the upgrade path to confirm the exact issue I encountered, but fairly certain the nodeSelector should no longer be required any more so we can look to remove it in a future update.
Source: #7917 (comment)

I don't disagree that something may have been funky with evictions. But I do disagree with using affinity to avoid it. We should fix that eviction issue instead! Even after pouring through these issues, It's not clear to me what the eviction problem is, though it sounds like we might not be expecting it anymore?

k8s-ci-robot · 2020-08-27T18:06:49Z

Hi @ozdanborne. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

seh · 2020-08-27T18:26:29Z

I believe the problem lies with the autoscaler's treatment of pod priority, as described in its FAQ. The priority cutoff was zero as recently as version 1.11, but they've since dropped it to -10. I suspect that the Typha pods hadn't been preventing scale downs because their priority was zero. Now that their priority is significantly higher, well above the default -10 cutoff, they prevent the autoscaler from scaling down.

The "cluster-autoscaler.kubernetes.io/safe-to-evict" annotation is supposed to allow overriding that policy, so long as the Typha pods don't violate any of the other requirements expressed here. In some circumstances we can't move a Typha pod due its use of host ports, precluding multiple running on the same node. We should be able to move them back onto master machines, though.

As for why I had seen these pods keeping worker nodes alive when it appeared the pods could have run elsewhere instead, I can't go back to recover that environment today to investigate further.

rifelpet · 2020-08-27T18:45:13Z

/ok-to-test

hakman · 2020-08-28T07:13:39Z

@ozdanborne Thanks for doing this. I agree with the approach. Let's see what happens when this is removed and fix it. Btw, nice job with Calico 3.16. Looking forward to give it a try!
/lgtm
/approve

k8s-ci-robot · 2020-08-28T07:13:58Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman, ozdanborne

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [hakman]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

remove nodeAffinity from typha

6ae4337

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 27, 2020

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 27, 2020

k8s-ci-robot requested review from mikesplain and olemarkus August 27, 2020 18:07

k8s-ci-robot added the area/addons label Aug 27, 2020

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 27, 2020

k8s-ci-robot assigned hakman Aug 28, 2020

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2020

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2020

k8s-ci-robot merged commit 5e0c55b into kubernetes:master Aug 28, 2020

k8s-ci-robot added this to the v1.19 milestone Aug 28, 2020

ozdanborne deleted the remove-typha-affinity branch September 1, 2020 15:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remove nodeAffinity from typha #9826

remove nodeAffinity from typha #9826

ozdanborne commented Aug 27, 2020

k8s-ci-robot commented Aug 27, 2020

seh commented Aug 27, 2020 •

edited

Loading

rifelpet commented Aug 27, 2020

hakman commented Aug 28, 2020

k8s-ci-robot commented Aug 28, 2020

remove nodeAffinity from typha #9826

remove nodeAffinity from typha #9826

Conversation

ozdanborne commented Aug 27, 2020

k8s-ci-robot commented Aug 27, 2020

seh commented Aug 27, 2020 • edited Loading

rifelpet commented Aug 27, 2020

hakman commented Aug 28, 2020

k8s-ci-robot commented Aug 28, 2020

seh commented Aug 27, 2020 •

edited

Loading