Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove nodeAffinity from typha #9826

Merged
merged 1 commit into from
Aug 28, 2020

Conversation

ozdanborne
Copy link
Contributor

This PR removes the unnecessary nodeAffinity from the Typha deployment in calico and canal manifests.

Some background / history on how we got to this point:

  1. nodeSelector: {kubernetes.io/role: master} is added to Canal in Canal v3.10 manifest for k8s v1.15+ #7917
  2. This nodeSelector change is "copied" into the Calico manifest in Update Calico to v3.10.2 #8104 .
  3. @seh points out that the node selector "makes it much harder to run more than three Typha replicas. It would be better to have node affinity that expresses a preference, but not a requirement for Typha pods to run on the masters." Update Calico to v3.10.2 #8104 (comment), and raises Calico's Typha pods should prefer to run on masters, but tolerate running elsewhere #9608 to track.
  4. Nodeselector is converted to an Affinity as part of Prefer nodes with "master" role for Calico Typha pods #9609

Why it was added in the first place

I believe that the nodeSelector should never have been added in the first place. @tmjd has been asking questions on those threads and as best as we can tell, the node affinity / selectors were added to mitigate an issue where Typha was not being evicted from a node when the node group was being scaled down to zero. As per @seh:

I've seen cases where the cluster autoscaler can't drop a worker node because a Typha pod is running on it. That Typha pod could run on one of the master nodes that are not subject to the autoscaler's control. The safe-to-evict annotation didn't work as expected. We wound up paying to run that worker node unnecessarily. [Note that] I wasn't using kops when I saw that problem with the cluster autoscaler failing to evict Typha's pods.
Source: #9608 (comment)

As per @KashifSaadat

I vaguely recall this was more specific to how we performed Kops node rolling upgrades and manifest updates. I believe as part of a Kops upgrade from v1.14 to v1.15 (including both Kubernetes and the Canal manifest update), after rolling the first master nodes the new Canal manifest was applied and Typha enabled (if specified in the cni spec), but Typha couldn't be scheduled on the old v1.14 nodes. At this point the canal pods have also begun rolling (as the new manifest was applied) and would fail to start because the Typha service is unavailable.
I'd have to re-run the upgrade path to confirm the exact issue I encountered, but fairly certain the nodeSelector should no longer be required any more so we can look to remove it in a future update.
Source: #7917 (comment)

I don't disagree that something may have been funky with evictions. But I do disagree with using affinity to avoid it. We should fix that eviction issue instead! Even after pouring through these issues, It's not clear to me what the eviction problem is, though it sounds like we might not be expecting it anymore?

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 27, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @ozdanborne. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 27, 2020
@seh
Copy link
Contributor

seh commented Aug 27, 2020

I believe the problem lies with the autoscaler's treatment of pod priority, as described in its FAQ. The priority cutoff was zero as recently as version 1.11, but they've since dropped it to -10. I suspect that the Typha pods hadn't been preventing scale downs because their priority was zero. Now that their priority is significantly higher, well above the default -10 cutoff, they prevent the autoscaler from scaling down.

The "cluster-autoscaler.kubernetes.io/safe-to-evict" annotation is supposed to allow overriding that policy, so long as the Typha pods don't violate any of the other requirements expressed here. In some circumstances we can't move a Typha pod due its use of host ports, precluding multiple running on the same node. We should be able to move them back onto master machines, though.

As for why I had seen these pods keeping worker nodes alive when it appeared the pods could have run elsewhere instead, I can't go back to recover that environment today to investigate further.

@rifelpet
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 27, 2020
@hakman
Copy link
Member

hakman commented Aug 28, 2020

@ozdanborne Thanks for doing this. I agree with the approach. Let's see what happens when this is removed and fix it. Btw, nice job with Calico 3.16. Looking forward to give it a try!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman, ozdanborne

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2020
@k8s-ci-robot k8s-ci-robot merged commit 5e0c55b into kubernetes:master Aug 28, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Aug 28, 2020
@ozdanborne ozdanborne deleted the remove-typha-affinity branch September 1, 2020 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/addons cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants