cluster-autoscaler: vendored scheduler dependency must be updated ASAP #3224

Closed
@aermakov-zalando opened this issue Jun 15, 2020 · 6 comments
Labels: lifecycle/stale (denotes an issue or PR that has remained open with no activity and has become stale)

@aermakov-zalando (Contributor) commented Jun 15, 2020:

In kubernetes/kubernetes#89222, the scheduler algorithm was tweaked to take init containers into account when calculating how much of a node's resources a pod consumes. This went into at least 1.17.6 (which we currently run) and, I assume, the corresponding 1.18.x/1.19.x versions. CA vendors the scheduler as a dependency, and there was no corresponding CA release, which means every cluster running Kubernetes 1.17.6 can easily end up in a situation where CA won't scale up even though pods can't be scheduled. This happens when a pod's initContainer requests are bigger than its main container requests: the scheduler refuses to schedule the pod, but CA also does nothing, since it thinks the pod could fit on an existing node.
Beyond updating the dependencies, I'd suggest collaborating more closely with sig-scheduling, because any change to the scheduler can affect CA, and the two may need to be released in lockstep.
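
To illustrate the mismatch, here is a minimal Go sketch of the effective-request rule the fixed scheduler applies; the function and container names are illustrative, not actual CA or scheduler code. Init containers run sequentially, so only the largest init request counts, while regular containers run concurrently, so their requests are summed; the pod's effective request is the maximum of the two.

```go
// Sketch of the effective-request rule after kubernetes/kubernetes#89222
// (names are illustrative, not the scheduler's actual code).
package main

import "fmt"

type container struct {
	name      string
	cpuMillis int64 // CPU request in millicores
}

// effectiveCPURequest returns max(largest init-container request,
// sum of regular-container requests), per documented pod semantics:
// init containers run one at a time, regular containers concurrently.
func effectiveCPURequest(initContainers, containers []container) int64 {
	var sum, maxInit int64
	for _, c := range containers {
		sum += c.cpuMillis
	}
	for _, c := range initContainers {
		if c.cpuMillis > maxInit {
			maxInit = c.cpuMillis
		}
	}
	if maxInit > sum {
		return maxInit
	}
	return sum
}

func main() {
	inits := []container{{"init-migrate", 2000}}
	apps := []container{{"app", 500}}
	// A scheduler that ignores init containers sees 500m for this pod;
	// one with #89222 sees 2000m. If CA vendors the old behavior, it
	// believes the pod fits and never triggers a scale-up.
	fmt.Printf("effective CPU request: %dm\n", effectiveCPURequest(inits, apps))
}
```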

@MaciekPytel (Contributor) commented:
Thanks for bringing this up. I'm discussing with sig-scheduling how to handle this.
It should be a straightforward dependency bump for all versions except 1.17. The 1.17 CA relies on a commit that is not on the 1.17 branch in k/k, so we can't update the dependencies there. We're looking into whether we can cherry-pick the required changes onto the 1.17 branch in k/k, but that will probably take a while.
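
For readers unfamiliar with how CA consumes the scheduler, the bump roughly amounts to moving the k8s.io/kubernetes pin in CA's go.mod (along with the matching staging repos). The excerpt below is a hedged sketch with hypothetical version numbers, not the actual diff:

```
module k8s.io/autoscaler/cluster-autoscaler

// Move the vendored scheduler to a k/k patch release that contains
// kubernetes/kubernetes#89222 (version strings here are hypothetical).
require k8s.io/kubernetes v1.18.3 // was v1.18.1

// Each k8s.io staging repo must be pinned to the matching tag.
replace k8s.io/api => k8s.io/api v0.18.3
```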

@aermakov-zalando (Contributor, Author) commented:
We run our own fork (still based on 1.12.2) and we've already fixed it there by patching the vendored dependency directly, so it's not very relevant for us. You might want to consider doing the same in CA until you can update the dependencies properly, so that it works for other users.

@MaciekPytel (Contributor) commented:
1.18.2 and 1.17.3 should have a fix for this issue.

@fejta-bot commented:
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Oct 26, 2020.
@yurrriq commented Nov 17, 2020:

This can be closed now, yeah?

@MaciekPytel (Contributor) commented:
Correct. Sorry, I forgot to close it.
