scheduling with update from v1.2.0 to v1.4.0 #1775
Comments
/assign @Thor-wl
Well, please give more details about your testing steps so that I can reproduce it. Thanks.
Hello 👋 Looks like there was no activity on this issue for the last 90 days.
Hi @Thor-wl. I am able to reproduce this issue using a GKE Kubernetes cluster with autoscaling enabled. Creating a podgroup that can't be satisfied with the current resources is enough. Prior to v1.4.0, the autoscaler would provision new nodes in this situation. Not sure how to reproduce it locally, but we have investigated it further on our side. It happens after the scheduling-reason PR (#1672): with it, volcano has started setting custom reasons like "Undetermined" in the pod status instead of the standard "Unschedulable". I tested it by reverting the PR on top of v1.5.0-beta and autoscaling worked as before. I'd appreciate any help on solving this in volcano. Both autoscaling and batch-scheduling are important to our setup.
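For reference, a minimal client-go sketch (package layout and kubeconfig handling are illustrative, not part of this report) that prints the PodScheduled condition reason on Pending pods, which is where the custom reason shows up:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (error handling kept minimal).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List pods that are still Pending; these are the candidates an autoscaler inspects.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
		FieldSelector: "status.phase=Pending",
	})
	if err != nil {
		panic(err)
	}

	for _, p := range pods.Items {
		for _, c := range p.Status.Conditions {
			if c.Type == corev1.PodScheduled && c.Status == corev1.ConditionFalse {
				// With the standard reason this prints "Unschedulable";
				// the report above observes custom reasons such as "Undetermined".
				fmt.Printf("%s/%s reason=%q message=%q\n", p.Namespace, p.Name, c.Reason, c.Message)
			}
		}
	}
}
```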
Hello, we are having the same issue and would appreciate an update on it.
Thanks, guys. Let me take a look at that.
Could we re-open and make an update here? Volcano is pretty much unusable with Cluster Autoscaler and Karpenter with the "Undetermined" reason. Is there any reason why we shouldn't revert the PR to regain compatibility with the autoscaling/cloud ecosystem? Would love to hear from the team on this.
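For context, a simplified sketch of the filter node autoscalers typically apply; it is not the actual Cluster Autoscaler or Karpenter code, but it illustrates why a pod carrying an "Undetermined" reason never triggers a scale-up:

```go
package autoscalefilter

import corev1 "k8s.io/api/core/v1"

// triggersScaleUp returns true only when the scheduler has explicitly marked
// the pod as unschedulable via the standard PodScheduled condition reason.
func triggersScaleUp(pod *corev1.Pod) bool {
	if pod.Status.Phase != corev1.PodPending {
		return false
	}
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodScheduled &&
			c.Status == corev1.ConditionFalse &&
			c.Reason == corev1.PodReasonUnschedulable { // "Unschedulable"
			return true
		}
	}
	// A custom reason such as "Undetermined" falls through here,
	// so no new node is provisioned for the pod.
	return false
}
```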
@brickyard Of course, please update here. Maybe the scheduling reason enhancement in pr#1672 missed considering the interaction between the scheduler and the autoscaler. @Thor-wl please continue to work on this and fix it. If there is no way to take care of both the autoscaling and the scheduler reason enhancement, we need to revert first to keep compatibility.
Hello 👋 Looks like there was no activity on this issue for the last 90 days.
Closing for now as there was no activity for the last 60 days after it was marked as stale; let us know if you need this to be reopened! 🤗
This issue is still affecting Karpenter users. Can we re-open and find a way to set the pod status to "Unschedulable" so that autoscalers can react?
I think there should be no issues with reverting #1672, as the intent was to provide more information to the user. But if it breaks compatibility with cluster autoscalers, that seems like a very steep price to pay for better logging. Maybe this PR could be re-submitted by just annotating the status message with this info, instead of changing the condition reason.
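A rough sketch of that idea (illustrative only, not an actual Volcano patch): keep the standard "Unschedulable" reason so autoscalers still react, and carry the enhanced scheduling explanation in the condition message instead:

```go
package condition

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// unschedulableCondition builds the PodScheduled condition a scheduler could
// record for a pod it cannot place. "detail" stands for the enriched reason
// text introduced by pr#1672; here it goes into Message instead of Reason.
func unschedulableCondition(detail string) corev1.PodCondition {
	return corev1.PodCondition{
		Type:               corev1.PodScheduled,
		Status:             corev1.ConditionFalse,
		Reason:             corev1.PodReasonUnschedulable, // what CA/Karpenter look for
		Message:            detail,                        // the detailed explanation produced by the scheduler
		LastTransitionTime: metav1.Now(),
	}
}
```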
What happened:
volcano update from 1.2.0 to 1.4.0. With the newest version, if there are not enough resources, PodGroups are kept in Pending phase and cluster autoscaler does not trigger to provision more resources. Did I miss something in the latest version?
What you expected to happen:
I was expecting it to work as in 1.2.0
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): 1.22.2
- Kernel (e.g. uname -a):