Optimize log output when resources are not enough #2993
Comments
Hello @hwdef, could I take this issue as my first contribution to Volcano? Is it feasible to determine which type of resource shortage is blocking a job? One potential approach would be to query the Kubernetes cluster for available memory and CPU. By comparing these metrics with the job's resource requests, we could pinpoint which resource caused the allocation to fail. Your guidance on whether this is an appropriate first task would be greatly appreciated. |
Yes, you can fix this. Our goal is for the error messages shown when resources are insufficient to be consistent with those of the native Kubernetes scheduler. |
@hwdef Can you please confirm whether my approach is correct? |
By and large this is correct, but I don't think you need to actively query k8s for the remaining CPU and memory; Volcano should already be maintaining these in its cache. We also need to cover more than CPU and memory: there are extended resources such as nvidia.com/gpu. |
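A minimal sketch of the comparison discussed above, written in Go. It does not use Volcano's actual types: `Resources` and `insufficientResources` are hypothetical names for illustration, and the idle amounts are assumed to come from the scheduler's own cache rather than a live query. Every requested resource, including extended ones such as nvidia.com/gpu, is checked against what is idle on a node, and the reasons are phrased like the native scheduler's "Insufficient <resource>" messages.

```go
package main

import (
	"fmt"
	"sort"
)

// Resources maps a resource name ("cpu", "memory", "nvidia.com/gpu", ...)
// to a quantity in that resource's base unit.
// Hypothetical type for this sketch, not Volcano's own resource type.
type Resources map[string]int64

// insufficientResources returns one "Insufficient <name>" reason for every
// requested resource whose request exceeds the idle amount on the node.
// A resource absent from idle is treated as having 0 available.
func insufficientResources(request, idle Resources) []string {
	var reasons []string
	for name, want := range request {
		if want > idle[name] {
			reasons = append(reasons, "Insufficient "+name)
		}
	}
	sort.Strings(reasons) // map iteration order is random; sort for stable logs
	return reasons
}

func main() {
	// Idle resources as they might be tracked in a scheduler cache (assumed values).
	idle := Resources{"cpu": 500, "memory": 4 << 30}
	// A task asking for more CPU than is idle, plus a GPU the node does not have.
	request := Resources{"cpu": 1000, "memory": 1 << 30, "nvidia.com/gpu": 1}

	fmt.Println(insufficientResources(request, idle))
	// prints: [Insufficient cpu Insufficient nvidia.com/gpu]
}
```

Iterating over the request rather than a fixed list of names is what lets extended resources be reported without special handling.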
OK, got it. |
/assign |
Hi @srikanth-iyengar, you can refer to the following output for a deployment with 2 pods. |
Thanks for the info @lowang-bh |
Hey @hwdef and @lowang-bh, could one of you let me know whether everything looks good in the PR? |
/assign |
What would you like to be added:
Optimize the scheduler's logs when resources are low by printing exactly what kind of resources are low.
Why is this needed:
It makes it easier to identify the reasons for insufficient resources.
kubernetes scheduler VS volcano
kubernetes scheduler:
volcano:
volcano just prints NotEnoughResources
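To illustrate the kind of message the issue asks for, here is a hedged Go sketch that aggregates per-node reasons into a single summary line modeled on the native scheduler's "0/N nodes are available: ..." format. The function name `fitErrorMessage`, the node names, and the input shape are assumptions made for this example, not Volcano's API, and the sketch assumes every listed node failed to fit the task.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fitErrorMessage counts how many nodes reported each reason and renders a
// one-line summary. nodeReasons maps a node name to the reasons that node
// could not fit the task. Hypothetical helper for illustration only.
func fitErrorMessage(nodeReasons map[string][]string) string {
	counts := map[string]int{}
	for _, reasons := range nodeReasons {
		for _, r := range reasons {
			counts[r]++
		}
	}

	parts := make([]string, 0, len(counts))
	for reason, n := range counts {
		parts = append(parts, fmt.Sprintf("%d %s", n, reason))
	}
	sort.Strings(parts) // stable output regardless of map iteration order

	return fmt.Sprintf("0/%d nodes are available: %s.",
		len(nodeReasons), strings.Join(parts, ", "))
}

func main() {
	msg := fitErrorMessage(map[string][]string{
		"node-1": {"Insufficient cpu"},
		"node-2": {"Insufficient cpu", "Insufficient nvidia.com/gpu"},
	})
	fmt.Println(msg)
	// prints: 0/2 nodes are available: 1 Insufficient nvidia.com/gpu, 2 Insufficient cpu.
}
```

Compared with a bare NotEnoughResources, a summary like this tells the user both which resource is short and on how many nodes.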