Fix gpu_id #59
I did not test it.
src/br/analysis/prereq.py
# Based on the utilization, set the GPU ID


def get_gpu_info():
Wonder if we should use something like this instead: https://github.com/anderskm/gputil
Thank you for this! I believe it would be beneficial to update it for the MIG partitioned GPUs as well. Additionally, I noticed that they are leveraging nvidia-smi --query-gpu to retrieve the statistics, which we are also using. The repo you shared offers a lot of stats about the GPUs, and I think incorporating that could be useful. I'll talk to ritvik about this!
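For reference, here is a minimal sketch of the nvidia-smi --query-gpu approach discussed above, assuming the GPU is chosen by lowest memory utilization. The names parse_gpu_query, pick_least_used_gpu, and query_gpus are hypothetical; they are not taken from this PR or from GPUtil.

```python
import subprocess

# Real nvidia-smi flags; the choice of fields is an assumption for this sketch.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_query(text):
    """Parse CSV rows of 'index, memory.used, memory.total' into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue
        try:
            index, used, total = (int(f) for f in fields)
        except ValueError:
            # Skip rows with non-numeric values such as "[N/A]".
            continue
        gpus.append({"index": index, "used": used, "total": total})
    return gpus


def pick_least_used_gpu(gpus):
    """Return the index of the GPU with the lowest used/total memory ratio."""
    if not gpus:
        return None

    def ratio(gpu):
        # Treat a GPU reporting zero total memory as fully used.
        return gpu["used"] / gpu["total"] if gpu["total"] else 1.0

    return min(gpus, key=ratio)["index"]


def query_gpus():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_gpu_query(result.stdout)
```

Keeping the parsing separate from the subprocess call means the selection logic can be checked on machines without a GPU, for example with pick_least_used_gpu(parse_gpu_query(sample_text)).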
this PR seems to be doing a lot more than this? did you mean to merge into the other nb -> python file branch?
I had forked from this branch and updated the _setup_gpu function in the scripts!
can you merge into that branch then so it's easier to know what you added?
I should have clarified here: I didn't mean to drop this PR and just merge your changes, but to just base your PR off that branch
Much better, thanks!
This Pull Request enhances the GPU selection mechanism based on memory utilization. Specifically, it implements the following changes:
GPU ID Selection:
Limitations:
Although nvidia-smi does support a MIG option, I currently do not have the necessary permissions to access this feature.
An alternative is dcgmi. However, this tool is not installed and cannot be utilized.
Current Approach:
A query with nvidia-smi -L is performed to select the appropriate GPU IDs.
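As an illustration of the nvidia-smi -L approach, the sketch below parses the device listing into GPU and MIG entries. It assumes the usual listing format ("GPU 0: <name> (UUID: GPU-...)" followed by indented "MIG <profile> Device <n>: (UUID: MIG-...)" lines); the function names are hypothetical and this is not the code from the PR.

```python
import re

# Assumed line formats of `nvidia-smi -L`; adjust if the driver prints differently.
GPU_RE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[^)]+)\)")
MIG_RE = re.compile(r"^\s+MIG (\S+)\s+Device\s+(\d+): \(UUID: (MIG-[^)]+)\)")


def parse_nvidia_smi_list(text):
    """Parse `nvidia-smi -L` output into a list of GPUs with their MIG devices."""
    gpus = []
    for line in text.splitlines():
        gpu_match = GPU_RE.match(line)
        if gpu_match:
            gpus.append({
                "index": int(gpu_match.group(1)),
                "name": gpu_match.group(2),
                "uuid": gpu_match.group(3),
                "mig": [],
            })
            continue
        mig_match = MIG_RE.match(line)
        if mig_match and gpus:
            # A MIG line belongs to the most recently listed GPU.
            gpus[-1]["mig"].append({
                "profile": mig_match.group(1),
                "device": int(mig_match.group(2)),
                "uuid": mig_match.group(3),
            })
    return gpus


def visible_device_ids(gpus):
    """Return MIG UUIDs for partitioned GPUs and plain indices otherwise."""
    ids = []
    for gpu in gpus:
        if gpu["mig"]:
            ids.extend(m["uuid"] for m in gpu["mig"])
        else:
            ids.append(str(gpu["index"]))
    return ids
```

The returned identifiers are meant to be usable as values for CUDA_VISIBLE_DEVICES, which accepts both integer indices and device UUIDs; whether that matches how _setup_gpu consumes them is an assumption.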