Not able to see GPU memory consumption as part of system metrics in aim stack. #3020
Comments
Hi @alberttorosyan, thanks for marking this as a bug.
@dushyantbehl, here's the code snippet which extracts the GPU information before passing it to Aim tracking methods:

    gpu_info = dict()
    handle = nvml.nvmlDeviceGetHandleByIndex(i)
    try:
        util = nvml.nvmlDeviceGetUtilizationRates(handle)
        # GPU utilization percent
        gpu_info["gpu"] = round10e5(util.gpu)
    except nvml.NVMLError_NotSupported:
        pass
    try:
        # Get device memory
        memory = nvml.nvmlDeviceGetMemoryInfo(handle)
        # Device memory usage
        # 'memory_used': round10e5(memory.used / 1024 / 1024),
        gpu_info["gpu_memory_percent"] = round10e5(memory.used * 100 / memory.total)
    except nvml.NVMLError_NotSupported:
        pass
    try:
        # Get device temperature
        nvml_tmp = nvml.NVML_TEMPERATURE_GPU
        temp = nvml.nvmlDeviceGetTemperature(handle, nvml_tmp)
        # Device temperature
        gpu_info["gpu_temp"] = round10e5(temp)
    except nvml.NVMLError_NotSupported:
        pass
    try:
        # Compute power usage in watts and percent
        power_watts = nvml.nvmlDeviceGetPowerUsage(handle) / 1000
        power_cap = nvml.nvmlDeviceGetEnforcedPowerLimit(handle)
        power_cap_watts = power_cap / 1000
        power_watts / power_cap_watts * 100
        # Power usage in watts and percent
        gpu_info["gpu_power_watts"]: round10e5(power_watts)
        # gpu_info["power_percent"] = round10e5(power_usage)
    except nvml.NVMLError_NotSupported:
        pass

Each call to
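Editorial sketch: the snippet above silently passes over `NVMLError_NotSupported`, so a metric that cannot be read leaves no trace. The following is a minimal illustration, not Aim's code, of the same collection pattern with the skipped metrics recorded. It assumes an NVML-like module (for example `pynvml`) is passed in as `nvml`; `collect_gpu_info`, `read`, and `skipped` are hypothetical names introduced here.

```python
def collect_gpu_info(nvml, index):
    """Return (gpu_info, skipped) for one GPU.

    `nvml` is assumed to be a module-like object exposing the NVML
    Python bindings (e.g. `pynvml`). Illustrative helper, not Aim's API.
    """
    gpu_info = {}
    skipped = []  # names of metrics the device reported as unsupported
    handle = nvml.nvmlDeviceGetHandleByIndex(index)

    def read(name, getter):
        # Store the metric, or remember that it was skipped.
        try:
            gpu_info[name] = getter()
        except nvml.NVMLError_NotSupported:
            skipped.append(name)

    def memory_percent():
        memory = nvml.nvmlDeviceGetMemoryInfo(handle)
        return memory.used * 100 / memory.total

    def temperature():
        sensor = nvml.NVML_TEMPERATURE_GPU
        return nvml.nvmlDeviceGetTemperature(handle, sensor)

    read("gpu", lambda: nvml.nvmlDeviceGetUtilizationRates(handle).gpu)
    read("gpu_memory_percent", memory_percent)
    read("gpu_temp", temperature)
    # NVML reports power in milliwatts; convert to watts.
    read("gpu_power_watts", lambda: nvml.nvmlDeviceGetPowerUsage(handle) / 1000)
    return gpu_info, skipped
```

If `gpu_memory_percent` shows up in `gpu_info` here but not on the dashboard, the device supports the query and the value is probably being lost somewhere after collection.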
@alberttorosyan It was not a device support problem, since using the NVML APIs directly worked. After some debugging, I found the cause and have opened a PR here: #3044
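Editorial sketch: a direct NVML check of the kind mentioned above might look roughly like this. It assumes the `pynvml` bindings (or any module exposing the same functions) are passed in as `nvml`; `read_gpu_memory` and its return shape are hypothetical and are not taken from the PR.

```python
def read_gpu_memory(nvml, index=0):
    """Query one GPU's memory directly through NVML bindings.

    `nvml` is assumed to be a module-like object such as `pynvml`.
    The function name and returned keys are illustrative only.
    """
    nvml.nvmlInit()
    try:
        handle = nvml.nvmlDeviceGetHandleByIndex(index)
        memory = nvml.nvmlDeviceGetMemoryInfo(handle)  # fields are in bytes
        mib = 1024 * 1024
        return {
            "used_mib": memory.used / mib,
            "total_mib": memory.total / mib,
            "percent": memory.used * 100 / memory.total,
        }
    finally:
        # Always release NVML, even if a query raised.
        nvml.nvmlShutdown()


# Possible use on a machine with an NVIDIA driver (assumes pynvml is installed):
#   import pynvml
#   print(read_gpu_memory(pynvml, 0))
```

If this returns sensible values while the dashboard shows no memory series, the device and driver support the query.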
Fix merged here: #3044
❓Question
I have been using Aim version 3.17.5 and am unable to see any GPU memory consumption for my Aim runs. The dashboard shows GPU % and GPU temperature, but not the memory used. Is there a way to track what is going on?
I am happy to share any information about the environment you may need. Thanks in advance.