Issues: NVIDIA/dcgm-exporter
Helm templates not getting populated when built from source
bug
Something isn't working
#405
opened Oct 23, 2024 by
Indresh2410
Maintain uniformity between the Helm chart and static YAMLs
bug
Something isn't working
#402
opened Oct 22, 2024 by
Indresh2410
I want to see how many GPU cores have been allocated to each container through metrics.
enhancement
New feature or request
#399
opened Oct 12, 2024 by
changhyuni
Cannot collect GPU utilization metric for some pods when MIG is enabled
bug
Something isn't working
#397
opened Oct 8, 2024 by
melikeiremguler
dcgm-exporter DaemonSet startup error: failed to pass the health check
question
Further information is requested
#393
opened Sep 26, 2024 by
guoliangmiao
In the case of GPU pass-through, does dcgm-exporter on the physical host support capturing GPU metrics of KVM virtual machines?
question
Further information is requested
#392
opened Sep 21, 2024 by
lddlww
DCGM Exporter in EKS p4d.24xlarge instance type controller error
bug
Something isn't working
#387
opened Sep 5, 2024 by
camilopaezrios
DCGM Exporter in EKS p4d.24xlarge instance type controller error
#386
opened Sep 5, 2024 by
camilopaezrios
dcgm-exporter pods stuck in Running state, not becoming Ready without GPU allocation
question
Further information is requested
#385
opened Sep 3, 2024 by
rohitreddy1698
Add a health status metric for every gpu card
question
Further information is requested
#384
opened Aug 30, 2024 by
lx1036
Error with "make binary" operation in local development
bug
Something isn't working
#381
opened Aug 30, 2024 by
kschoi93
No DCGM_FI_DEV_FB_FREE reported for MIG-enabled GPUs
bug
Something isn't working
#380
opened Aug 28, 2024 by
george-kuanli-peng
Getting "Error from server (NotFound): the server could not find the metric DCGM_FI_DEV_GPU_UTIL for pods"; DCGM_FI_DEV_GPU_UTIL metrics are not available from Prometheus
question
Further information is requested
#379
opened Aug 23, 2024 by
Vijaygawate
failed to transform metrics for transform 'podMapper'
bug
Something isn't working
#378
opened Aug 21, 2024 by
jicki
How does dcgm-exporter, when running on k8s as a daemonset, communicate with the host's dcgm host engine?
question
Further information is requested
#377
opened Aug 19, 2024 by
yx-lamini
The pod and namespace information is empty in the metrics of some GPUs occupied by pods
bug
Something isn't working
#373
opened Aug 14, 2024 by
qingfenghcy
time="2024-08-08T03:09:05Z" level=error msg="Failed to write response." error="write tcp 10.202.3.1:9400->10.202.2.2:49674: i/o timeout
bug
Something isn't working
#372
opened Aug 8, 2024 by
safeAndSound3
Recompiled dcgm-exporter fails to collect GPU metrics with an error on startup
question
Further information is requested
#371
opened Aug 2, 2024 by
15234660879
MIG device support for hpc_job metric labels
enhancement
New feature or request
#369
opened Jul 30, 2024 by
jbrobstw
Recompiled dcgm-exporter fails to collect GPU metrics with an error on startup
question
Further information is requested
#368
opened Jul 30, 2024 by
15234660879
Let dcgm-exporter be a daemon
enhancement
New feature or request
#367
opened Jul 25, 2024 by
zvonkok