Error on calling nvidia-smi: Command 'ps ...' returned non-zero exit status 1 #16
Comments
Hello, it seems that even though it contains f66cffd, the error still occurs.
Hey, thanks for your reply. Here is the output of the two commands.
I still can't figure out what happened.
Thanks for the information. I think #12 has a similar cause; however, our recent patch didn't work.
Hey, I figured out what leads to this bug and fixed it just now.
This inspired me to run:
I guess the reason is that the machine I am using is a multi-user GPU server, so I cannot see the PIDs that belong to other users without root privileges.
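A minimal sketch of what probably happens here, assuming the tool shells out to ps for each PID reported by the GPU driver (the helper name below is hypothetical, not gpustat's actual code):

```python
import subprocess

def lookup_process_owner(pid):
    # Hypothetical helper: ask `ps` who owns a PID. If the PID is not visible
    # to the current user (e.g. /proc mounted with hidepid on a multi-user box)
    # or the process is already gone, `ps` exits with status 1 and check_output
    # raises CalledProcessError, which is exactly the
    # "Command 'ps ...' returned non-zero exit status 1" error in the title.
    cmd = ['ps', '-o', 'user=', '-p', str(pid)]
    try:
        return subprocess.check_output(cmd).decode().strip()
    except subprocess.CalledProcessError:
        return None  # degrade gracefully instead of crashing the whole query

print(lookup_process_owner(1))       # usually 'root'
print(lookup_process_owner(999999))  # very likely None: ps exits 1 when no process matches
```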
Hi @feiwofeifeixiaowo, thanks for the information! On a multi-user server, we can sometimes get other users' process information and sometimes not, but I don't know when and why. Can you please try it again with the nvidia-smi daemon running, e.g. sudo nvidia-smi daemon (but run gpustat itself without sudo)?
The nvidia-smi daemon must be run with root privileges.
That's true. I was just wondering whether either the pynvml APIs or the nvidia-smi daemon could retrieve such information without root privileges.
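For what it's worth, here is a rough sketch (assuming the pynvml bindings that gpustat builds on are installed) of listing per-GPU compute processes through NVML without root; the part that may still fail for other users' processes is mapping those PIDs to usernames and commands on the host side:

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        try:
            procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        except pynvml.NVMLError:
            procs = []  # e.g. not supported on this device/driver combination
        # NVML itself hands back PIDs and GPU memory usage without root;
        # resolving the owning user of each PID is a separate host-side lookup.
        print(i, [(p.pid, p.usedGpuMemory) for p in procs])
finally:
    pynvml.nvmlShutdown()
```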
@wookayin The messages returned by the two commands are shown below.
@feiwofeifeixiaowo Could you please confirm that the bug still happens on the previous versions you were trying, and that it is now (hopefully) resolved since #20 was merged to master?
@wookayin Hello, it seems that we still get the same issue. Here are the commands I ran (output omitted):
➜ ~ gpustat -v
➜ ~ gpustat
➜ ~ nvidia-smi
➜ ~ sudo nvidia-smi
➜ ~ gpustat
Thanks for your update. I want to have the same environment myself, but I don't think I do (maybe I have to mock and simulate it). Could you please provide stacktrace information by adding …
Sorry about that:
Hi @wookayin, maybe this bug went away with the power outage a few hours ago. ^_^! After my server suddenly shut down, I find that …
@feiwofeifeixiaowo I googled this problem a lot and it is likely caused by the broken context of some CUDA applications, so @wookayin we have to deal with this bug even in our own code (see lines 218 to 230 in 895e1f8).
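A rough sketch of the kind of defensive handling this implies, assuming psutil is available (illustration only, not the actual gpustat code): when NVML reports a PID whose CUDA context is broken, the process may no longer exist on the host, so every per-process lookup has to tolerate that.

```python
import psutil

def safe_process_info(pid):
    # Illustrative only: NVML can report PIDs from broken CUDA contexts that
    # no longer exist (or are not visible) on the host, so wrap the lookup
    # instead of letting one dead PID take down the whole gpustat query.
    try:
        p = psutil.Process(pid)
        return {'pid': pid, 'user': p.username(), 'command': p.name()}
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        return {'pid': pid, 'user': '(unknown)', 'command': '(gone?)'}
```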
@Stonesjtu You are absolutely correct. Thanks for the detailed information on why it happens; I think this issue can be closed now. @feiwofeifeixiaowo Can you please double-check this? Thanks all!
I assume that this is now fixed by v0.4.0. Please re-open it or open a new issue if you have any problems with this.
@wookayin @Stonesjtu How can I fix it?
I think this line of code …
Released as v0.4.1. |
I got the above error message when I run gpustat, but nvidia-smi works fine on my machine.
Here are some details:
OS: Ubuntu 14.04.5 LTS
Python version: Anaconda, Python 3.6
How can I fix this?