[BUG] error in fastai NVMLError_LibRmVersionMismatch: RM has detected an NVML/RM version mismatch. #1287

Closed
miguelgfierro opened this issue Jan 22, 2021 · 1 comment
Labels
bug Something isn't working

Comments

miguelgfierro commented Jan 22, 2021

Description

Importing fastai fails because `pynvml.nvmlInit()` raises `NVMLError_LibRmVersionMismatch`:

E                 3 from .core import *
E                 4 from .basic_data import *
E                 5 from .data_block import *
E           
E           /anaconda/envs/reco_gpu/lib/python3.6/site-packages/fastai/basic_train.py in <module>
E                 4 from .callback import *
E                 5 from .data_block import *
E           ----> 6 from .utils.mem import gpu_mem_restore
E                 7 import inspect
E                 8 from fastprogress.fastprogress import format_time
E           
E           /anaconda/envs/reco_gpu/lib/python3.6/site-packages/fastai/utils/mem.py in <module>
E                16 
E                17 if use_gpu:
E           ---> 18     pynvml = load_pynvml_env()
E                19 
E                20 def preload_pytorch():
E           
E           /anaconda/envs/reco_gpu/lib/python3.6/site-packages/fastai/utils/pynvml_gate.py in load_pynvml_env()
E                18         return pynvml
E                19 
E           ---> 20     pynvml.nvmlInit()
E                21 
E                22     return pynvml
E           
E           /anaconda/envs/reco_gpu/lib/python3.6/site-packages/pynvml.py in nvmlInit()
E               613     fn = _nvmlGetFunctionPointer("nvmlInit_v2")
E               614     ret = fn()
E           --> 615     _nvmlCheckReturn(ret)
E               616 
E               617     # Atomically update refcount
E           
E           /anaconda/envs/reco_gpu/lib/python3.6/site-packages/pynvml.py in _nvmlCheckReturn(ret)
E               308 def _nvmlCheckReturn(ret):
E               309     if (ret != NVML_SUCCESS):
E           --> 310         raise NVMLError(ret)
E               311     return ret
E               312 
E           
E           NVMLError_LibRmVersionMismatch: RM has detected an NVML/RM version mismatch.
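
The traceback shows the import failing inside `pynvml.nvmlInit()`, before any fastai code runs. This error is typically reported when the loaded NVIDIA kernel driver and the user-space NVML library come from different driver versions, for example after a driver update without a reboot, although that has not been confirmed for this build agent. A minimal sketch for checking NVML on the affected machine independently of fastai follows; the helper name `check_nvml` is hypothetical and is not part of fastai or this repository.

```python
def check_nvml():
    """Return (ok, message) describing whether NVML can be initialized.

    Hypothetical diagnostic helper, not part of fastai or pynvml.
    """
    try:
        import pynvml  # the same module that fastai's pynvml_gate loads
    except ImportError:
        return False, "pynvml is not installed"

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as err:
        # NVMLError_LibRmVersionMismatch is a subclass of NVMLError,
        # so the failure from the traceback above lands here.
        return False, f"nvmlInit failed: {err}"

    try:
        version = pynvml.nvmlSystemGetDriverVersion()
    except pynvml.NVMLError as err:
        return False, f"driver version query failed: {err}"
    finally:
        # Release the NVML handle acquired by nvmlInit.
        pynvml.nvmlShutdown()

    # Older pynvml releases return bytes, newer ones return str.
    if isinstance(version, bytes):
        version = version.decode()
    return True, f"NVML initialized, driver version {version}"


if __name__ == "__main__":
    ok, message = check_nvml()
    print(message)
```

If this reports the same mismatch, the problem lies in the driver installation on the machine rather than in fastai.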


In which platform does it happen?

How do we replicate the issue?

https://dev.azure.com/best-practices/recommenders/_build/results?buildId=43853&view=results

Expected behavior (i.e. solution)

Other Comments

Related to #1282 (comment)

miguelgfierro added the bug label Jan 22, 2021
miguelgfierro commented

pytest tests/unit/test_notebooks_gpu.py::test_fastai
