Frequently Asked Questions
This page lists common problems that users run into, together with their solutions.
Feel free to extend the list if you encounter a frequent issue and know how to help others solve it.
Installation
"No module named 'mmcv.ops'"; "No module named 'mmcv._ext'"
Uninstall the existing mmcv from the environment with pip uninstall mmcv, then install mmcv-full following the installation instructions.
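To confirm which of the two modules is actually missing before and after reinstalling, a small check along the following lines may help. This is a minimal sketch: the helper name has_module is made up for illustration and is not part of MMCV, and it only tests whether Python can locate the modules, not whether the compiled ops work.

```python
import importlib.util


def has_module(name):
    """Return True if `name` can be located (parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. mmcv itself) is missing or broken.
        return False


if __name__ == "__main__":
    # has_module is a hypothetical helper, not an MMCV API.
    for name in ("mmcv", "mmcv.ops", "mmcv._ext"):
        status = "found" if has_module(name) else "missing"
        print(f"{name}: {status}")
```

If mmcv is found but mmcv.ops or mmcv._ext is missing, the installed package is most likely the lite build without compiled ops.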
"invalid device function" or "no kernel image is available for execution"
Check the CUDA compute capability of your GPU.
Run python mmdet/utils/collect_env.py to check whether PyTorch, torchvision, and MMCV are built for the correct GPU architecture. You may need to set TORCH_CUDA_ARCH_LIST and reinstall MMCV. This compatibility issue can occur with old GPUs, e.g., Tesla K80 (compute capability 3.7) on Colab.
Check whether the running environment is the same as the one in which mmcv/mmdet was compiled. For example, you may have compiled mmcv with CUDA 10.0 but be running it in a CUDA 9.0 environment.
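The compute capability check can also be done from Python. The sketch below is an illustration under a few assumptions: the helper names are made up, torch.cuda.get_arch_list() is only available in reasonably recent PyTorch releases, and the comparison covers the PyTorch binary only (MMCV has to be checked separately, e.g. via collect_env.py). It looks for an exact sm_XY match and ignores PTX forward compatibility.

```python
def arch_flag(capability):
    """(3, 7) -> 'sm_37', the form used in compiled architecture lists."""
    major, minor = capability
    return f"sm_{major}{minor}"


def arch_list_entry(capability):
    """(3, 7) -> '3.7', the form expected by TORCH_CUDA_ARCH_LIST."""
    major, minor = capability
    return f"{major}.{minor}"


def is_arch_compiled(capability, arch_list):
    """True if a binary for exactly this capability is in arch_list."""
    return arch_flag(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        built = torch.cuda.get_arch_list()
        print("GPU capability:", arch_list_entry(cap))
        print("PyTorch built for:", built)
        if not is_arch_compiled(cap, built):
            print("No exact match; consider rebuilding with "
                  f"TORCH_CUDA_ARCH_LIST={arch_list_entry(cap)}")
    else:
        print("PyTorch with a visible CUDA device is required for this check.")
```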
"undefined symbol" or "cannot open xxx.so"
If those symbols are CUDA/C++ symbols (e.g., libcudart.so or GLIBCXX), check whether the CUDA/GCC runtimes are the same as those used for compiling mmcv.
If those symbols are PyTorch symbols (e.g., symbols containing caffe, aten, and TH), check whether the PyTorch version is the same as the one used for compiling mmcv.
Run python mmdet/utils/collect_env.py to check whether PyTorch, torchvision, and MMCV were built in, and are running in, the same environment.
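As a rough complement to collect_env.py, the CUDA version MMCV was compiled with can be compared against the one PyTorch reports. This is a sketch under assumptions: it relies on mmcv.ops exposing get_compiling_cuda_version and get_compiler_version (as mmcv-full does), and the helpers major_minor and same_toolkit are made up for illustration. Only the major.minor part of the versions is compared.

```python
import re


def major_minor(version):
    """Extract (major, minor) from strings such as '10.0' or '10.0.130'."""
    match = re.match(r"(\d+)\.(\d+)", str(version))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_toolkit(compiled, runtime):
    """True if both versions parse and agree on major.minor."""
    a, b = major_minor(compiled), major_minor(runtime)
    return a is not None and a == b


if __name__ == "__main__":
    try:
        import torch
        from mmcv.ops import get_compiler_version, get_compiling_cuda_version
    except ImportError as exc:
        print("Cannot run the comparison:", exc)
    else:
        compiled = get_compiling_cuda_version()
        print("MMCV compiler:", get_compiler_version())
        print("MMCV compiled with CUDA:", compiled)
        print("PyTorch CUDA:", torch.version.cuda)
        if not same_toolkit(compiled, torch.version.cuda):
            print("Versions differ; recompiling mmcv may be necessary.")
```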
"RuntimeError: CUDA error: invalid configuration argument"
This error may occur on less capable GPUs, where the kernel launch configuration exceeds what the device supports. Try decreasing the value of THREADS_PER_BLOCK and recompiling mmcv.
"RuntimeError: nms is not compiled with GPU support"
This error occurs because your CUDA environment is not installed correctly.
Try reinstalling your CUDA environment, then delete the build/ folder before recompiling mmcv.
"Segmentation fault"
Check your GCC version and use GCC >= 5.4. This is usually caused by an incompatibility between PyTorch and the environment (e.g., GCC < 4.9 for PyTorch). We also recommend avoiding GCC 5.5, because many users report that GCC 5.5 causes "segmentation fault" and that simply switching to GCC 5.4 solves the problem.
Check whether PyTorch is correctly installed and can use CUDA ops, e.g., type the following command in your terminal and see whether it outputs a result correctly:
python -c 'import torch; print(torch.cuda.is_available())'
If PyTorch is correctly installed, check whether MMCV is correctly installed. If it is, the following command runs without errors:
python -c 'import mmcv; import mmcv.ops'
If MMCV and PyTorch are both correctly installed, you can use ipdb to set breakpoints, or add print statements directly, to find out which part leads to the segmentation fault.
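As an alternative to ipdb or print statements (not mentioned in the original list), Python's standard faulthandler module can dump the Python traceback at the moment the process receives the crash signal, which often narrows down the failing call. A minimal sketch, with enable_crash_traces being a made-up wrapper name:

```python
import faulthandler
import sys


def enable_crash_traces():
    """Dump Python tracebacks of all threads to stderr on SIGSEGV and similar."""
    if not faulthandler.is_enabled():
        faulthandler.enable(file=sys.stderr, all_threads=True)
    return faulthandler.is_enabled()


# Call this at the very top of the training or test script, before importing
# mmcv or running the code that crashes.
enable_crash_traces()
```

The same effect is available without editing the script by running it with python -X faulthandler.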
"libtorch_cuda_cu.so: cannot open shared object file"
mmcv-full depends on this shared object, but it cannot be found. Check whether the object exists in ~/miniconda3/envs/{environment-name}/lib/python3.7/site-packages/torch/lib, or try reinstalling PyTorch.
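Since the exact site-packages path depends on the environment, the lookup can be done relative to wherever the torch package is installed. The following is a sketch only; find_package_lib is a hypothetical helper, and it assumes the shared objects live in a lib subdirectory of the package, as in the path above.

```python
import importlib.util
from pathlib import Path


def find_package_lib(package, filename, subdir="lib"):
    """Return the path to <package dir>/<subdir>/<filename>, or None if absent."""
    spec = importlib.util.find_spec(package)  # does not import the package
    if spec is None or spec.origin is None:
        return None
    candidate = Path(spec.origin).parent / subdir / filename
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    found = find_package_lib("torch", "libtorch_cuda_cu.so")
    print(found if found else "libtorch_cuda_cu.so not found under torch/lib")
```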
"fatal error C1189: #error: -- unsupported Microsoft Visual Studio version!"
If you are building mmcv-full on Windows with CUDA 9.2, you will probably encounter the error "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include\crt/host_config.h(133): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions 2012, 2013, 2015 and 2017 are supported!". In that case, use a lower version of Microsoft Visual Studio, such as VS 2017.
"error: member "torch::jit::detail::ModulePolicy::all_slots" may not be initialized"
If your version of PyTorch is 1.5.0 and you are building mmcv-full on Windows, you will probably encounter the error "torch/csrc/jit/api/module.h(474): error: member "torch::jit::detail::ModulePolicy::all_slots" may not be initialized". To solve it, replace every occurrence of static constexpr bool all_slots = false; with static bool all_slots = false; in your local copy of this file: https://github.com/pytorch/pytorch/blob/v1.5.0/torch/csrc/jit/api/module.h. More details can be found in pytorch/pytorch#39394 (member "torch::jit::detail::AttributePolicy::all_slots" may not be initialized).
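Header patches like this one can be applied by hand in an editor, or scripted. The sketch below shows one possible way to do the replacement while keeping a backup; replace_in_file is a hypothetical helper, and the path in the usage example is a placeholder that has to be pointed at the module.h inside your own PyTorch installation.

```python
import shutil
from pathlib import Path


def replace_in_file(path, old, new, backup_suffix=".bak"):
    """Replace every occurrence of `old` with `new`; return the number replaced.

    A backup copy (<name>.bak by default) is written before the file is changed.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        shutil.copy2(path, path.with_name(path.name + backup_suffix))
        path.write_text(text.replace(old, new), encoding="utf-8")
    return count


if __name__ == "__main__":
    # Placeholder path: adjust to the header in your local PyTorch install.
    header = Path("torch/csrc/jit/api/module.h")
    if header.exists():
        n = replace_in_file(
            header,
            "static constexpr bool all_slots = false;",
            "static bool all_slots = false;",
        )
        print(f"replaced {n} occurrence(s)")
    else:
        print(f"{header} not found; edit the path first")
```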
"error: a member with an in-class initializer must be const"
If your version of PyTorch is 1.6.0 and you are building mmcv-full on Windows, you will probably encounter the error "torch/include\torch/csrc/jit/api/module.h(483): error: a member with an in-class initializer must be const". To solve it, replace every occurrence of CONSTEXPR_EXCEPT_WIN_CUDA with const in torch/include\torch/csrc/jit/api/module.h. More details can be found in #575 (Can't install mmcv-full. Reported 'Ninja: build stopped: subcommand failed').
"error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized"
If your version of PyTorch is 1.7.0 and you are building mmcv-full on Windows, you will probably encounter the error "torch/include\torch/csrc/jit/ir/ir.h(1347): error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized". Solving it requires modifying several local PyTorch files:
delete static constexpr Symbol Kind = ::c10::prim::profile; and static constexpr Symbol Kind = ::c10::prim::profile_optional; in torch/include\torch/csrc/jit/ir/ir.h
replace explicit operator type&() { return *(this->value); } with explicit operator type&() { return *((type*)this->value); } in torch\include\pybind11\cast.h
replace every occurrence of CONSTEXPR_EXCEPT_WIN_CUDA with const in torch/include\torch/csrc/jit/api/module.h
Compatibility issue between MMCV and MMDetection; "ConvWS is already registered in conv layer"
Install the version of MMCV that matches your MMDetection version, following the installation instructions. More details can be found in pytorch/pytorch#45956 ([cpp-extensions] Ensure default extra_compile_args).
Usage
KeyError: "xxx: 'yyy is not in the zzz registry'"
The registry mechanism is triggered only when the file that defines the module is imported, so you need to import that file somewhere. More details can be found in mmdetection#5974 (KeyError: "MaskRCNN: 'RefineRoIHead is not in the models registry'").
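The behaviour can be illustrated with a toy registry. This is a simplified sketch and not MMCV's actual Registry implementation: the class below and the function import_head_module are made up, and the function merely stands in for importing the file that defines the class.

```python
class Registry:
    """Toy name-to-class registry (not the MMCV implementation)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Runs only when the decorated class definition is executed,
        # i.e. when the file that contains it is imported.
        self._modules[cls.__name__] = cls
        return cls

    def get(self, key):
        if key not in self._modules:
            raise KeyError(f"{key} is not in the {self.name} registry")
        return self._modules[key]


def import_head_module(registry):
    """Stands in for `import my_project.refine_roi_head` (hypothetical file)."""

    @registry.register_module
    class RefineRoIHead:
        pass

    return RefineRoIHead


if __name__ == "__main__":
    models = Registry("models")
    try:
        models.get("RefineRoIHead")
    except KeyError as err:
        print("before import:", err)
    import_head_module(models)
    print("after import:", models.get("RefineRoIHead").__name__)
```

In a real project the equivalent of import_head_module is usually an import statement in the package's __init__.py, so that the decorator runs before the config is built.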
"RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one"
This error indicates that the model has parameters that did not contribute to the loss. Set find_unused_parameters = True in the config to solve this problem, or find the unused parameters manually.
"RuntimeError: Trying to backward through the graph a second time"
GradientCumulativeOptimizerHook and OptimizerHook are both set, which causes loss.backward() to be called twice, so the RuntimeError is raised. Only one of the two can be used. More details can be found in #1379 (RuntimeError: Trying to backward through the graph a second time).
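For the two-hook conflict, a config along the following lines keeps exactly one optimizer hook active. Treat it as a sketch under assumptions: it follows the optimizer_config convention used by MMCV 1.x based codebases, and the cumulative_iters value of 4 is an arbitrary illustration. Check the conventions of your own codebase before copying it.

```python
# Option A (assumed default behaviour): a plain OptimizerHook is built from
# optimizer_config, here with gradient clipping left disabled.
optimizer_config = dict(grad_clip=None)

# Option B: use gradient accumulation instead. Pick this OR option A, and do
# not additionally register the other hook elsewhere (e.g. in custom hooks).
# cumulative_iters=4 is an illustrative value, not a recommendation.
# optimizer_config = dict(
#     type="GradientCumulativeOptimizerHook",
#     cumulative_iters=4,
# )
```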