
AMP will crash with non-tensorcore GPUs #528

Closed
FabianIsensee opened this issue Oct 7, 2019 · 6 comments


@FabianIsensee

Hi there,
I updated apex today (pulled from GitHub) and now I am getting an error when running mixed-precision training on GPUs that don't have Tensor Cores. The following snippet comes from running a 2D U-Net on a TitanXp GPU:

RuntimeError: CUDA error: no kernel image is available for execution on the device (multi_tensor_apply at csrc/multi_tensor_apply.cuh:104)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x2b148356f543 in /home/isensee/dl_venv_new/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: void multi_tensor_apply<2, ScaleFunctor<float, float>, float>(int, int, at::Tensor const&, std::vector<std::vector<at::Tensor, std::allocator<at::Tensor> >, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor> > > > const&, ScaleFunctor<float, float>, float) + 0xba6 (0x2b149e4db0f6 in /home/isensee/dl_venv_new/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/amp_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: multi_tensor_scale_cuda(int, at::Tensor, std::vector<std::vector<at::Tensor, std::allocator<at::Tensor> >, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor> > > >, float) + 0xa90 (0x2b149e4d8c50 in /home/isensee/dl_venv_new/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/amp_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: + 0x229b7 (0x2b149e4ca9b7 in /home/isensee/dl_venv_new/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/amp_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: + 0x1d5af (0x2b149e4c55af in /home/isensee/dl_venv_new/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/amp_C.cpython-37m-x86_64-linux-gnu.so)

frame #49: __libc_start_main + 0xf5 (0x2b1416258c05 in /lib64/libc.so.6)
frame #50: python3() [0x400721]

Any idea what's going on?
Best,
Fabian

@FabianIsensee
Author

I built apex with

python setup.py install --cuda_ext --cpp_ext

@FabianIsensee
Author

FabianIsensee commented Oct 7, 2019

I have now done several experiments. First of all, I changed my installation command to the one from the readme:

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Second, the error above appears on different GPUs depending on where I built apex. I am running apex on our GPU cluster, which has different types of GPUs: RTX2080ti, TitanXp and V100. All software is shared between the nodes in the cluster, so if I compile and install apex on one node, all nodes will have that version.

Here is what I found:

  1. If I build and install apex on an RTX2080ti node, it will work on RTX2080ti cards but not on TitanXp or V100.
  2. If I build and install apex on a V100 node, it will work on RTX2080ti and V100, but not on TitanXp.
  3. If I build and install apex on a TitanXp node, it will work on TitanXp, but not on RTX2080ti or V100.
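The pattern in these three findings matches CUDA's SASS binary-compatibility rule: a kernel compiled for one compute capability only runs on devices with the same major architecture and an equal or higher minor version. A minimal sketch of that rule (the GPU-to-capability mapping and helper function are illustrative, not part of apex):

```python
# Rough model of CUDA SASS binary compatibility: a cubin built for
# compute capability (major, minor) runs only on devices with the same
# major architecture and an equal-or-higher minor version.
GPU_CC = {"TitanXp": (6, 1), "V100": (7, 0), "RTX2080ti": (7, 5)}

def cubin_runs_on(build_cc, device_cc):
    return build_cc[0] == device_cc[0] and device_cc[1] >= build_cc[1]

# Built on a V100 node (cc 7.0): works on V100 and RTX2080ti, not TitanXp,
# which is exactly finding 2 above.
for gpu, cc in GPU_CC.items():
    print(gpu, cubin_runs_on(GPU_CC["V100"], cc))
```

This ignores PTX forward compatibility (a build embedding PTX can be JIT-compiled for newer architectures), but it explains why each build only ran on the node family it was compiled on.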

Has it always been like this? I cannot remember having any problems in the past.

The GCC version is 7.2.0, the CUDA version is 10.0, and PyTorch is the most recent nightly.
I would very much appreciate your help!

Best,
Fabian

Edit: The whole problem does not appear if I do a Python-only installation. That of course gives a warning:

Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")

And I get a small performance penalty.
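The warning above comes from apex probing for the compiled extension module at import time, roughly with a try/except pattern like this (the flag name here is illustrative):

```python
try:
    import amp_C  # fused CUDA kernels, only present after a --cuda_ext build
    fused_kernels_available = True
except ImportError:
    # Without the extension, apex falls back to slower pure-Python
    # tensor-by-tensor operations, hence the performance penalty.
    fused_kernels_available = False

print(fused_kernels_available)
```

Since the Python fallback never loads the architecture-specific cubins, it sidesteps the "no kernel image" crash entirely.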

@mcarilli
Contributor

mcarilli commented Oct 7, 2019

This is an artifact of some recent changes to how PyTorch builds extensions:
pytorch/pytorch#23408
If the environment variable TORCH_CUDA_ARCH_LIST is not set, PyTorch will build extensions only for the architecture of the node where you are compiling (e.g. if you are compiling on a node with a V100, it will compile for Volta, which will work on Volta and probably on Turing as well). Apex is set up to respect this logic, unless you are building on a system with no GPUs, in which case Apex sets TORCH_CUDA_ARCH_LIST to build for all compute capabilities from Pascal through Turing.

In your case, if you want a single build that works for Titan Xp (compute capability 6.1), V100 (cc 7.0), and RTX2080Ti (cc 7.5), you can

$ pip uninstall apex # repeat if multiple installations occurred by accident
$ export TORCH_CUDA_ARCH_LIST="6.1;7.0;7.5"
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
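To sanity-check that an arch list covers every GPU type in the cluster before rebuilding, a small helper (hypothetical, not part of apex or PyTorch) can parse the semicolon-separated list; the capability of the GPU you are currently on can be read with `torch.cuda.get_device_capability()`:

```python
def arch_list_covers(arch_list, cap):
    """Return True if a TORCH_CUDA_ARCH_LIST value includes compute capability `cap`."""
    archs = [a.strip().replace("+PTX", "") for a in arch_list.split(";")]
    return cap in archs

wanted = ["6.1", "7.0", "7.5"]  # TitanXp, V100, RTX2080ti
print(all(arch_list_covers("6.1;7.0;7.5", c) for c in wanted))  # True
```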

@mcarilli mcarilli closed this as completed Oct 7, 2019
@FabianIsensee
Author

Outstanding, thank you!

@MrRobot2211

Thank you.

@ethanjperez

@mcarilli It might be worth advertising this fact/fix in the README
