[BUG] RoITransformer CUDA error: an illegal memory access was encountered #340
Comments
Could you please try:
and update your error report here. |
Although I did set CUDA_LAUNCH_BLOCKING=1 in the first place, I tried the command you gave, and this is the output: Traceback (most recent call last): Aborted (core dumped) |
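For context, `CUDA_LAUNCH_BLOCKING=1` makes CUDA kernel launches synchronous, so the Python traceback points at the call that actually faulted rather than at a later, unrelated one. A minimal sketch of launching the training script with the variable set follows; the helper names `blocking_env` and `run_training` are hypothetical, and the command line is the one from the bug report.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Training command taken from the bug report.
TRAIN_CMD = [
    sys.executable,
    "tools/train.py",
    "configs/roi_trans/roi_trans_r50_fpn_1x_dota_le90.py",
]


def run_training():
    """Run the training script; call this from the mmrotate repository root."""
    return subprocess.run(TRAIN_CMD, env=blocking_env(), check=False).returncode
```

Setting the variable in the child process environment, rather than inside the training script, ensures it is in place before CUDA is initialised.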
What command do you use to install the |
pip install mmcv-full |
Please uninstall it first, then use mim to install mmcv-full:
Or specify the mmcv-full version yourself:
|
Successfully installed mmcv-full-1.5.2. Problem remains unsolved. Traceback (most recent call last): Aborted (core dumped) |
What command did you use to install mmcv-full? mmcv-full 1.5.2 has many compiled builds, and it looks like you installed the wrong one. |
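To illustrate what "many compiled builds" means: prebuilt mmcv-full wheels are published per CUDA and PyTorch version, and pip picks one through a find-links index. The sketch below builds such an install command for the reported environment (CUDA 11.1, PyTorch 1.10.0). The URL layout is an assumption taken from the MMCV installation guide, the function names are hypothetical, and this is not necessarily the command posted in this thread.

```python
def mmcv_find_links(cuda_version, torch_version):
    """Build the find-links index URL for prebuilt mmcv-full wheels.

    Assumption: the index layout is .../dist/cu<XYZ>/torch<version>/index.html,
    as described in the MMCV installation guide.
    """
    cu_tag = "cu" + cuda_version.replace(".", "")
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cu_tag}/torch{torch_version}/index.html"
    )


def pip_command(mmcv_version, cuda_version, torch_version):
    """Return the pip argument list that pins mmcv-full to one compiled build."""
    return [
        "pip", "install", f"mmcv-full=={mmcv_version}",
        "-f", mmcv_find_links(cuda_version, torch_version),
    ]


# Versions taken from the Environment section of the report.
print(" ".join(pip_command("1.5.2", "11.1", "1.10.0")))
```

Whether a wheel exists for a given combination has to be checked against the index itself; if none matches, pip falls back to building from source with the local compiler.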
pip install openmim I will try specifying the mmcv-full version myself, then update my results here. |
Installed mmcv-full 1.5.3 with the command above. The second time, the same bug occurred. Aborted (core dumped) Uninstalled mmcv-full with command I literally do not understand... |
pip install mmcv-full
------------------ Original message ------------------
From: ***@***.***>;
Sent: Thursday, June 9, 2022, 11:13 PM
To: "open-mmlab/mmrotate";
Cc: "Author";
Subject: Re: [open-mmlab/mmrotate] [BUG] RoITransformer CUDA error: an illegal memory access was encountered (Issue #340)
What command do you use to install the mmcv-full?
|
Describe the bug
python tools/train.py 'configs/roi_trans/roi_trans_r50_fpn_1x_dota_le90.py'
Environment
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1: GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.11.1
OpenCV: 4.5.4
MMCV: 1.5.2
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.1
MMRotate: 0.3.0+
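One quick sanity check on a report like this is whether the local nvcc release matches the CUDA version MMCV was compiled with. The sketch below parses a few lines copied from the report above; `parse_env` and `cuda_versions_match` are hypothetical helper names. It does not cover the CUDA version PyTorch itself was built with, because that part of the report ("PyTorch compiling details") is not shown here.

```python
import re

# Lines copied from the Environment section of the report.
ENV_REPORT = """\
NVCC: Cuda compilation tools, release 11.1, V11.1.105
PyTorch: 1.10.0
MMCV: 1.5.2
MMCV CUDA Compiler: 11.1
"""


def parse_env(report):
    """Parse 'key: value' lines into a dict, splitting on the first colon."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def cuda_versions_match(fields):
    """True when the nvcc release equals the CUDA version MMCV was built with."""
    match = re.search(r"release (\d+\.\d+)", fields.get("NVCC", ""))
    return match is not None and match.group(1) == fields.get("MMCV CUDA Compiler")


fields = parse_env(ENV_REPORT)
print(cuda_versions_match(fields))
```

For this report the two versions agree, so this particular check does not by itself explain the crash.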
Error traceback
Traceback (most recent call last):
File "tools/train.py", line 294, in <module>
main()
File "tools/train.py", line 288, in main
meta=meta)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/apis/train.py", line 156, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
losses = self(**data)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 109, in new_func
return old_func(*args, **kwargs)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 172, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/detectors/two_stage.py", line 150, in forward_train
**kwargs)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/roi_heads/roi_trans_roi_head.py", line 238, in forward_train
rcnn_train_cfg)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/roi_heads/roi_trans_roi_head.py", line 155, in _bbox_forward_train
bbox_results = self._bbox_forward(stage, x, rois)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/roi_heads/roi_trans_roi_head.py", line 126, in _bbox_forward
rois)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 197, in new_func
return old_func(*args, **kwargs)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/roi_heads/roi_extractors/rotate_single_level_roi_extractor.py", line 133, in forward
roi_feats_t = self.roi_layers[i](feats[i], rois)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/ops/roi_align_rotated.py", line 171, in forward
self.clockwise)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/ops/roi_align_rotated.py", line 70, in forward
clockwise=ctx.clockwise)
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1634272178570/work/c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f20a94ffd62 in /opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1c613 (0x7f2100dee613 in /opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x7f2100def022 in /opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7f20a94e9314 in /opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x294dd9 (0x7f217ee66dd9 in /opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xae2f59 (0x7f217f6b4f59 in /opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x2b9 (0x7f217f6b5279 in /opt/conda/envs/mmrotate/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #24: __libc_start_main + 0xe7 (0x7f21ba3cdbf7 in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)
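When reading a long traceback like the one above, the innermost Python frame shows which component raised the error. The sketch below extracts that frame from traceback text; the sample input is the last two frames copied from this report, and `last_frame` is a hypothetical helper name.

```python
import re

# Matches lines of the form: File "<path>", line <n>, in <function>
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)'
)

# Last two Python frames copied from the traceback in this report.
SAMPLE = '''\
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/ops/roi_align_rotated.py", line 171, in forward
self.clockwise)
File "/opt/conda/envs/mmrotate/lib/python3.7/site-packages/mmcv/ops/roi_align_rotated.py", line 70, in forward
clockwise=ctx.clockwise)
RuntimeError: CUDA error: an illegal memory access was encountered
'''


def last_frame(traceback_text):
    """Return the innermost frame as a dict, or None if no frame is found."""
    frames = [m.groupdict() for m in FRAME_RE.finditer(traceback_text)]
    return frames[-1] if frames else None


frame = last_frame(SAMPLE)
print(frame["path"].rsplit("/", 1)[-1], frame["line"])
```

Here the innermost frame is in `mmcv/ops/roi_align_rotated.py`, which is part of the compiled mmcv-full package rather than mmrotate itself; that is consistent with the thread's focus on how mmcv-full was installed.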