RuntimeError #23

Open
wuzhi19931128 opened this issue Jan 14, 2020 · 0 comments

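Hello,
When reading data in imagenet.py, I immediately hit an "illegal memory access" error. What could be the cause? My GPUs are 2 × V100, so this should not be an out-of-memory problem, and I have not changed anything in the source code except the dataset location.
The error log follows: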

root@test-6gwz28fvc:/data1/test# python imagenet.py
DALI "gpu" variant
read 1281167 files from 1000 directories
140020509374208 Exception in thread: CUDA runtime API error cudaErrorIllegalAddress (77):
an illegal memory access was encountered
Traceback (most recent call last):
  File "imagenet.py", line 105, in <module>
    num_threads=4, crop=224, device_id=0, num_gpus=1)
  File "imagenet.py", line 67, in get_imagenet_iter_dali
    dali_iter_train = DALIClassificationIterator(pip_train, size=pip_train.epoch_size("Reader") // world_size)
  File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 338, in __init__
    last_batch_padded = last_batch_padded)
  File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 148, in __init__
    self._first_batch = self.next()
  File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 245, in next
    return self.__next__()
  File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/plugin/pytorch.py", line 163, in __next__
    outputs.append(p.share_outputs())
  File "/usr/local/miniconda3/lib/python3.6/site-packages/nvidia/dali/pipeline.py", line 409, in share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline: Error in thread 0: CUDA runtime API error cudaErrorIllegalAddress (77):
an illegal memory access was encountered
Current pipeline object is no longer valid.
terminate called after throwing an instance of 'dali::CUDAError'
what(): CUDA runtime API error cudaErrorIllegalAddress (77):
an illegal memory access was encountered
Aborted (core dumped)

Could you help take a look? Thanks.
