RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #4
Comments
This looks like an incompatibility between CUDA, cuDNN, and PyTorch. Are you able to successfully run other files, like |
When I run Traceback (most recent call last): |
I changed the environment to CUDA 11.1, cuDNN 8.0.4, and PyTorch 1.8.0. That solved the previous problem, but a new one emerged. Traceback (most recent call last): |
Did you run |
When I run [Parallel(n_jobs=-1)]: Done 126816 tasks | elapsed: 7.0min So I didn't get all the point clouds, and I think that may have caused the previous problem. |
This segmentation fault is caused by OpenCascade. Some CAD models cannot be converted to point clouds successfully. You can find the problematic data by replacing the Parallel execution with a for loop and printing out each data_id to see which one causes the crash. Then just skip it in the next run. Another quick solution is to simply give up on the unprocessed data (if there is not too much of it) and to replace the following line in Line 182 in 1ff0ab1
return self.__getitem__(random.randint(0, self.__len__())) |
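The "give up on the unprocessed data" idea can be sketched as a dataset that falls back to another randomly chosen sample when a point cloud file is missing. This is only an illustration: the class name `SkippingDataset`, the `.ply` suffix, and the `max_retries` parameter are assumptions, not code from the repository. Note that `random.randint(0, n)` includes `n` itself, so the sketch uses `random.randrange(n)` to stay inside the valid index range.

```python
import os
import random


class SkippingDataset:
    """Hypothetical stand-in for a dataset whose samples live in per-id files."""

    def __init__(self, root, data_ids, suffix=".ply", max_retries=50):
        self.root = root
        self.data_ids = list(data_ids)
        self.suffix = suffix            # assumed file extension
        self.max_retries = max_retries  # bound the retries so a fully empty folder cannot loop forever

    def __len__(self):
        return len(self.data_ids)

    def path_for(self, index):
        return os.path.join(self.root, self.data_ids[index] + self.suffix)

    def __getitem__(self, index):
        for _ in range(self.max_retries):
            path = self.path_for(index)
            if os.path.exists(path):
                # A real dataset would load and return the point cloud here.
                return path
            # randrange(n) draws from 0..n-1, so the new index is always valid.
            index = random.randrange(len(self))
        raise RuntimeError("no existing sample found after %d tries" % self.max_retries)
```

Bounding the retries is a design choice of this sketch; a plain recursive fallback is shorter but never terminates if every file is missing.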
Thank you for your advice. I wrote a for loop, found the problematic data_id, and successfully ran |
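The for-loop approach described above might look roughly like the following. A segmentation fault kills the Python process, so `try`/`except` cannot catch it; the last id printed before the crash is the one to skip. The function `process_sequentially` and its `process_one` callback are hypothetical names, with `process_one` standing in for whatever converts a single CAD model to a point cloud.

```python
def process_sequentially(data_ids, process_one, skip_ids=()):
    """Process ids one at a time, printing each id before it is handled."""
    skip = set(skip_ids)
    done = []
    for data_id in data_ids:
        if data_id in skip:
            # Ids already known to crash the converter are passed over.
            continue
        # flush=True so the id reaches the terminal even if the next call segfaults.
        print("processing", data_id, flush=True)
        process_one(data_id)
        done.append(data_id)
    return done
```

After a crash, the last printed id can be added to `skip_ids` and the loop run again.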
Thank you for sharing. When I run pc2cad.py with the following command:
python pc2cad.py --exp_name pretrained --ae_ckpt 1000 -g 0 --pc_root /public1/tz/DeepCAD/data/pc_cad
I get this error:
Traceback (most recent call last):
  File "pc2cad.py", line 246, in <module>
    outputs, losses = agent.train_func(data)
  File "/public1/tz/DeepCAD/trainer/base.py", line 118, in train_func
    outputs, losses = self.forward(data)
  File "pc2cad.py", line 159, in forward
    pred_code = self.net(points)
  File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "pc2cad.py", line 138, in forward
    xyz, features = module(xyz, features)
  File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/pointnet2_ops/pointnet2_modules.py", line 66, in forward
    new_features = self.mlps[i](new_features)  # (B, mlp[-1], npoint, nsample)
  File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 106, in forward
    exponential_average_factor, self.eps)
  File "/home/server/anaconda3/envs/DeepCAD/lib/python3.7/site-packages/torch/nn/functional.py", line 1923, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
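The traceback ends inside `batch_norm` on a 4D tensor, so one way to separate an environment problem from a problem in the project code is to run a single batch-norm layer on the GPU in isolation. The following is a minimal sketch; the function name `batchnorm_smoke_test` and the tensor sizes are arbitrary choices for illustration. If this fails with the same cuDNN error, the installed PyTorch/CUDA/cuDNN combination is the likely culprit rather than `pc2cad.py`.

```python
def batchnorm_smoke_test():
    """Run one BatchNorm2d forward pass on the GPU and describe the outcome."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    try:
        layer = torch.nn.BatchNorm2d(8).cuda()
        x = torch.randn(4, 8, 16, 16, device="cuda")
        y = layer(x)
        # CUDA calls are asynchronous; synchronizing surfaces errors at this point.
        torch.cuda.synchronize()
        return "ok, output shape %s" % (tuple(y.shape),)
    except RuntimeError as err:
        return "failed: %s" % err


print(batchnorm_smoke_test())
```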
Package Version Location
absl-py 1.0.0
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.7
cycler 0.11.0
Cython 0.29.13
future 0.18.2
google-auth 2.3.3
google-auth-oauthlib 0.4.6
grpcio 1.41.1
h5py 2.10.0
hydra-core 0.11.3
idna 3.3
importlib-metadata 4.8.2
joblib 0.14.1
kiwisolver 1.3.2
lmdb 1.2.1
loguru 0.5.3
Markdown 3.3.4
matplotlib 3.1.3
msgpack 1.0.2
msgpack-numpy 0.4.7.1
numpy 1.18.1
oauthlib 3.1.1
omegaconf 1.4.1
Pillow 8.3.2
pip 21.0.1
plyfile 0.7.2
pointnet2 3.0.0 /public1/tz/Pointnet2_PyTorch-master
pointnet2-ops 3.0.0
protobuf 3.19.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pyparsing 3.0.6
python-dateutil 2.8.2
pytorch-lightning 0.7.1
PyYAML 6.0
requests 2.26.0
requests-oauthlib 1.3.0
rsa 4.7.2
scikit-learn 0.24.2
scipy 1.4.1
setuptools 58.0.4
six 1.16.0
tensorboard 2.7.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
tensorboardX 2.0
threadpoolctl 3.0.0
torch 1.5.1
torchvision 0.6.1
tqdm 4.42.1
trimesh 3.2.19
typing-extensions 4.0.0
urllib3 1.26.7
vtk 9.0.1
Werkzeug 2.0.2
wheel 0.37.0
zipp 3.6.0
I have two RTX 3090 GPUs, CUDA 10.2, cuDNN 7.6.5, PyTorch 1.5.1, and Python 3.7.
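A quick way to check whether a setup like this one is internally consistent is to compare each GPU's compute capability with the CUDA version PyTorch was built against. The sketch below assumes a small hand-written table (`MIN_CUDA_FOR_CAPABILITY`, not exhaustive) of the minimum CUDA release for a few architectures; the RTX 3090 reports compute capability 8.6, which needs CUDA 11.1 or newer. The helper names are made up for this example.

```python
# Minimum CUDA toolkit release able to target a given compute capability.
# Hand-written and deliberately incomplete (an assumption of this sketch).
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing, e.g. RTX 20 series
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3090
}


def parse_version(text):
    """Turn a string such as '10.2' into the tuple (10, 2)."""
    return tuple(int(part) for part in text.split(".")[:2])


def cuda_supports(capability, cuda_version):
    """Return True or False, or None when the capability is not in the table."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required


def report():
    """Print the versions PyTorch sees and whether each GPU is covered."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    print("torch:", torch.__version__)
    print("built with CUDA:", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())
    if torch.version.cuda is None or not torch.cuda.is_available():
        print("No usable CUDA device.")
        return
    for i in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(i)
        ok = cuda_supports(cap, torch.version.cuda)
        print(torch.cuda.get_device_name(i), cap, "covered by this CUDA build:", ok)
```

Calling `report()` in the affected environment prints the versions in one place; `cuda_supports((8, 6), "10.2")` evaluates to `False`, which matches the mismatch described here.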