Great paper and idea! I have a question about a runtime error I hit when training with `Choi` as the model type on the SNLI task. Here is the command I used for training:

```
CUDA_VISIBLE_DEVICES=0,1 python3.6 train.py --data-type snli --glove-path ./pickle/pickle_glove --data-path ./cache/snli_cache --model-type Choi --word-dim 300 --hidden-dim 300 --clf-hidden-dim 1024 --clf-num-layers 1 --dropout 0.1 --batch-size 128 --max-epoch 10 --lr 0.001 --l2reg 1e-5 --optimizer adam --patience 10 --clip 5 --sample-num 2 --use-batchnorm --rank-input w --fix-word-embedding --leaf-rnn-type lstm --cuda --save-dir ./saved_model
```
However, training fails with the following error:

```
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:249
```
I am training on two 32 GB V100 GPUs. Do you have any suggestions for resolving this issue? Many thanks!
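In case it helps with diagnosis, I can also rerun on a single GPU with `CUDA_LAUNCH_BLOCKING=1` set, so kernel launches are synchronous and the stack trace should point at the operation that actually triggers the cublas error. This is just a debugging variant of the command above; only the environment variables change:

```
# Same training flags as above; single GPU plus synchronous CUDA launches
# so the reported error location is the op that really failed.
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=1 python3.6 train.py \
  --data-type snli --glove-path ./pickle/pickle_glove --data-path ./cache/snli_cache \
  --model-type Choi --word-dim 300 --hidden-dim 300 --clf-hidden-dim 1024 \
  --clf-num-layers 1 --dropout 0.1 --batch-size 128 --max-epoch 10 --lr 0.001 \
  --l2reg 1e-5 --optimizer adam --patience 10 --clip 5 --sample-num 2 \
  --use-batchnorm --rank-input w --fix-word-embedding --leaf-rnn-type lstm \
  --cuda --save-dir ./saved_model
```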