running problem #28

Open
lapetite123 opened this issue Oct 5, 2019 · 2 comments

@lapetite123

When I try to run the training code, I get the following errors:
Current batch/total batch num: 0/697
2019-10-05 23:19:12.279100: E tensorflow/stream_executor/cuda/cuda_blas.cc:636] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "train.py", line 256, in <module>
train()
File "train.py", line 211, in train
train_one_epoch(sess, ops, train_writer)
File "train.py", line 243, in train_one_epoch
feed_dict=feed_dict)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1128, in _run
feed_dict_tensor, options, run_metadata)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
options, run_metadata)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=786432, n=32, k=9
[[Node: layer1/conv0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](layer1/concat, layer1/conv0/weights/read/_911)]]

Caused by op u'layer1/conv0/Conv2D', defined at:
File "train.py", line 256, in <module>
train()
File "train.py", line 128, in train
pred_sem, pred_ins = get_model(pointclouds_pl, is_training_pl, NUM_CLASSES, bn_decay=bn_decay)
File "/home/ASIS/models/ASIS/model.py", line 29, in get_model
l1_xyz, l1_points, l1_indices = pointnet_sa_module(l0_xyz, l0_points, npoint=1024, radius=0.1, nsample=32, mlp=[32,32,64], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer1')
File "/home/ASIS/utils/pointnet_util.py", line 187, in pointnet_sa_module
data_format=data_format)
File "/home/ASIS/utils/tf_util.py", line 165, in conv2d
data_format=data_format)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 639, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/home/anaconda3/envs/asis/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InternalError (see above for traceback): Blas SGEMM launch failed : m=786432, n=32, k=9
[[Node: layer1/conv0/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](layer1/concat, layer1/conv0/weights/read/_911)]]

I don't know how to solve it. I really hope you can help me! Waiting for your reply, thanks!

@WXinlong
Owner

Hi @lapetite123, has this issue been solved? It looks like your environment was not set up correctly.
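
One way to check whether the failing operation itself looks unusual is to relate the GEMM dimensions in the error message to the `pointnet_sa_module` call shown in the traceback. The sketch below is only an illustration, not part of the repository: the helper names are hypothetical, and it assumes the convolution is a 1x1 kernel over a `(batch, npoint, nsample, k)` tensor, so that `m = batch * npoint * nsample`. The kernel size is not visible in the log, so treat the derived batch size as an inference.

```python
import re

# Error line copied from the traceback in this issue.
ERROR_LINE = "Blas SGEMM launch failed : m=786432, n=32, k=9"

# Values taken from the pointnet_sa_module call shown in the traceback.
NPOINT = 1024          # npoint=1024
NSAMPLE = 32           # nsample=32
FIRST_MLP_WIDTH = 32   # first entry of mlp=[32,32,64]


def parse_gemm_dims(line):
    """Return (m, n, k) parsed from a 'Blas SGEMM launch failed' message."""
    match = re.search(r"m=(\d+), n=(\d+), k=(\d+)", line)
    if match is None:
        raise ValueError("no GEMM dimensions found in: %r" % line)
    return tuple(int(group) for group in match.groups())


def implied_batch_size(m, npoint, nsample):
    """Assumption: a 1x1 convolution flattens its input to a matrix with
    m = batch * npoint * nsample rows, so batch = m / (npoint * nsample)."""
    rows_per_example = npoint * nsample
    if m % rows_per_example != 0:
        raise ValueError("m is not a multiple of npoint * nsample")
    return m // rows_per_example


def float32_megabytes(rows, cols):
    """Size in MiB of a rows x cols float32 matrix (4 bytes per element)."""
    return rows * cols * 4 / float(1024 ** 2)


m, n, k = parse_gemm_dims(ERROR_LINE)
batch = implied_batch_size(m, NPOINT, NSAMPLE)
input_mb = float32_megabytes(m, k)
output_mb = float32_megabytes(m, n)

print("implied batch size: %d" % batch)
print("output channels match mlp[0]: %s" % (n == FIRST_MLP_WIDTH))
print("input matrix: %.1f MiB, output matrix: %.1f MiB" % (input_mb, output_mb))
```

Under these assumptions the numbers are consistent with the model definition: `n=32` matches the first `mlp` width, and `m` corresponds to a batch of 24. The two matrices come to roughly 27 MiB and 96 MiB, which is small for a typical training GPU. That suggests, without proving it, that the failure is more likely related to the environment (for example the CUDA/cuBLAS setup or GPU memory already in use) than to the shapes of this layer.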

@XIAOGUOY

I encountered the same problem. Could you please help solve it?
