
CUBLAS_STATUS_EXECUTION_FAILED #6

Open
Lucien7786 opened this issue Apr 21, 2017 · 0 comments

When I followed your instructions, training failed at this step:
" # Train the SSD-ResNet-101 321x321
python examples/ssd/ssd_pascal_resnet_321.py "

The error output:

I0420 22:48:18.898177 7233 sgd_solver.cpp:138] Iteration 2520, lr = 0.001
F0421 09:02:45.161278 7233 math_functions.cu:52] Check failed: status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED
*** Check failure stack trace: ***
@ 0x7f2f6e971daa (unknown)
@ 0x7f2f6e971ce4 (unknown)
@ 0x7f2f6e9716e6 (unknown)
@ 0x7f2f6e974687 (unknown)
@ 0x7f2f6f1d36a5 caffe::caffe_gpu_gemv<>()
@ 0x7f2f6f17a51a caffe::BiasLayer<>::Backward_gpu()
@ 0x7f2f6f193a47 caffe::ScaleLayer<>::Backward_gpu()
@ 0x7f2f6f15c817 caffe::Net<>::BackwardFromTo()
@ 0x7f2f6f15c981 caffe::Net<>::Backward()
@ 0x7f2f6f0b4c8b caffe::Solver<>::Step()
@ 0x7f2f6f0b538e caffe::Solver<>::Solve()
@ 0x40b568 train()
@ 0x40899c main
@ 0x7f2f6d0f1f45 (unknown)
@ 0x4092a3 (unknown)
@ (nil) (unknown)
Aborted (core dumped)
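
Each line of the log above starts with glog's prefix: a severity letter (I, W, E or F), month and day, the time with microseconds, the thread id, and the `file:line` that emitted it. A minimal sketch of a parser for that prefix follows. `parse_glog` is a hypothetical helper, and because glog does not print the year, 2017 is assumed from the issue date.

```python
import re
from datetime import datetime

# glog prefix: <severity><MMDD> <HH:MM:SS.uuuuuu> <thread-id> <file>:<line>] <message>
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog(line, year=2017):
    """Hypothetical helper: split one glog line into its fields.

    glog omits the year, so the caller supplies it (assumed 2017 here).
    """
    m = GLOG_RE.match(line)
    if m is None:
        raise ValueError("not a glog line: " + line)
    ts = datetime.strptime(f"{year}{m['mmdd']} {m['time']}", "%Y%m%d %H:%M:%S.%f")
    return {
        "severity": m["sev"],
        "timestamp": ts,
        "thread": int(m["tid"]),
        "source": f"{m['file']}:{m['line']}",
        "message": m["msg"],
    }


# The two timestamped lines quoted in the log above.
last_iter = parse_glog(
    "I0420 22:48:18.898177 7233 sgd_solver.cpp:138] Iteration 2520, lr = 0.001"
)
crash = parse_glog(
    "F0421 09:02:45.161278 7233 math_functions.cu:52] Check failed: "
    "status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED"
)
print(crash["severity"], crash["source"])           # F math_functions.cu:52
print(crash["timestamp"] - last_iter["timestamp"])  # 10:14:26.263101
```

In the excerpt as posted, the fatal line comes about 10 h 14 min after the `Iteration 2520` line. The issue does not say whether intermediate log lines were left out of the excerpt.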

I have tried several times. Each run crashes with the same core dump, but at a seemingly random iteration (the most recent run failed at "Iteration 9920"). The intermediate caffemodel is not saved automatically when the crash happens, so every retry starts again from iteration 0. A lot of time is wasted.
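
Caffe only writes `<snapshot_prefix>_iter_<N>.caffemodel` and `.solverstate` files every `snapshot` iterations, as set in the solver prototxt. A fatal CHECK failure like the one above aborts the process without writing a new one. If the training script sets a snapshot interval (not shown in this issue), a run can be resumed from the newest `.solverstate` by passing it to `caffe train --solver=<solver.prototxt> --snapshot=<file>`. The sketch below is a hypothetical helper, `latest_solverstate`, that finds that file in a snapshot directory. It compares iteration numbers as integers because a plain string sort would place `_iter_8000` after `_iter_10000`.

```python
import os
import re

# Caffe snapshot naming: "<snapshot_prefix>_iter_<N>.solverstate" (plus a matching .caffemodel).
SNAP_RE = re.compile(r"_iter_(\d+)\.solverstate$")


def latest_solverstate(directory):
    """Hypothetical helper: return (iteration, path) of the newest .solverstate, or None.

    Assumes all snapshots in `directory` belong to the same run/prefix.
    """
    best = None
    for name in os.listdir(directory):
        m = SNAP_RE.search(name)
        if m is None:
            continue  # skip .caffemodel files, logs, etc.
        iteration = int(m.group(1))  # numeric, not lexicographic, comparison
        if best is None or iteration > best[0]:
            best = (iteration, os.path.join(directory, name))
    return best
```

The returned path is what would go after `--snapshot=`. If the directory holds snapshots from several different prefixes, filter by prefix first.
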
I looked up the error code in the CUDA Toolkit documentation (cuBLAS, section 2.2.2, cublasStatus_t: CUBLAS_STATUS_EXECUTION_FAILED). It seems to be a cuBLAS library or driver issue.
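
In the fatal log line, `(13 vs. 0)` is glog's CHECK_EQ format: the actual value of `status`, then the expected value `CUBLAS_STATUS_SUCCESS`. The sketch below decodes such a line with a hand-copied table of `cublasStatus_t` values (as defined in `cublas_api.h` for CUDA 8.0). The table and the helper `decode_cublas_check` are illustrative assumptions, not a cuBLAS API.

```python
import re

# Hand-copied cublasStatus_t values (cublas_api.h, CUDA 8.0); not an official Python API.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_cublas_check(line):
    """Hypothetical helper: map glog's '(actual vs. expected)' pair to cuBLAS status names."""
    m = re.search(r"\((-?\d+) vs\. (-?\d+)\)", line)
    if m is None:
        return None  # not a CHECK_EQ failure message
    actual, expected = int(m.group(1)), int(m.group(2))
    return (
        CUBLAS_STATUS.get(actual, f"unknown ({actual})"),
        CUBLAS_STATUS.get(expected, f"unknown ({expected})"),
    )


print(decode_cublas_check(
    "Check failed: status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED"
))  # ('CUBLAS_STATUS_EXECUTION_FAILED', 'CUBLAS_STATUS_SUCCESS')
```

This only confirms what the log already prints after the check. The status code itself does not say which kernel launch inside `caffe_gpu_gemv` failed or why.
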
The strange thing is that a few days ago I successfully trained an SSD caffemodel (weiliu89's version, VOC0712, VGG16, iter=12000) on this same computer. Waiting for your reply.
I am using tx1. The CUDA version is the latest one from NVIDIA: cuda_8.0.61_375.26_linux.run

Here is my computer information:

Fri Apr 21 10:49:22 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:01:00.0      On |                  N/A |
| 23%   33C    P8    15W / 250W |    186MiB / 12188MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
