Digits 5.1 DetectNet training error for CUDA 8.0 on Ubuntu 14.04 backed by GTX-1080 #1186
Comments
Please don't tag random developers from the NVIDIA organization, it won't get you an answer any faster.
Can you please give me the following information:
# for example, this is what comes installed on the nvidia/digits docker image
$ dpkg -l | egrep 'digits|caffe|libcudnn|libnccl|cudart|nvidia'
ii caffe-nv 0.15.13-1+cuda7.5 amd64 Fast open framework for Deep Learning
ii caffe-nv-tools 0.15.13-1+cuda7.5 amd64 Fast open framework for Deep Learning (Tools)
ii cuda-cudart-7-5 7.5-18 amd64 CUDA Runtime native Libraries
ii digits 4.0.0-1 amd64 NVIDIA DIGITS webserver
ii libcaffe-nv0 0.15.13-1+cuda7.5 amd64 Fast open framework for Deep Learning (Libs)
ii libcudnn5 5.1.3-1+cuda7.5 amd64 cuDNN runtime libraries
ii libnccl1 1.2.3-1+cuda7.5 amd64 NVIDIA Collectives Communication Library (NCCL) Runtime
ii python-caffe-nv 0.15.13-1+cuda7.5 amd64 Fast open framework for Deep Learning (Python)
$ nvidia-smi
Thu Oct 20 09:46:46 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro M5000 Off | 0000:01:00.0 On | Off |
| 38% 36C P8 17W / 150W | 547MiB / 8120MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 0000:02:00.0 Off | N/A |
| 22% 33C P8 16W / 250W | 116MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1650 G /usr/lib/xorg/Xorg 298MiB |
| 0 2664 G compiz 242MiB |
| 0 4312 G /usr/lib/firefox/firefox 2MiB |
| 1 31049 C /usr/bin/python 112MiB |
+-----------------------------------------------------------------------------+
Running that query actually solved the problem: my setup had both CUDA 7.5 and CUDA 8.0 installed, and the two conflicted. Purging the CUDA 8.0 installation fixed it, and training now works like a charm.
Thanks for your help. :)
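For anyone hitting the same thing, the check above can be roughly automated. The following is only a minimal sketch: the function names `cuda_versions` and `has_mixed_cuda` are hypothetical, and it assumes the CUDA-related packages were installed as Debian packages that follow the naming seen in the `dpkg -l` listing above (a `+cudaX.Y` version suffix, or a `cuda-cudart-X-Y` package name). A CUDA toolkit installed from a runfile would not show up in `dpkg` output at all, so an empty or single-version result does not prove the system is clean.

```python
import re

# Assumption: NVIDIA's Debian packages carry a "+cudaX.Y" suffix in the version
# field, as in the "dpkg -l" listing quoted earlier in this thread.
_VERSION_SUFFIX = re.compile(r"\+cuda(\d+\.\d+)")
# Assumption: the CUDA runtime package is named like "cuda-cudart-7-5".
_CUDART_NAME = re.compile(r"^cuda-cudart-(\d+)-(\d+)")


def cuda_versions(dpkg_output):
    """Return the set of CUDA versions referenced by installed packages."""
    versions = set()
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages look like: "ii <name> <version> <arch> <description>"
        if len(fields) < 3 or fields[0] != "ii":
            continue
        name, version = fields[1], fields[2]
        suffix = _VERSION_SUFFIX.search(version)
        if suffix:
            versions.add(suffix.group(1))
        cudart = _CUDART_NAME.match(name)
        if cudart:
            versions.add("{}.{}".format(*cudart.groups()))
    return versions


def has_mixed_cuda(dpkg_output):
    """True if the listing references more than one CUDA version."""
    return len(cuda_versions(dpkg_output)) > 1
```

To use it, pass in the text printed by `dpkg -l` (for example, captured with `subprocess.check_output(["dpkg", "-l"])` and decoded to a string). If `has_mixed_cuda` returns true, purging the packages for the version you don't want is probably the first thing to try.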
I've solved this problem in another way.
@lukeyeager @xhuvom I have a similar problem when running Ubuntu 16.04.04 with a GTX-1080 and CUDA 8.0.
You clearly don't have CUDA 8.0 installed; you have CUDA 7.5 (see Please follow these instructions to install CUDA (here's the download site). You'll probably also need to purge all the
I have installed DIGITS (v5.1-dev) (as described) and am running a local devserver on Ubuntu 14.04 backed by a GTX-1080. Caffe [version 0.15.14] builds fine with Cuda compilation tools, release 8.0, V8.0.44, with cuDNN (ver. 5.1.5), and the NVIDIA drivers (Driver Version: 367.44) are installed properly. But the training attempt on a DetectNet model stops suddenly with the following error:
My Python version is 2.7.6 and my Caffe version is 0.15.14. The Caffe Makefile.config is tuned as follows:
USE_CUDNN := 1
PYTHON_LIB := /usr/lib
I went back to Ubuntu 14.04 because there is no solid official documentation for Ubuntu 16.04 and CUDA 8.0 Pascal support is unavailable. How can I run a proper training job on DIGITS on my machine? Should I go back to CUDA 8.0RC, or try something else? Any suggestions would be appreciated. Thanks in advance.