Problem installing caffe for digits #46

Closed
Tutufa opened this issue Apr 5, 2015 · 23 comments

Comments

@Tutufa

Tutufa commented Apr 5, 2015

Hi, I'm trying to install NVIDIA's version of Caffe with the steps provided here. Everything works until I run make runtest, which outputs this:

.build_release/tools/caffe
dyld: Library not loaded: libcaffe-nv.so.0
Referenced from: /Users/caffeteria/caffe/.build_release/tools/caffe
Reason: image not found
make: *** [runtest] Trace/BPT trap: 5

Can anyone help me solve this?

OS X 10.10.2; I mostly used brew and pip.

@Tutufa
Author

Tutufa commented Apr 5, 2015

The vanilla version of Caffe compiled just fine.

@cicero19

cicero19 commented Apr 6, 2015

Having trouble with the make all command. Any ideas?

~/caffe$ make all
LD -o .build_release/lib/libcaffe-nv.so.0.10.0
/usr/bin/ld: cannot find -lcudnn
collect2: error: ld returned 1 exit status
make: *** [.build_release/lib/libcaffe-nv.so.0.10.0] Error 1

@lukeyeager
Member

@Tutufa, Mac is not technically a supported OS, as seen here. However, Rudolph seems to have had some luck with it (see #32 and #33). In your case, it looks like the shared objects aren't being created correctly, as the libraries should end with .dylib on a Mac instead of .so. @RudolphV, did you run into this issue?

@cicero19, your problem is unrelated. It seems like you need to either install cuDNN or turn off the USE_CUDNN flag in your Makefile.config.
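
For the second option, here is a minimal sketch of turning the flag off from the shell. It assumes the flag appears in Makefile.config as an uncommented `USE_CUDNN := 1` line, so check your copy first. The helper name `disable_cudnn` is made up for illustration and is not part of Caffe or DIGITS.

```shell
# Comment out an active "USE_CUDNN := 1" line in a Caffe Makefile.config.
# Assumption: the flag is written as "USE_CUDNN := 1" (verify in your file).
# A temp file is used instead of `sed -i`, whose syntax differs between
# GNU sed (Linux) and BSD sed (OS X).
disable_cudnn() {
    config="$1"
    tmp="$(mktemp)" || return 1
    sed 's/^[[:space:]]*USE_CUDNN[[:space:]]*:=[[:space:]]*1/# USE_CUDNN := 1/' \
        "$config" > "$tmp" && cat "$tmp" > "$config"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Hypothetical usage, followed by a clean rebuild:
#   disable_cudnn ~/caffe/Makefile.config && make clean && make all
```

Lines that are already commented out are left untouched, so running it twice should be harmless.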

@RudolphV

RudolphV commented Apr 6, 2015

The only two issues I ran into were #32 and #33, and the workarounds discussed in those threads fixed the problems.

Caffe (ver 0.10.0) installed and tested fine with cuDNN. I followed the installation instructions from here.

I'm running the developer seed of Mac OS X 10.10.3 and the latest corresponding drivers from NVIDIA (NVIDIA Web Driver: 343.02.03b04, CUDA 7.0.35 (CUDA Driver Version: 7.0.29, GPU Driver Version 10.3.5 (343.02.03b4)).

@cicero19

cicero19 commented Apr 6, 2015

@lukeyeager thanks, I figured out that issue. I was able to start digits but it was unable to find my GPU:

Something went wrong when loading libcudart.so
No GPUs found. Assuming CPU-only mode.

How do I fix this libcudart.so error?

When I run make runtest on Caffe it tells me: CUDA driver version is insufficient for CUDA runtime version. In my /usr/local folder I have one folder named cuda and another named cuda-7.0. I'm not sure which one it is using.

@lukeyeager
Member

CUDA driver version is insufficient for CUDA runtime version

It sounds like you need to update your driver.

If that doesn't work for you, make sure you have the latest changes for DIGITS that include the bugfix for #33.
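
To see which toolkit directory /usr/local/cuda refers to, and to compare driver and toolkit versions, something like the following may help. It assumes /usr/local/cuda is a symlink to a versioned directory such as /usr/local/cuda-7.0, which is common for CUDA installs on Linux but not guaranteed. `resolve_cuda_dir` is a hypothetical helper, not a CUDA tool.

```shell
# Print the physical directory that a path points at, following symlinks.
# Prints nothing and returns non-zero if the path is not a directory.
resolve_cuda_dir() {
    (cd "$1" 2>/dev/null && pwd -P)
}

# Hypothetical usage on a machine with both directories present:
#   resolve_cuda_dir /usr/local/cuda   # shows the real directory behind "cuda"
#   nvidia-smi                         # reports the installed driver version
#   nvcc --version                     # reports the CUDA toolkit version
```

If the driver reported by nvidia-smi predates the toolkit that Caffe was built against, that would be consistent with the "driver version is insufficient" message.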

@cicero19

cicero19 commented Apr 6, 2015

I updated the driver with
$ sudo apt-get install nvidia-current

Now DIGITS doesn't start. I checked the file from #33, but I am not on a Mac so I'm not sure how it is applicable.


  ___  _  ___ _ _____ ___
 |   \(_)/ __(_)_   _/ __|
 | |) | | (_ | | | | \__ \
 |___/|_|\___|_| |_| |___/

OSError: libcudart.so: cannot open shared object file: No such file or directory
Try setting your LD_LIBRARY_PATH
Traceback (most recent call last):
  File "digits-devserver", line 40, in <module>
    from digits.webapp import app, socketio, scheduler
  File "/home/ciceromar/digits/digits/webapp.py", line 11, in <module>
    import digits.scheduler
  File "/home/ciceromar/digits/digits/scheduler.py", line 18, in <module>
    from model import ModelJob, tasks as model_tasks
  File "/home/ciceromar/digits/digits/model/__init__.py", line 3, in <module>
    from job import ModelJob
  File "/home/ciceromar/digits/digits/model/job.py", line 4, in <module>
    from . import tasks
  File "/home/ciceromar/digits/digits/model/tasks/__init__.py", line 4, in <module>
    from caffe_train import CaffeTrainTask
  File "/home/ciceromar/digits/digits/model/tasks/caffe_train.py", line 5, in <module>
    import caffe
  File "/home/ciceromar/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver
  File "/home/ciceromar/caffe/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver
ImportError: libcudart.so.7.0: cannot open shared object file: No such file or directory

@lukeyeager
Member

I am not on a Mac so not sure how it is applicable

Oh sorry. You really should have put this in a separate issue, I got your issue mixed up with Tutufa's.

As it says in the output you pasted above:

Try setting your LD_LIBRARY_PATH

You're looking for something like this:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
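
Before and after setting the variable, a quick check like the sketch below can confirm whether any directory on the path actually contains the CUDA runtime library. It is written for bash or POSIX sh, and `lib_on_path` is a made-up helper name. Note that LD_LIBRARY_PATH is only one of the places the dynamic loader searches, so a miss here does not prove the library cannot be loaded.

```shell
# Return 0 if a file whose name starts with $1 exists in any directory of the
# colon-separated list $2 (defaults to the current LD_LIBRARY_PATH).
lib_on_path() {
    name="$1"
    path_list="${2-${LD_LIBRARY_PATH-}}"
    old_ifs="$IFS"
    IFS=':'
    found=1
    for dir in $path_list; do
        [ -n "$dir" ] || continue          # skip empty path entries
        for f in "$dir"/"$name"*; do
            if [ -e "$f" ]; then
                found=0
                break 2
            fi
        done
    done
    IFS="$old_ifs"
    return "$found"
}

# Hypothetical usage:
#   lib_on_path libcudart.so && echo "libcudart is on LD_LIBRARY_PATH"
#   ldconfig -p | grep libcudart    # also lists libraries in the loader cache
```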

@lukeyeager
Member

It sounds like @cicero19 solved his problem. @Tutufa, were you able to get DIGITS running?

@Tutufa
Author

Tutufa commented Apr 14, 2015

@lukeyeager No, I currently use vanilla Caffe, which runs fine. But the NVIDIA branch gives the same error when I try to run make runtest:

Kovalenkos-MacBook-Pro:caffe_nvidia kovalenko$ make runtest
.build_release/tools/caffe
dyld: Library not loaded: libcaffe-nv.so.0
Referenced from: /Users/kovalenko/caffe_nvidia/.build_release/tools/caffe
Reason: image not found
make: *** [runtest] Trace/BPT trap: 5

Quite sad, DIGITS looked very sexy in the webinar I watched a few days ago.

@lukeyeager
Member

Quite sad

Indeed. @moconnor725, can you help figure out how to fix this on OS X? A hacky solution is fine for now until we fix up the cmake build.

@cicero19

Yes, I was able to resolve my issue with a graphics driver update and updated CUDA driver. Thanks for your fast and helpful support. Keep up the good work!

@Tutufa
Author

Tutufa commented Apr 24, 2015

I migrated to AWS + Ubuntu, and DIGITS runs fine so far. Thanks to all; I hope an OS X version will arrive soon (-:

@qwersdf

qwersdf commented May 21, 2015

I finished all the install steps. However, DIGITS fails completely on startup:

  ___  _  ___ _ _____ ___
 |   \(_)/ __(_)_   _/ __|
 | |) | | (_ | | | | \__ \
 |___/|_|\___|_| |_| |___/

Welcome to the DiGiTS config module.

Where is caffe installed? (enter "SYS" if installed system-wide)
    [default is /home/yingyu/Downloads/digits-1.0/caffe]
(q to quit) >>> 

Where would you like to store jobs?
    [default is /home/yingyu/.digits/jobs]
(q to quit) >>> 

What is the minimum log level that you want to save to your logfile? [error/warning/info/debug]
    [default is info]
(q to quit) >>> 

New config:
            gpu_list - 
          secret_key - 27cbcd5f161ce2d2067db358
           log_level - info
            jobs_dir - /home/yingyu/.digits/jobs
          caffe_root - /home/yingyu/Downloads/digits-1.0/caffe

Traceback (most recent call last):
  File "digits/digits-devserver", line 40, in <module>
    from digits.webapp import app, socketio, scheduler
  File "/home/yingyu/Downloads/digits-1.0/digits/digits/webapp.py", line 11, in <module>
    import digits.scheduler
  File "/home/yingyu/Downloads/digits-1.0/digits/digits/scheduler.py", line 18, in <module>
    from model import ModelJob, tasks as model_tasks
  File "/home/yingyu/Downloads/digits-1.0/digits/digits/model/__init__.py", line 3, in <module>
    from job import ModelJob
  File "/home/yingyu/Downloads/digits-1.0/digits/digits/model/job.py", line 4, in <module>
    from . import tasks
  File "/home/yingyu/Downloads/digits-1.0/digits/digits/model/tasks/__init__.py", line 4, in <module>
    from caffe_train import CaffeTrainTask
  File "/home/yingyu/Downloads/digits-1.0/digits/digits/model/tasks/caffe_train.py", line 5, in <module>
    import caffe
  File "/home/yingyu/Downloads/digits-1.0/caffe/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver
  File "/home/yingyu/Downloads/digits-1.0/caffe/python/caffe/pycaffe.py", line 13, in <module>
    from ._caffe import Net, SGDSolver
ImportError: libcudart.so.7.0: cannot open shared object file: No such file or directory

@lukeyeager
Member

libcudart.so.7.0: cannot open shared object file: No such file or directory

See #54 (comment), #8 (comment) or even #46 (comment) (above).

I've updated the install instructions now to include this step.

@man-sean

Hi, I'm trying to compile NVIDIA's Caffe to run DIGITS on a Mac (OS X 10.10.3).
I'm experiencing the same issue as @Tutufa:

Seans-MacBook-Pro:caffe sean$ make runtest

.build_release/tools/caffe
dyld: Library not loaded: libcaffe-nv.so.0
Referenced from: /Users/sean/caffe/.build_release/tools/caffe
Reason: image not found
make: *** [runtest] Trace/BPT trap: 5

@lukeyeager are there any known solutions yet?

P.S. I'm not familiar with Caffe or DIGITS, plain noob

@lukeyeager
Member

Can you try the CMake build instead of the Make build? I've been meaning to get around to investigating this issue, but I've been busy with other things.

@man-sean

@lukeyeager I tried with CMake and it worked! Thanks.
Something odd, though: I wanted to test that Caffe works, so I ran the MNIST example, but the 'build' alias to '.build_release' and '.build_release' itself did not exist.
I simply copied the folder from a 'vanilla' installation of Caffe.
Is this a known issue? Did I skip something?

@man-sean

@lukeyeager OK, I hit another problem:
I ran 'make pycaffe' and got the following error:

-- Boost version: 1.57.0
-- Found the following Boost libraries:
--   system
--   thread
-- Found glog    (include: /usr/local/include, library: /usr/local/lib/libglog.dylib)
-- Found gflags  (include: /usr/local/include, library: /usr/local/lib/libgflags.dylib)
-- Found PROTOBUF Compiler: /usr/local/bin/protoc
-- Found lmdb    (include: /usr/local/include, library: /usr/local/lib/liblmdb.dylib)
-- Found LevelDB (include: /usr/local/include, library: /usr/local/lib/libleveldb.dylib)
-- Found Snappy  (include: /usr/local/include, library: /usr/local/lib/libsnappy.dylib)
-- CUDA detected: 7.0
-- Found cuDNN (include: /Users/sean/Documents/cuDNN, library: /Users/sean/Documents/cuDNN/libcudnn.6.5.dylib)
-- Added CUDA NVCC flags for: sm_30
-- OpenCV found (/usr/local/share/OpenCV)
-- Found vecLib as part of Accelerate.framework
-- NumPy ver. 1.9.2 found (include: /usr/local/lib/python2.7/site-packages/numpy/core/include)
-- Boost version: 1.57.0
-- Found the following Boost libraries:
--   python
-- Could NOT find Doxygen (missing:  DOXYGEN_EXECUTABLE) 
-- 
-- ******************* Caffe Configuration Summary *******************
-- General:
--   Version           :   <TODO> (Caffe doesn't declare its version in headers)
--   Git               :   v0.12.1-dirty
--   System            :   Darwin
--   C++ compiler      :   /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
--   Release CXX flags :   -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
--   Debug CXX flags   :   -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
--   Build type        :   Release
-- 
--   BUILD_SHARED_LIBS :   ON
--   BUILD_python      :   ON
--   BUILD_matlab      :   OFF
--   BUILD_docs        :   ON
--   CPU_ONLY          :   OFF
-- 
-- Dependencies:
--   BLAS              :   Yes (vecLib)
--   Boost             :   Yes (ver. 1.57)
--   glog              :   Yes
--   gflags            :   Yes
--   protobuf          :   Yes (ver. 2.6.1)
--   lmdb              :   Yes (ver. 0.9.14)
--   Snappy            :   Yes (ver. 1.1.3)
--   LevelDB           :   Yes (ver. 1.18)
--   OpenCV            :   Yes (ver. 3.0.0)
--   CUDA              :   Yes (ver. 7.0)
-- 
-- NVIDIA CUDA:
--   Target GPU(s)     :   Auto
--   GPU arch(s)       :   sm_30
--   cuDNN             :   Yes
-- 
-- Python:
--   Interpreter       :   /usr/local/Cellar/python/2.7.10_2/bin/python (ver. 2.7.10)
--   Libraries         :   /usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib (ver 2.7.10)
--   NumPy             :   /usr/local/lib/python2.7/site-packages/numpy/core/include (ver 1.9.2)
-- 
-- Documentaion:
--   Doxygen           :   No
--   config_file       :   
-- 
-- Install:
--   Install path      :   /Users/sean/caffe/install
-- 
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/sean/caffe
[  1%] Built target proto
Linking CXX shared library ../../lib/libcaffe-nv.dylib
Undefined symbols for architecture x86_64:
  "caffe::SolverState::SolverState()", referenced from:
      caffe::Solver<float>::Restore(char const*) in solver.cpp.o
      caffe::Solver<float>::Snapshot() in solver.cpp.o
      caffe::Solver<double>::Restore(char const*) in solver.cpp.o
      caffe::Solver<double>::Snapshot() in solver.cpp.o
  "caffe::SolverState::~SolverState()", referenced from:
      caffe::Solver<float>::Restore(char const*) in solver.cpp.o
      caffe::Solver<float>::Snapshot() in solver.cpp.o
      caffe::Solver<double>::Restore(char const*) in solver.cpp.o
      caffe::Solver<double>::Snapshot() in solver.cpp.o
     [...]
  "caffe::ParamSpec::~ParamSpec()", referenced from:
      caffe::Net<float>::Init(caffe::NetParameter const&) in net.cpp.o
      caffe::Net<float>::GetLearningRateAndWeightDecay() in net.cpp.o
      caffe::Net<double>::Init(caffe::NetParameter const&) in net.cpp.o
      caffe::Net<double>::GetLearningRateAndWeightDecay() in net.cpp.o
  "typeinfo for caffe::LayerParameter", referenced from:
      ___cxx_global_var_init30 in layer_factory.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [lib/libcaffe-nv.0.12.1.dylib] Error 1
make[2]: *** [src/caffe/CMakeFiles/caffe.dir/all] Error 2
make[1]: *** [python/CMakeFiles/pycaffe.dir/rule] Error 2
make: *** [python/CMakeFiles/pycaffe.dir/rule] Error 2

Edit: actually, this error looks very similar to the original error that @Tutufa and I hit (which the CMake build solved for 'make all').

@lukeyeager
Member

I wanted to test that Caffe works, so I ran the MNIST example, but the 'build' alias to '.build_release' and '.build_release' itself did not exist.

The correct way to use CMake is:

mkdir build && cd build
cmake ..
make

@man-sean

@lukeyeager Thanks. I'm using the GUI version of CMake, so I assume that 'cmake ..' is replaced by clicking 'Generate' while 'Where to build binaries' points to 'caffe/build'. Am I right?
If so, 'make py' prints the following error:

make: *** No rule to make target `py'. Stop.

The parameters I used in CMake:

Boost version: 1.57.0
Found the following Boost libraries:
system
thread
Found glog (include: /usr/local/include, library: /usr/local/lib/libglog.dylib)
Found gflags (include: /usr/local/include, library: /usr/local/lib/libgflags.dylib)
Found PROTOBUF Compiler: /usr/local/bin/protoc
Found lmdb (include: /usr/local/include, library: /usr/local/lib/liblmdb.dylib)
Found LevelDB (include: /usr/local/include, library: /usr/local/lib/libleveldb.dylib)
Found Snappy (include: /usr/local/include, library: /usr/local/lib/libsnappy.dylib)
CUDA detected: 7.0
Found cuDNN (include: /Users/sean/Documents/cuDNN, library: /Users/sean/Documents/cuDNN/libcudnn.6.5.dylib)
Added CUDA NVCC flags for: sm_30
OpenCV found (/usr/local/share/OpenCV)
Found vecLib as part of Accelerate.framework
NumPy ver. 1.9.2 found (include: /usr/local/lib/python2.7/site-packages/numpy/core/include)
Boost version: 1.57.0
Found the following Boost libraries:
python
Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)

******************* Caffe Configuration Summary *******************
General:
Version : <TODO> (Caffe doesn't declare its version in headers)
Git : v0.12.1-dirty
System : Darwin
C++ compiler : /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
Release CXX flags : -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
Debug CXX flags : -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
Build type : Release

BUILD_SHARED_LIBS : ON
BUILD_python : ON
BUILD_matlab : OFF
BUILD_docs : ON
CPU_ONLY : OFF

Dependencies:
BLAS : Yes (vecLib)
Boost : Yes (ver. 1.57)
glog : Yes
gflags : Yes
protobuf : Yes (ver. 2.6.1)
lmdb : Yes (ver. 0.9.14)
Snappy : Yes (ver. 1.1.3)
LevelDB : Yes (ver. 1.18)
OpenCV : Yes (ver. 3.0.0)
CUDA : Yes (ver. 7.0)

NVIDIA CUDA:
Target GPU(s) : Auto
GPU arch(s) : sm_30
cuDNN : Yes

Python:
Interpreter : /usr/local/Cellar/python/2.7.10_2/bin/python (ver. 2.7.10)
Libraries : /usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib (ver 2.7.10)
NumPy : /usr/local/lib/python2.7/site-packages/numpy/core/include (ver 1.9.2)

Documentaion:
Doxygen : No
config_file :

Install:
Install path : /Users/sean/caffe/build/install

@lukeyeager
Member

I'm using the GUI version of CMake, so I assume that 'cmake ..' is replaced by clicking 'Generate' while 'Where to build binaries' points to 'caffe/build'. Am I right?

Sounds right to me.

If so, 'make py' prints the following error:

Whoops, nevermind on the "make py". I've updated my instructions above. For me, when I run make, it builds pycaffe for me automatically:

Linking CXX executable caffe-nv
[100%] Built target extract_features
Linking CXX executable caffe
[100%] Built target caffe.bin
Linking CXX shared library ../lib/caffe-nv.so
Creating symlink /home/lyeager/caffe/python/caffe/_caffe.so -> /home/lyeager/caffe/build/lib/_caffe.so
[100%] Built target pycaffe

Does it not build for you? Check and see if python/caffe/_caffe.so exists.
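
That check can be scripted roughly as below. This is only a sketch: `check_pycaffe` is a hypothetical helper, and the path layout is the one shown in the build output above.

```shell
# Report whether the pycaffe extension exists under a Caffe checkout.
# `-e` follows symlinks, so a dangling _caffe.so symlink counts as missing.
check_pycaffe() {
    caffe_root="$1"
    if [ -e "$caffe_root/python/caffe/_caffe.so" ]; then
        echo "found"
    else
        echo "missing"
        return 1
    fi
}

# Hypothetical usage, followed by an import test with the checkout on PYTHONPATH:
#   check_pycaffe ~/caffe
#   PYTHONPATH=~/caffe/python python -c 'import caffe'
```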

@man-sean

@lukeyeager It does: python/caffe/_caffe.so exists and I can import caffe through IPython (if PYTHONPATH is set correctly). So it all seems good; I can even fire up DIGITS with ./digits-devserver --config! Thanks a lot!
