[GPU] Segmentation fault on Ubuntu 14.04 #1290
Comments
Please provide a reproducible example on a smaller, publicly available dataset. (Also, you don't have 1 million observations; you have 95,000.) We can't tell whether you simply don't have enough available GPU RAM or whether this is a bug.
@Laurae2 Sorry for the mistake. Yes, it has 95,000 rows; I have edited the question. I rebuilt the LightGBM package with the DEBUG flag on; all the other steps were the same as above. As you requested, I ran it on a small public dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/haberman/). Here is the backtrace
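A reproduction on the Haberman data could be as small as the sketch below. It is not the reporter's script: the local file name `haberman.data`, the helper names, and the training parameters are all assumptions made for illustration. The column layout (age, year of operation, positive nodes, survival status 1 or 2) follows the UCI description of the dataset.

```python
import csv
import io


def parse_haberman(text):
    """Parse haberman.data CSV text into (features, labels)."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        age, year, nodes, status = (int(value) for value in row)
        features.append([age, year, nodes])
        # Treating status 2 as the positive class is an arbitrary choice here.
        labels.append(1 if status == 2 else 0)
    return features, labels


def train_on_gpu(features, labels):
    """Train a tiny model with the GPU learner (where the crash was reported)."""
    # Imported lazily so the parsing helper works without these packages.
    import numpy as np
    import lightgbm as lgb

    params = {"objective": "binary", "device": "gpu"}  # illustrative parameters
    train_set = lgb.Dataset(np.asarray(features, dtype=np.float64),
                            label=np.asarray(labels))
    return lgb.train(params, train_set, num_boost_round=10)


if __name__ == "__main__":
    # Hypothetical local copy of the file downloaded from the UCI directory.
    with open("haberman.data") as handle:
        X, y = parse_haberman(handle.read())
    train_on_gpu(X, y)
```

Keeping the parsing separate from the training call makes it easy to confirm the data loads correctly before the GPU code path is exercised.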
ping @huanzhang12, debug trace provided for troubleshooting
@huanzhang12 Hey, any update?
Thank you for reporting this issue! Based on the backtrace, the error is unlikely to be caused by the crash location in @meetdave06 Can you try an older commit (for example, compile the source of the 2.1.0 release from Jan 25) and see if it works for you? If it does, you can bisect to find the faulty commit.
@huanzhang12 Sure, I will try that out over the weekend. I am not sure what you mean by bisection; I guess it means using git bisect to find the faulty commit. I will figure out how to use that.
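For readers unfamiliar with the idea: `git bisect` (started with `git bisect start`, then marking revisions with `git bisect good` and `git bisect bad`) runs a binary search over the commit history. The search itself can be sketched as follows. The function name and the `is_bad` callback are hypothetical; in practice `is_bad` would mean "build this commit and check whether GPU training crashes".

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest to newest, the first one is known
    good, the last one is known bad, and every commit after the first bad
    one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the fault was introduced at mid or earlier
        else:
            lo = mid  # the fault was introduced after mid
    return commits[hi]
```

Each step halves the remaining range, so even a few hundred commits need fewer than ten builds.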
Hi @meetdave06!
@StrikerRUS Sorry, I haven't tried yet.
@meetdave06 I think the problem is Boost compatibility. You mentioned that you downloaded Boost 1.64 but did not build it. Unfortunately, LightGBM depends on two compiled Boost components, libboost-filesystem and libboost-system. Both need to be at least version 1.58. When you set You need to build Boost >= 1.58 from source and make sure
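One way to check this hypothesis is to look at which Boost shared libraries the built `lib_lightgbm.so` actually resolves to. The sketch below parses `ldd` output and flags components older than 1.58. The helper names and the library path in the usage section are assumptions, and it only helps if Boost was linked dynamically.

```python
import re
import subprocess

# Matches entries such as "libboost_system.so.1.54.0" in `ldd` output.
_BOOST_LIB = re.compile(r"libboost_(\w+)\.so\.(\d+)\.(\d+)\.(\d+)")


def linked_boost_versions(ldd_output):
    """Map each Boost component named in ldd output to (major, minor, patch)."""
    versions = {}
    for name, major, minor, patch in _BOOST_LIB.findall(ldd_output):
        versions[name] = (int(major), int(minor), int(patch))
    return versions


def too_old(versions, minimum=(1, 58, 0)):
    """Return the sorted component names whose version is below the minimum."""
    return sorted(name for name, ver in versions.items() if ver < minimum)


if __name__ == "__main__":
    # Hypothetical path; point it at the lib_lightgbm.so produced by the build.
    output = subprocess.check_output(["ldd", "lib_lightgbm.so"],
                                     universal_newlines=True)
    found = linked_boost_versions(output)
    print("Boost components:", found)
    print("Older than 1.58:", too_old(found))
```

If `filesystem` or `system` shows up as 1.54 here, the build picked up the system packages rather than the downloaded 1.64 sources.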
@meetdave06 Please update your Boost package and check @huanzhang12's hypothesis.
ping @meetdave06
Feel free to reopen if reinstalling Boost didn't help.
Environment info
Operating System: Ubuntu 14.04.1
CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
GPU : GeForce GTX 1080, Vendor: NVIDIA Corporation 8GB
Nvidia-version : NVIDIA-SMI 375.39 Driver Version :375.39
C++ : g++ (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4
Python : anaconda3 python3.6
Error Message:
[LightGBM] [Info] Number of positive: 220, number of negative: 94780
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 883
[LightGBM] [Info] Number of data: 95000, number of used features: 17
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: GeForce GTX 1080, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
Segmentation fault (core dumped)
Reproducible examples
I was not able to update Boost via apt-get; it always installed version 1.54. So I downloaded version 1.64 from SourceForge and unzipped it to a particular location, but didn't build it.
1.
cmake -DBOOST_ROOT=/home/dolores/Music/Music/Music/Kaggle/boost_1_64_0/ -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ ..
-- The C compiler identification is GNU 4.9.4
-- The CXX compiler identification is GNU 4.9.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - found
-- Found OpenCL: /usr/local/cuda-8.0/lib64/libOpenCL.so (found version "1.2")
-- OpenCL include directory:/usr/local/cuda-8.0/include
-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- filesystem
-- system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dolores/Music/Music/Music/Kaggle/LightGBM/build
make -j4
Scanning dependencies of target _lightgbm
Scanning dependencies of target lightgbm
[ 1%] Building CXX object CMakeFiles/_lightgbm.dir/src/c_api.cpp.o
[ 3%] Building CXX object CMakeFiles/_lightgbm.dir/src/application/application.cpp.o
[ 5%] Building CXX object CMakeFiles/_lightgbm.dir/src/lightgbm_R.cpp.o
[ 6%] Building CXX object CMakeFiles/lightgbm.dir/src/main.cpp.o
[ 8%] Building CXX object CMakeFiles/lightgbm.dir/src/application/application.cpp.o
[ 10%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/boosting.cpp.o
[ 11%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/boosting.cpp.o
[ 13%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/gbdt.cpp.o
[ 15%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/gbdt.cpp.o
[ 16%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/gbdt_model_text.cpp.o
[ 18%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/gbdt_model_text.cpp.o
[ 20%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/gbdt_prediction.cpp.o
[ 22%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/prediction_early_stop.cpp.o
[ 23%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/gbdt_prediction.cpp.o
[ 25%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/prediction_early_stop.cpp.o
[ 27%] Building CXX object CMakeFiles/lightgbm.dir/src/io/bin.cpp.o
[ 28%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/bin.cpp.o
[ 30%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/config.cpp.o
[ 32%] Building CXX object CMakeFiles/lightgbm.dir/src/io/config.cpp.o
[ 33%] Building CXX object CMakeFiles/lightgbm.dir/src/io/dataset.cpp.o
[ 35%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/dataset.cpp.o
[ 37%] Building CXX object CMakeFiles/lightgbm.dir/src/io/dataset_loader.cpp.o
[ 38%] Building CXX object CMakeFiles/lightgbm.dir/src/io/file_io.cpp.o
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static std::unique_ptr<LightGBM::VirtualFileReader> LightGBM::VirtualFileReader::Make(const string&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:159:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static std::unique_ptr<LightGBM::VirtualFileWriter> LightGBM::VirtualFileWriter::Make(const string&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:167:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static bool LightGBM::VirtualFileWriter::Exists(const string&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:176:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-ignored-attributes"
[ 40%] Building CXX object CMakeFiles/lightgbm.dir/src/io/metadata.cpp.o
[ 42%] Building CXX object CMakeFiles/lightgbm.dir/src/io/parser.cpp.o
[ 44%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/dataset_loader.cpp.o
[ 45%] Building CXX object CMakeFiles/lightgbm.dir/src/io/tree.cpp.o
[ 47%] Building CXX object CMakeFiles/lightgbm.dir/src/metric/dcg_calculator.cpp.o
[ 49%] Building CXX object CMakeFiles/lightgbm.dir/src/metric/metric.cpp.o
[ 50%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/file_io.cpp.o
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static std::unique_ptr<LightGBM::VirtualFileReader> LightGBM::VirtualFileReader::Make(const string&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:159:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static std::unique_ptr<LightGBM::VirtualFileWriter> LightGBM::VirtualFileWriter::Make(const string&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:167:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static bool LightGBM::VirtualFileWriter::Exists(const string&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:176:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-ignored-attributes"
[ 52%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/metadata.cpp.o
[ 54%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/parser.cpp.o
[ 55%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/tree.cpp.o
[ 57%] Building CXX object CMakeFiles/_lightgbm.dir/src/metric/dcg_calculator.cpp.o
[ 59%] Building CXX object CMakeFiles/lightgbm.dir/src/objective/objective_function.cpp.o
[ 61%] Building CXX object CMakeFiles/_lightgbm.dir/src/metric/metric.cpp.o
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp: In static member function ‘static LightGBM::ObjectiveFunction* LightGBM::ObjectiveFunction::CreateObjectiveFunction(const string&, const LightGBM::ObjectiveConfig&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp:47:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
[ 62%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linker_topo.cpp.o
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp: In static member function ‘static LightGBM::ObjectiveFunction* LightGBM::ObjectiveFunction::CreateObjectiveFunction(const string&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp:84:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
[ 64%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linkers_mpi.cpp.o
[ 66%] Building CXX object CMakeFiles/_lightgbm.dir/src/objective/objective_function.cpp.o
[ 67%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linkers_socket.cpp.o
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp: In static member function ‘static LightGBM::ObjectiveFunction* LightGBM::ObjectiveFunction::CreateObjectiveFunction(const string&, const LightGBM::ObjectiveConfig&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp:47:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp: In static member function ‘static LightGBM::ObjectiveFunction* LightGBM::ObjectiveFunction::CreateObjectiveFunction(const string&)’:
/home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp:84:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
[ 69%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linker_topo.cpp.o
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-ignored-attributes"
[ 71%] Building CXX object CMakeFiles/lightgbm.dir/src/network/network.cpp.o
[ 72%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linkers_mpi.cpp.o
[ 74%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linkers_socket.cpp.o
[ 76%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/network.cpp.o
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-ignored-attributes"
[ 77%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
[ 79%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
[ 81%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
[ 83%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
[ 84%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/gpu_tree_learner.cpp.o
[ 86%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/gpu_tree_learner.cpp.o
[ 88%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/serial_tree_learner.cpp.o
[ 89%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/tree_learner.cpp.o
[ 91%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
[ 93%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/serial_tree_learner.cpp.o
[ 94%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/tree_learner.cpp.o
[ 96%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
[ 98%] Linking CXX executable ../lightgbm
[ 98%] Built target lightgbm
[100%] Linking CXX shared library ../lib_lightgbm.so
[100%] Built target _lightgbm
Then I moved into python-package folder
sudo /home/dolores/anaconda3/bin/python setup.py install --precompile
running install
running build
running build_py
creating build
creating build/lib
creating build/lib/lightgbm
copying lightgbm/basic.py -> build/lib/lightgbm
copying lightgbm/libpath.py -> build/lib/lightgbm
copying lightgbm/engine.py -> build/lib/lightgbm
copying lightgbm/compat.py -> build/lib/lightgbm
copying lightgbm/__init__.py -> build/lib/lightgbm
copying lightgbm/plotting.py -> build/lib/lightgbm
copying lightgbm/callback.py -> build/lib/lightgbm
copying lightgbm/sklearn.py -> build/lib/lightgbm
running egg_info
creating lightgbm.egg-info
writing lightgbm.egg-info/PKG-INFO
writing dependency_links to lightgbm.egg-info/dependency_links.txt
writing requirements to lightgbm.egg-info/requires.txt
writing top-level names to lightgbm.egg-info/top_level.txt
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'build'
warning: no files found matching 'LICENSE'
warning: no files found matching '*.txt'
warning: no files found matching '*.so' under directory 'lightgbm'
warning: no files found matching '*.txt' under directory 'compile'
warning: no files found matching '*.so' under directory 'compile'
warning: no files found matching '*.dll' under directory 'compile/Release'
warning: no files found matching '*' under directory 'compile/compute'
warning: no files found matching '*' under directory 'compile/include'
warning: no files found matching '*' under directory 'compile/src'
warning: no files found matching 'LightGBM.sln' under directory 'compile/windows'
warning: no files found matching 'LightGBM.vcxproj' under directory 'compile/windows'
warning: no files found matching 'LightGBM.vcxproj.filters' under directory 'compile/windows'
warning: no files found matching '*.dll' under directory 'compile/windows/x64/DLL'
warning: no previously-included files matching '*.py[co]' found anywhere in distribution
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
copying lightgbm/VERSION.txt -> build/lib/lightgbm
running install_lib
copying build/lib/lightgbm/VERSION.txt -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/basic.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/libpath.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/engine.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/compat.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/__init__.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/plotting.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/callback.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/sklearn.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
INFO:root:Installing lib_lightgbm from: ['../lib_lightgbm.so']
copying ../lib_lightgbm.so -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/basic.py to basic.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/libpath.py to libpath.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/engine.py to engine.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/compat.py to compat.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/plotting.py to plotting.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/callback.py to callback.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/sklearn.py to sklearn.cpython-36.pyc
running install_egg_info
removing '/home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm-2.1.0-py3.6.egg-info' (and everything under it)
Copying lightgbm.egg-info to /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm-2.1.0-py3.6.egg-info
running install_scripts
After that, I ran my script on a 95,000-row dataset to train the model:
[LightGBM] [Info] Number of positive: 220, number of negative: 94780
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 883
[LightGBM] [Info] Number of data: 95000, number of used features: 17
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: GeForce GTX 1080, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
Segmentation fault (core dumped)
Steps to reproduce
cmake -DBOOST_ROOT=/home/dolores/Music/Music/Music/Kaggle/boost_1_64_0/ -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ ..
make -j4
cd python-package; sudo /home/dolores/anaconda3/bin/python setup.py install --precompile