[GPU] Segmentation fault on Ubuntu 14.04 #1290

Closed
meetdave06 opened this issue Mar 29, 2018 · 12 comments

@meetdave06

meetdave06 commented Mar 29, 2018

Environment info

Operating System: Ubuntu 14.04.1
CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
GPU: GeForce GTX 1080 (8 GB), Vendor: NVIDIA Corporation
NVIDIA driver: 375.39 (NVIDIA-SMI 375.39)
C++ compiler: g++ (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4
Python: Anaconda3, Python 3.6

Error Message:

[LightGBM] [Info] Number of positive: 220, number of negative: 94780
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 883
[LightGBM] [Info] Number of data: 95000, number of used features: 17
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: GeForce GTX 1080, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
Segmentation fault (core dumped)

Reproducible examples

I was not able to update Boost via apt-get; it always installed version 1.54. So I downloaded version 1.64 from SourceForge and unzipped it to a local directory, but didn't build it.

1. cmake -DBOOST_ROOT=/home/dolores/Music/Music/Music/Kaggle/boost_1_64_0/ -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ ..
-- The C compiler identification is GNU 4.9.4
-- The CXX compiler identification is GNU 4.9.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - found
-- Found OpenCL: /usr/local/cuda-8.0/lib64/libOpenCL.so (found version "1.2")
-- OpenCL include directory:/usr/local/cuda-8.0/include
-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- filesystem
-- system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dolores/Music/Music/Music/Kaggle/LightGBM/build

  2. make -j4
    Scanning dependencies of target _lightgbm
    Scanning dependencies of target lightgbm
    [ 1%] Building CXX object CMakeFiles/_lightgbm.dir/src/c_api.cpp.o
    [ 3%] Building CXX object CMakeFiles/_lightgbm.dir/src/application/application.cpp.o
    [ 5%] Building CXX object CMakeFiles/_lightgbm.dir/src/lightgbm_R.cpp.o
    [ 6%] Building CXX object CMakeFiles/lightgbm.dir/src/main.cpp.o
    [ 8%] Building CXX object CMakeFiles/lightgbm.dir/src/application/application.cpp.o
    [ 10%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/boosting.cpp.o
    [ 11%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/boosting.cpp.o
    [ 13%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/gbdt.cpp.o
    [ 15%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/gbdt.cpp.o
    [ 16%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/gbdt_model_text.cpp.o
    [ 18%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/gbdt_model_text.cpp.o
    [ 20%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/gbdt_prediction.cpp.o
    [ 22%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/prediction_early_stop.cpp.o
    [ 23%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/gbdt_prediction.cpp.o
    [ 25%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/prediction_early_stop.cpp.o
    [ 27%] Building CXX object CMakeFiles/lightgbm.dir/src/io/bin.cpp.o
    [ 28%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/bin.cpp.o
    [ 30%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/config.cpp.o
    [ 32%] Building CXX object CMakeFiles/lightgbm.dir/src/io/config.cpp.o
    [ 33%] Building CXX object CMakeFiles/lightgbm.dir/src/io/dataset.cpp.o
    [ 35%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/dataset.cpp.o
    [ 37%] Building CXX object CMakeFiles/lightgbm.dir/src/io/dataset_loader.cpp.o
    [ 38%] Building CXX object CMakeFiles/lightgbm.dir/src/io/file_io.cpp.o
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static std::unique_ptr<LightGBM::VirtualFileReader> LightGBM::VirtualFileReader::Make(const string&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:159:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static std::unique_ptr<LightGBM::VirtualFileWriter> LightGBM::VirtualFileWriter::Make(const string&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:167:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static bool LightGBM::VirtualFileWriter::Exists(const string&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:176:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    At global scope:
    cc1plus: warning: unrecognized command line option "-Wno-ignored-attributes"
    [ 40%] Building CXX object CMakeFiles/lightgbm.dir/src/io/metadata.cpp.o
    [ 42%] Building CXX object CMakeFiles/lightgbm.dir/src/io/parser.cpp.o
    [ 44%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/dataset_loader.cpp.o
    [ 45%] Building CXX object CMakeFiles/lightgbm.dir/src/io/tree.cpp.o
    [ 47%] Building CXX object CMakeFiles/lightgbm.dir/src/metric/dcg_calculator.cpp.o
    [ 49%] Building CXX object CMakeFiles/lightgbm.dir/src/metric/metric.cpp.o
    [ 50%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/file_io.cpp.o
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static std::unique_ptr<LightGBM::VirtualFileReader> LightGBM::VirtualFileReader::Make(const string&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:159:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static std::unique_ptr<LightGBM::VirtualFileWriter> LightGBM::VirtualFileWriter::Make(const string&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:167:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp: In static member function ‘static bool LightGBM::VirtualFileWriter::Exists(const string&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/io/file_io.cpp:176:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    At global scope:
    cc1plus: warning: unrecognized command line option "-Wno-ignored-attributes"
    [ 52%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/metadata.cpp.o
    [ 54%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/parser.cpp.o
    [ 55%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/tree.cpp.o
    [ 57%] Building CXX object CMakeFiles/_lightgbm.dir/src/metric/dcg_calculator.cpp.o
    [ 59%] Building CXX object CMakeFiles/lightgbm.dir/src/objective/objective_function.cpp.o
    [ 61%] Building CXX object CMakeFiles/_lightgbm.dir/src/metric/metric.cpp.o
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp: In static member function ‘static LightGBM::ObjectiveFunction* LightGBM::ObjectiveFunction::CreateObjectiveFunction(const string&, const LightGBM::ObjectiveConfig&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp:47:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    [ 62%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linker_topo.cpp.o
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp: In static member function ‘static LightGBM::ObjectiveFunction* LightGBM::ObjectiveFunction::CreateObjectiveFunction(const string&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp:84:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    [ 64%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linkers_mpi.cpp.o
    [ 66%] Building CXX object CMakeFiles/_lightgbm.dir/src/objective/objective_function.cpp.o
    [ 67%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linkers_socket.cpp.o
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp: In static member function ‘static LightGBM::ObjectiveFunction* LightGBM::ObjectiveFunction::CreateObjectiveFunction(const string&, const LightGBM::ObjectiveConfig&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp:47:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp: In static member function ‘static LightGBM::ObjectiveFunction* LightGBM::ObjectiveFunction::CreateObjectiveFunction(const string&)’:
    /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/objective/objective_function.cpp:84:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    [ 69%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linker_topo.cpp.o
    At global scope:
    cc1plus: warning: unrecognized command line option "-Wno-ignored-attributes"
    [ 71%] Building CXX object CMakeFiles/lightgbm.dir/src/network/network.cpp.o
    [ 72%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linkers_mpi.cpp.o
    [ 74%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linkers_socket.cpp.o
    [ 76%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/network.cpp.o
    At global scope:
    cc1plus: warning: unrecognized command line option "-Wno-ignored-attributes"
    [ 77%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
    [ 79%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
    [ 81%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
    [ 83%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
    [ 84%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/gpu_tree_learner.cpp.o
    [ 86%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/gpu_tree_learner.cpp.o
    [ 88%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/serial_tree_learner.cpp.o
    [ 89%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/tree_learner.cpp.o
    [ 91%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
    [ 93%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/serial_tree_learner.cpp.o
    [ 94%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/tree_learner.cpp.o
    [ 96%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
    [ 98%] Linking CXX executable ../lightgbm
    [ 98%] Built target lightgbm
    [100%] Linking CXX shared library ../lib_lightgbm.so
    [100%] Built target _lightgbm

  3. Then I moved into the python-package folder and ran:
    sudo /home/dolores/anaconda3/bin/python setup.py install --precompile

running install
running build
running build_py
creating build
creating build/lib
creating build/lib/lightgbm
copying lightgbm/basic.py -> build/lib/lightgbm
copying lightgbm/libpath.py -> build/lib/lightgbm
copying lightgbm/engine.py -> build/lib/lightgbm
copying lightgbm/compat.py -> build/lib/lightgbm
copying lightgbm/__init__.py -> build/lib/lightgbm
copying lightgbm/plotting.py -> build/lib/lightgbm
copying lightgbm/callback.py -> build/lib/lightgbm
copying lightgbm/sklearn.py -> build/lib/lightgbm
running egg_info
creating lightgbm.egg-info
writing lightgbm.egg-info/PKG-INFO
writing dependency_links to lightgbm.egg-info/dependency_links.txt
writing requirements to lightgbm.egg-info/requires.txt
writing top-level names to lightgbm.egg-info/top_level.txt
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'build'
warning: no files found matching 'LICENSE'
warning: no files found matching '*.txt'
warning: no files found matching '*.so' under directory 'lightgbm'
warning: no files found matching '*.txt' under directory 'compile'
warning: no files found matching '*.so' under directory 'compile'
warning: no files found matching '*.dll' under directory 'compile/Release'
warning: no files found matching '*' under directory 'compile/compute'
warning: no files found matching '*' under directory 'compile/include'
warning: no files found matching '*' under directory 'compile/src'
warning: no files found matching 'LightGBM.sln' under directory 'compile/windows'
warning: no files found matching 'LightGBM.vcxproj' under directory 'compile/windows'
warning: no files found matching 'LightGBM.vcxproj.filters' under directory 'compile/windows'
warning: no files found matching '*.dll' under directory 'compile/windows/x64/DLL'
warning: no previously-included files matching '*.py[co]' found anywhere in distribution
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
copying lightgbm/VERSION.txt -> build/lib/lightgbm
running install_lib
copying build/lib/lightgbm/VERSION.txt -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/basic.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/libpath.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/engine.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/compat.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/__init__.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/plotting.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/callback.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
copying build/lib/lightgbm/sklearn.py -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
INFO:root:Installing lib_lightgbm from: ['../lib_lightgbm.so']
copying ../lib_lightgbm.so -> /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/basic.py to basic.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/libpath.py to libpath.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/engine.py to engine.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/compat.py to compat.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/__init__.py to __init__.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/plotting.py to plotting.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/callback.py to callback.cpython-36.pyc
byte-compiling /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/sklearn.py to sklearn.cpython-36.pyc
running install_egg_info
removing '/home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm-2.1.0-py3.6.egg-info' (and everything under it)
Copying lightgbm.egg-info to /home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm-2.1.0-py3.6.egg-info
running install_scripts

  4. After that, I ran my script to train the model on a dataset of 95,000 rows:

[LightGBM] [Info] Number of positive: 220, number of negative: 94780
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 883
[LightGBM] [Info] Number of data: 95000, number of used features: 17
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: GeForce GTX 1080, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
Segmentation fault (core dumped)

Steps to reproduce

  1. cmake -DBOOST_ROOT=/home/dolores/Music/Music/Music/Kaggle/boost_1_64_0/ -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda-8.0/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda-8.0/include/ ..
  2. make -j4
  3. cd python-package; sudo /home/dolores/anaconda3/bin/python setup.py install --precompile
  4. Run training
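
Steps 1-3 above can be scripted for repeated rebuilds. Below is a minimal Python sketch, assuming the checkout layout used here (a `build` directory for CMake and a `python-package` directory for the installer); the helper names `cmake_args` and `build_and_install` are hypothetical, and all paths are whatever you pass in. It omits `sudo`, so run it with whatever permissions your Python installation needs.

```python
import subprocess
from pathlib import Path


def cmake_args(boost_root, opencl_library, opencl_include_dir):
    """Return the step 1 cmake invocation as an argument list."""
    return [
        "cmake",
        "-DBOOST_ROOT=" + str(boost_root),
        "-DUSE_GPU=1",
        "-DOpenCL_LIBRARY=" + str(opencl_library),
        "-DOpenCL_INCLUDE_DIR=" + str(opencl_include_dir),
        "..",  # CMakeLists.txt lives one level above the build directory
    ]


def build_and_install(lightgbm_dir, python_exe, boost_root,
                      opencl_library, opencl_include_dir):
    """Run steps 1-3; check=True stops at the first command that fails."""
    lightgbm_dir = Path(lightgbm_dir)
    build_dir = lightgbm_dir / "build"
    build_dir.mkdir(exist_ok=True)
    subprocess.run(cmake_args(boost_root, opencl_library, opencl_include_dir),
                   cwd=str(build_dir), check=True)
    subprocess.run(["make", "-j4"], cwd=str(build_dir), check=True)
    subprocess.run([str(python_exe), "setup.py", "install", "--precompile"],
                   cwd=str(lightgbm_dir / "python-package"), check=True)
```

Step 4 is left out because it depends on your own training script.
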
@Laurae2
Contributor

Laurae2 commented Mar 29, 2018

Please provide a reproducible example on a smaller, publicly available dataset (also, you don't have 1 million observations; you have 95,000).

We can't tell whether you are running out of GPU RAM or hitting a bug.
If it still crashes on the smaller dataset, can you run gdb, as described in the GPU documentation, to pinpoint where it crashes? (Compile the CLI with the debug flag; see the GPU docs.)
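
The gdb run requested here can also be captured non-interactively. A minimal Python sketch, assuming gdb is on the PATH: it uses gdb's standard `-batch`, `-ex` and `--args` options to start the script, and when the program stops on SIGSEGV it prints the backtrace and exits. The helper name and the script name `train_small.py` are hypothetical; building LightGBM with debug symbols follows the GPU docs and is not shown.

```python
import sys


def gdb_backtrace_command(script, python_exe=sys.executable):
    """Build a gdb command line that runs `python_exe script` and prints a backtrace."""
    return [
        "gdb", "-batch",
        "-ex", "run",  # start the program; gdb stops here if it receives SIGSEGV
        "-ex", "bt",   # then print the backtrace of the thread that got the signal
        "--args", python_exe, script,
    ]
```

For example, `subprocess.run(gdb_backtrace_command("train_small.py"))` sends gdb's output, including the backtrace, to the terminal. Replacing `bt` with `thread apply all bt` dumps every thread, which can help when OpenMP worker threads are involved.
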

@meetdave06
Author

@Laurae2 Sorry for the mistake. Yes, it has 95,000 rows; I have edited the question.

I built the LightGBM package with the DEBUG flag on. All the other steps were the same as above.

As you requested, I ran it on a small public dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/haberman/).

Here is the backtrace:

GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/dolores/anaconda3/bin/python...done.
(gdb) run
Starting program: /home/dolores/anaconda3/bin/python Kaggle_lightgbm_small.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
['train_sample.csv', 'test_supplement.csv', 'sample_submission.csv', 'test.csv', 'train_reduced.csv', 'train.csv', 'haberman.csv']
[0.02117180824279785] Finished to load data
Starting to train the model
Train and test lgb dataset ready
[New Thread 0x7fffd6b61780 (LWP 28660)]
[New Thread 0x7fffd6760800 (LWP 28661)]
[New Thread 0x7fffd635f880 (LWP 28662)]
[New Thread 0x7fffd5f5e900 (LWP 28663)]
[New Thread 0x7fffd5b5d980 (LWP 28664)]
[New Thread 0x7fffd575ca00 (LWP 28665)]
[New Thread 0x7fffd535ba80 (LWP 28666)]
[New Thread 0x7fffd4f5ab00 (LWP 28667)]
[New Thread 0x7fffd4b59b80 (LWP 28668)]
[LightGBM] [Warning] Only contain one class.
[LightGBM] [Info] Number of positive: 290, number of negative: 0
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 74
[LightGBM] [Info] Number of data: 290, number of used features: 3
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[New Thread 0x7fffa8bbe700 (LWP 28674)]
[New Thread 0x7fffa3fff700 (LWP 28675)]
[New Thread 0x7fffa37fe700 (LWP 28676)]
[New Thread 0x7fffa2ffd700 (LWP 28677)]
[New Thread 0x7fffa27fc700 (LWP 28678)]
[New Thread 0x7fffa1ffb700 (LWP 28679)]
[New Thread 0x7fffa17fa700 (LWP 28680)]
[LightGBM] [Info] Using GPU Device: GeForce GTX 1080, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 64 bins...

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd5b5d980 (LWP 28664)]
0x00007fffd79d74cf in __exchange_and_add (__val=-1, __mem=0x7ffefffffffa) at /usr/include/c++/4.9/ext/atomicity.h:49
49 { return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
(gdb) backtrace
#0 0x00007fffd79d74cf in __exchange_and_add (__val=-1, __mem=0x7ffefffffffa) at /usr/include/c++/4.9/ext/atomicity.h:49
#1 __exchange_and_add_dispatch (__val=-1, __mem=0x7ffefffffffa) at /usr/include/c++/4.9/ext/atomicity.h:82
#2 std::string::_Rep::_M_dispose (this=0x7ffeffffffea, __a=...) at /usr/include/c++/4.9/bits/basic_string.h:246
#3 0x00007fffd79e31b6 in _M_dispose (__a=..., this=) at /usr/include/c++/4.9/bits/basic_string.h:240
#4 ~basic_string (this=, __in_chrg=) at /usr/include/c++/4.9/bits/basic_string.h:546
#5 ~path (this=, __in_chrg=) at /home/dolores/Music/Music/Music/Kaggle/boost_1_64_0/boost/filesystem/path.hpp:56
#6 boost::compute::detail::program_binary_path (hash=..., create=create@entry=true) at /home/dolores/Music/Music/Music/Kaggle/LightGBM/compute/include/boost/compute/detail/path.hpp:51
#7 0x00007fffd79e632e in boost::compute::program::save_program_binary (hash=..., prog=...) at /home/dolores/Music/Music/Music/Kaggle/LightGBM/compute/include/boost/compute/program.hpp:646
#8 0x00007fffd79e8bf9 in boost::compute::program::build_with_source (source=..., context=..., options=...) at /home/dolores/Music/Music/Music/Kaggle/LightGBM/compute/include/boost/compute/program.hpp:635
#9 0x00007fffd79dbcad in LightGBM::GPUTreeLearner::BuildGPUKernels () at /home/dolores/Music/Music/Music/Kaggle/LightGBM/src/treelearner/gpu_tree_learner.cpp:609
#10 0x00007ffff203dac3 in __kmp_invoke_microtask () from /home/dolores/anaconda3/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#11 0x00007ffff200c257 in __kmp_invoke_task_func () from /home/dolores/anaconda3/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#12 0x00007ffff200b8d5 in __kmp_launch_thread () from /home/dolores/anaconda3/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#13 0x00007ffff203dfa4 in _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker(void*) () from /home/dolores/anaconda3/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#14 0x00007ffff7bc4184 in start_thread (arg=0x7fffd5b5d980) at pthread_create.c:312
#15 0x00007ffff78f103d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@Laurae2
Contributor

Laurae2 commented Mar 29, 2018

ping @huanzhang12, debug trace provided for troubleshooting

@meetdave06
Author

@huanzhang12 Hey, any update?

@huanzhang12
Contributor

Thank you for reporting this issue! Based on the backtrace, the bug is unlikely to be at the crash location in boost/compute/program.hpp:646. The crash actually happens in the destructor of a C++ string, so it is very likely that the heap was already corrupted before this point.

@meetdave06 Can you try an older commit (for example, compile the source of the 2.1.0 release from Jan 25) and see if it works for you? If it does, you can bisect the commit history to find the faulty commit.

@meetdave06
Author

@huanzhang12 Sure, I will try that over the weekend. I am not sure what you mean by bisection; I guess you mean using git bisect to find the faulty commit. I will figure out how to use it.
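
The bisection suggested above can be automated with `git bisect run`, which calls a script on each candidate commit and reads its exit code: 0 marks the commit good, 125 skips it, and other codes from 1 to 127 mark it bad. A minimal Python sketch of such a script, assuming it is run from the repository root, that `build` was already configured with the cmake command from step 1, and that a hypothetical `train_small.py` reproduces the crash. Failures other than a segfault are skipped rather than marked bad, so only the crash drives the search.

```python
import signal
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`

# Hypothetical paths: adjust to your checkout and your own reproduction script.
BUILD_STEPS = [
    (["make", "-j4"], "build"),  # assumes build/ was already configured with cmake
    ([sys.executable, "setup.py", "install", "--precompile"], "python-package"),
]
TEST_SCRIPT = "/path/to/train_small.py"


def classify(test_returncode):
    """Turn the test script's return code into a `git bisect run` exit code."""
    if test_returncode == 0:
        return GOOD
    if test_returncode == -signal.SIGSEGV:
        return BAD  # subprocess reports "killed by signal N" as -N
    return SKIP  # some other failure, unrelated to the segfault being hunted


def main():
    for cmd, cwd in BUILD_STEPS:
        if subprocess.run(cmd, cwd=cwd).returncode != 0:
            return SKIP  # this commit does not build, so it cannot be judged
    return classify(subprocess.run([sys.executable, TEST_SCRIPT]).returncode)


# Only build and test when called as `bisect_check.py run`, so importing has no side effects.
if __name__ == "__main__" and sys.argv[1:2] == ["run"]:
    sys.exit(main())
```

Usage: `git bisect start HEAD <good-commit>`, then `git bisect run /home/dolores/anaconda3/bin/python /path/to/bisect_check.py run`, and `git bisect reset` when done. Keeping the script outside the repository means checkouts never touch it.
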

@StrikerRUS
Collaborator

Hi @meetdave06!
Have you managed to find the breaking commit?

@meetdave06
Author

@StrikerRUS Sorry, I haven't tried yet.

@huanzhang12
Contributor

@meetdave06 I think the problem is Boost compatibility. You mentioned that you downloaded Boost 1.64 but did not build it. Unfortunately, LightGBM depends on two compiled Boost components, libboost-filesystem and libboost-system, which need to be at least version 1.58. Setting BOOST_ROOT to the 1.64 directory makes LightGBM compile against the Boost 1.64 header files, but at run time it still loads the compiled 1.54 libraries installed by the system, and that mismatch is why it crashed.
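
To see which Boost version the headers under BOOST_ROOT provide, one can read the `BOOST_VERSION` macro in `boost/version.hpp`, which Boost encodes as major * 100000 + minor * 100 + patch (for example 106400 for 1.64.0). A minimal sketch with hypothetical helper names; it only inspects the headers, and per the comment above the compiled libraries loaded at run time can still be older.

```python
import re
from pathlib import Path

MIN_BOOST = (1, 58)  # minimum for libboost_filesystem / libboost_system, per the comment above


def decode_boost_version(code):
    """Split BOOST_VERSION (major*100000 + minor*100 + patch) into a tuple."""
    return (code // 100000, code // 100 % 1000, code % 100)


def header_boost_version(boost_root):
    """Read BOOST_VERSION from <boost_root>/boost/version.hpp."""
    text = (Path(boost_root) / "boost" / "version.hpp").read_text()
    match = re.search(r"^#define\s+BOOST_VERSION\s+(\d+)", text, re.MULTILINE)
    if match is None:
        raise ValueError("BOOST_VERSION not found in version.hpp")
    return decode_boost_version(int(match.group(1)))


def meets_minimum(version, minimum=MIN_BOOST):
    """Compare (major, minor) against the required minimum."""
    return tuple(version[:2]) >= minimum
```

For the unpacked 1.64 tree, `header_boost_version("/home/dolores/Music/Music/Music/Kaggle/boost_1_64_0")` should agree with the "Boost version: 1.64.0" line that cmake printed.
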

You need to build Boost >= 1.58 from source and make sure the libboost_filesystem.so and libboost_system.so that LightGBM uses are your compiled versions, not the ones provided by the system. You can verify this with the ldd command.
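
A minimal sketch of that ldd check, assuming a Linux ldd whose output lines look like `libboost_system.so.1.54.0 => /usr/lib/... (0x...)`. It extracts each libboost_* dependency, the version encoded in its soname, and the file it resolves to, then lists those older than 1.58. The helper names are hypothetical; point it at the lib_lightgbm.so the Python package actually loads (setup.py copied it into site-packages/lightgbm in the install log above).

```python
import re
import subprocess

# libboost_<name>.so.<major>.<minor>[.<patch>] => <target> [(0xADDRESS)]
BOOST_LINE = re.compile(
    r"^\s*(libboost_\w+)\.so\.(\d+)\.(\d+)(?:\.\d+)?\s+=>\s+(.*?)(?:\s+\(0x[0-9a-f]+\))?\s*$"
)


def boost_libs_from_ldd(ldd_output):
    """Map each libboost_* soname in ldd output to ((major, minor), resolved path).

    The resolved path is the literal text "not found" when the loader cannot find it.
    """
    libs = {}
    for line in ldd_output.splitlines():
        match = BOOST_LINE.match(line)
        if match:
            name, major, minor, target = match.groups()
            libs[name] = ((int(major), int(minor)), target)
    return libs


def too_old(libs, minimum=(1, 58)):
    """Names of the Boost libraries whose soname version is below `minimum`."""
    return sorted(name for name, (version, _) in libs.items() if version < minimum)


def run_ldd(shared_object):
    """Return the text that `ldd shared_object` prints."""
    return subprocess.run(["ldd", shared_object], stdout=subprocess.PIPE,
                          universal_newlines=True, check=True).stdout
```

With a correct setup, `too_old(boost_libs_from_ldd(run_ldd("/home/dolores/anaconda3/lib/python3.6/site-packages/lightgbm/lib_lightgbm.so")))` should be an empty list, and the resolved paths should point into your own Boost build rather than the system library directory.
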

@StrikerRUS
Collaborator

@meetdave06 Please update your Boost installation and check @huanzhang12's hypothesis.

@StrikerRUS
Collaborator

ping @meetdave06

@StrikerRUS
Collaborator

Feel free to reopen if reinstalling Boost doesn't help.

lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020