Describe the issue:
Whenever I run this code, the Dask job crashes, all the workers are lost, and the task then hangs forever. If I provide small files (<100MB), the same code works fine. I'm not sure what the issue is. I'm pasting the error below in the "Anything else we need to know" section.
Anything else we need to know?:
Here is the error log that I see:
[LightGBM] [Debug] Dataset::GetMultiBinFromSparseFeatures: sparse rate 0.934990
[LightGBM] [Debug] Dataset::GetMultiBinFromAllFeatures: sparse rate 0.372672
[LightGBM] [Debug] init for col-wise cost 0.708685 seconds, init for row-wise cost 1.673264 seconds
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.912160 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Debug] Using Sparse Multi-Val Bin
[LightGBM] [Info] Total Bins 27836
[LightGBM] [Info] Number of data points in the train set: 4750592, number of used features: 49
[LightGBM] [Debug] Use subset for bagging
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.000239 -> initscore=-8.340142
[LightGBM] [Info] Start training from score -8.340142
[LightGBM] [Debug] Re-bagging, using 3801989 data to train
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fb4d4013756, pid=944460, tid=0x00007fb55da09640
#
# JRE version: OpenJDK Runtime Environment (8.0_382-b05) (build 1.8.0_382-8u382-ga-1~22.04.1-b05)
# Java VM: OpenJDK 64-Bit Server VM (25.382-b05 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [lib_lightgbm.so+0x413756] LightGBM::SerialTreeLearner::SplitInner(LightGBM::Tree*, int, int*, int*, bool)+0xe16
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# PATH_TO_APP_CACHE/appcache/application_1697132938548_0291/container_1697132938548_0291_01_000003/hs_err_pid944460.log
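For triage, it can help to pull the key fields out of the crash header above. The following is a minimal sketch using only the Python standard library; the helper name `summarize_crash` and the regular expressions are illustrative assumptions, not part of Dask, LightGBM, or the JVM tooling. The three header lines are copied from the log above.

```python
import re

# Header lines copied verbatim from the crash report quoted above.
HS_ERR_HEADER = """\
# SIGSEGV (0xb) at pc=0x00007fb4d4013756, pid=944460, tid=0x00007fb55da09640
# Problematic frame:
# C [lib_lightgbm.so+0x413756] LightGBM::SerialTreeLearner::SplitInner(LightGBM::Tree*, int, int*, int*, bool)+0xe16
"""

# Hypothetical patterns written for this header layout only.
SIGNAL_RE = re.compile(r"#\s+(SIG[A-Z]+)\s+\((0x[0-9a-f]+)\)\s+at pc=(0x[0-9a-f]+), pid=(\d+)")
FRAME_RE = re.compile(r"#\s+(\w)\s+\[([^\]+]+)\+(0x[0-9a-f]+)\]\s+(.*)")


def summarize_crash(text):
    """Extract the signal, pid, and faulting frame from an hs_err-style header."""
    summary = {}
    for line in text.splitlines():
        match = SIGNAL_RE.match(line)
        if match:
            summary["signal"] = match.group(1)
            summary["pid"] = int(match.group(4))
            continue
        match = FRAME_RE.match(line)
        if match:
            summary["frame_type"] = match.group(1)  # "C" marks a native frame
            summary["library"] = match.group(2)
            summary["offset"] = match.group(3)
            # Keep only the qualified function name, dropping "(args)+0x...".
            summary["function"] = match.group(4).split("(", 1)[0]
    return summary


if __name__ == "__main__":
    for key, value in summarize_crash(HS_ERR_HEADER).items():
        print(f"{key}: {value}")
```

Read this way, the faulting frame is a native (`C`) frame inside `lib_lightgbm.so`, in `LightGBM::SerialTreeLearner::SplitInner`, which suggests the segfault originates in LightGBM's native code rather than in the Dask scheduler or worker code itself.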
Environment:
Dask version: 2023.10.0
Python version: 3.10.12
Operating System:
Install method (conda, pip, source): pip
It looks like this problem is being tackled at microsoft/LightGBM#6196, so I will close this issue. Please reopen if anything is left that needs to be done on the distributed side.
Minimal Complete Verifiable Example:
already pasted above.
Environment:
all the dependencies:
Package Version
asttokens 2.4.1
bokeh 3.3.0
cffi 1.16.0
click 8.1.7
cloudpickle 3.0.0
comm 0.2.0
contourpy 1.1.1
cryptography 41.0.5
dask 2023.10.0
dask-yarn 0.9+2.g8eed5e2
debugpy 1.8.0
decorator 5.1.1
distributed 2023.10.0
exceptiongroup 1.1.3
executing 2.0.1
fsspec 2023.10.0
grpcio 1.59.0
importlib-metadata 6.8.0
ipython 8.17.2
jedi 0.19.1
Jinja2 3.1.2
joblib 1.3.2
jupyter_client 8.6.0
jupyter_core 5.5.0
lightgbm 4.1.0
locket 1.0.0
lz4 4.3.2
MarkupSafe 2.1.3
matplotlib-inline 0.1.6
msgpack 1.0.7
nest-asyncio 1.5.8
numpy 1.26.1
packaging 23.2
pandas 2.1.1
parso 0.8.3
partd 1.4.1
pexpect 4.8.0
Pillow 10.1.0
pip 22.0.2
platformdirs 4.0.0
prompt-toolkit 3.0.40
protobuf 4.24.4
psutil 5.9.6
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 13.0.0
pycparser 2.21
Pygments 2.16.1
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.1
scikit-learn 1.3.2
scipy 1.11.3
setuptools 59.6.0
six 1.16.0
skein 0.8.2
sortedcontainers 2.4.0
stack-data 0.6.3
tblib 3.0.0
threadpoolctl 3.2.0
toolz 0.12.0
tornado 6.3.3
traitlets 5.13.0
tzdata 2023.3
urllib3 2.0.7
wcwidth 0.2.9
xyzservices 2023.10.0
zict 3.0.0
zipp 3.17.0
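When a report like this is revisited, the full package list is often more than needed. As a rough sketch, the versions of a few relevant packages can be collected with the standard library's `importlib.metadata`. The function name `collect_versions` and the choice of packages in `PACKAGES` are assumptions made for illustration.

```python
from importlib import metadata

# Packages that seem most relevant to this report; the selection is an assumption.
PACKAGES = ["dask", "distributed", "dask-yarn", "lightgbm", "numpy", "pandas"]


def collect_versions(names):
    """Return a mapping of package name to installed version (or a placeholder)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages explicitly instead of raising.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(PACKAGES).items():
        print(f"{name:<12} {version}")
```

Running the same snippet on the client and on a worker node would show whether the two environments agree, which is worth ruling out before digging into the native crash.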