[GPU] LightGBMError: Invalid Kernel Arguments #4618

pn12 · 2021-09-19T12:45:23Z

I am running LightGBM on GPU with a dataset of shape (400, 116379); all columns are float. The same code runs fine on another dataset, but fails with the error below on this one.

Can anybody suggest what is wrong here? The traceback of the error is below.

Thanks

/opt/conda/lib/python3.7/site-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    820                     eval_init_score=eval_init_score, eval_metric=eval_metric,
    821                     early_stopping_rounds=early_stopping_rounds, verbose=verbose, feature_name=feature_name,
--> 822                     categorical_feature=categorical_feature, callbacks=callbacks, init_model=init_model)
    823         return self
    824 

/opt/conda/lib/python3.7/site-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    686                               evals_result=evals_result, fobj=self._fobj, feval=eval_metrics_callable,
    687                               verbose_eval=verbose, feature_name=feature_name,
--> 688                               callbacks=callbacks, init_model=init_model)
    689 
    690         if evals_result:

/opt/conda/lib/python3.7/site-packages/lightgbm/engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
    247                                     evaluation_result_list=None))
    248 
--> 249         booster.update(fobj=fobj)
    250 
    251         evaluation_result_list = []

/opt/conda/lib/python3.7/site-packages/lightgbm/basic.py in update(self, train_set, fobj)
   2643             _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
   2644                 self.handle,
-> 2645                 ctypes.byref(is_finished)))
   2646             self.__is_predicted_cur_iter = [False for _ in range(self.__num_dataset)]
   2647             return is_finished.value == 1

/opt/conda/lib/python3.7/site-packages/lightgbm/basic.py in _safe_call(ret)
    108     """
    109     if ret != 0:
--> 110         raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
    111 
    112 

LightGBMError: Invalid Kernel Arguments

jameslamb · 2021-10-03T17:57:13Z

Thanks @pn12 .

Are you able to provide a reproducible example, some completely self-contained code we could run which reproduces the problem? It will be difficult for maintainers to figure out what is happening here and help you with only a stack trace and no additional information.

nebuntu · 2021-10-05T10:09:27Z

I've also encountered the same error.
Almost the same code runs successfully on the CPU version (without the 'device': 'gpu' setting).

My environment:
OS: Ubuntu 18.04 LTS
GPU: NVIDIA Quadro RTX 5000
NVIDIA Driver Version: 460.32.03
CUDA Version: 11.2
Python: 3.8.8 (anaconda3-2021.05 on pyenv)

My code:

for k in tqdm(range(10)):
    for i in range(10):
        df_subsamp_train_tmp = pd.concat([subsample_ssd(k), subsample_hc(k,i)])
        df_subsamp_train = df_subsamp_train_tmp.sample(frac=1, random_state=k)
        
        X_subsamp_train = df_subsamp_train.iloc[ : , 2:]
        y_subsamp_train = df_subsamp_train.iloc[ : , 0]
        param_dataset = {'feature_pre_filter': False}
        data_hpt_train = lgb.Dataset(X_subsamp_train.values, y_subsamp_train.values, free_raw_data=False,
                                        params=param_dataset)
    
        def objective(trial):
            param = {
                'objective': 'binary',
                'metric': 'auc',
                'verbosity': -1,
                'boosting_type': 'gbdt',
                'lambda_l1': trial.suggest_loguniform('lambda_l1', 1e-8, 10.0),
                'lambda_l2': trial.suggest_loguniform('lambda_l2', 1e-8, 10.0),
                'num_leaves': trial.suggest_int('num_leaves', 2, 256),
                'feature_fraction': trial.suggest_uniform('feature_fraction', 0.4, 1.0),
                'bagging_fraction': trial.suggest_uniform('bagging_fraction', 0.4, 1.0),
                'bagging_freq': trial.suggest_int('bagging_freq', 1, 7),
                'min_child_samples': trial.suggest_int('min_child_samples', 3, 20),
                'device' : 'gpu'
            }
            
            gbm = lgb.cv(param, data_hpt_train, nfold=10, early_stopping_rounds=20)
            return np.mean(eval_hist['auc_mean'])

        study = optuna.create_study(sampler=optuna.samplers.RandomSampler(seed=k),
                                    pruner=optuna.pruners.MedianPruner(n_warmup_steps=10),
                                    direction='maximize')
        study.optimize(objective, n_trials=100)

        best_params[k][i] = study.best_trial.params

The messages (including the error):

  0%|          | 0/10 [00:00<?, ?it/s][I 2021-10-05 18:54:15,458] A new study created in memory with name: no-name-ab69c737-ab94-4326-8758-20c0b6fe0a41
/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py:578: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
[W 2021-10-05 18:54:39,967] Trial 0 failed because of the following error: LightGBMError('Invalid Kernel Arguments')
Traceback (most recent call last):
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "<ipython-input-10-e20e1369c52b>", line 44, in objective
    gbm = lgb.cv(param, data_hpt_train, nfold=10, early_stopping_rounds=20)
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py", line 641, in cv
    cvfolds.update(fobj=fobj)
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py", line 354, in handler_function
    ret.append(getattr(booster, name)(*args, **kwargs))
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py", line 3016, in update
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py", line 125, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
lightgbm.basic.LightGBMError: Invalid Kernel Arguments
  0%|          | 0/10 [00:52<?, ?it/s]
---------------------------------------------------------------------------
LightGBMError                             Traceback (most recent call last)
<ipython-input-10-e20e1369c52b> in <module>
     48                                     pruner=optuna.pruners.MedianPruner(n_warmup_steps=10),
     49                                     direction='maximize')
---> 50         study.optimize(objective, n_trials=100)
     51 
     52         best_params[k][i] = study.best_trial.params

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/study.py in optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
    398             )
    399 
--> 400         _optimize(
    401             study=self,
    402             func=func,

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
     64     try:
     65         if n_jobs == 1:
---> 66             _optimize_sequential(
     67                 study,
     68                 func,

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
    161 
    162         try:
--> 163             trial = _run_trial(study, func, catch)
    164         except Exception:
    165             raise

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
    262 
    263     if state == TrialState.FAIL and func_err is not None and not isinstance(func_err, catch):
--> 264         raise func_err
    265     return trial
    266 

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
    211 
    212     try:
--> 213         value_or_values = func(trial)
    214     except exceptions.TrialPruned as e:
    215         # TODO(mamu): Handle multi-objective cases.

<ipython-input-10-e20e1369c52b> in objective(trial)
     42             early_stopping_rounds = lgb.early_stopping(20)
     43             #log_eval = lgb.log_evaluation(period=50)
---> 44             gbm = lgb.cv(param, data_hpt_train, nfold=10, early_stopping_rounds=20)
     45             return np.mean(eval_hist['auc_mean'])
     46 

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py in cv(params, train_set, num_boost_round, folds, nfold, stratified, shuffle, metrics, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, fpreproc, verbose_eval, show_stdv, seed, callbacks, eval_train_metric, return_cvbooster)
    639                                     end_iteration=num_boost_round,
    640                                     evaluation_result_list=None))
--> 641         cvfolds.update(fobj=fobj)
    642         res = _agg_cv_result(cvfolds.eval_valid(feval), eval_train_metric)
    643         for _, key, mean, _, std in res:

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py in handler_function(*args, **kwargs)
    352             ret = []
    353             for booster in self.boosters:
--> 354                 ret.append(getattr(booster, name)(*args, **kwargs))
    355             return ret
    356         return handler_function

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py in update(self, train_set, fobj)
   3014             if self.__set_objective_to_none:
   3015                 raise LightGBMError('Cannot update due to null objective function.')
-> 3016             _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
   3017                 self.handle,
   3018                 ctypes.byref(is_finished)))

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py in _safe_call(ret)
    123     """
    124     if ret != 0:
--> 125         raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
    126 
    127 

LightGBMError: Invalid Kernel Arguments

Any help would be appreciated.
Thanks in advance!

jameslamb · 2021-10-06T02:42:17Z

Thanks @nebuntu . Are you able to provide a reproducible example?

We won't be able to run that code you've provided, since it's missing:

import statements
code to define the input data (e.g. subsample_ssd, subsample_hc)
version of lightgbm you are using and how you installed it

Also, the code you've provided raises some additional questions.

Are you able to provide some example code that does not involve optuna? Or do you only see this issue with optuna and suspect that it's related to how that library is working with lightgbm?
The provided code is using two for-loops. Are you saying that this Invalid Kernel Arguments error only shows up for you on some training runs, randomly? If not and it always happens, we'd appreciate a sample without those for-loops to try to narrow it down further.

Without a reproducible example, I suspect it will take a large amount of effort for maintainers to figure out the root cause of this issue. Anything you and anyone else reading this could do to help us narrow this down will increase the likelihood of resolving this issue.
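For illustration, a self-contained report along these lines might look like the sketch below: imports included, no optuna, no outer loops, and synthetic data in place of the private dataset. The helper names (`build_params`, `make_data`, `run`), the data shape, and the fold count are assumptions made for the example, not the reporter's code, and it presumes a GPU-enabled LightGBM build. Whether random data triggers the error at all is part of what such a script would establish.

```python
"""Sketch of a minimal GPU reproduction: no optuna, no loops, synthetic data."""


def build_params():
    # Core settings from the report above; tuned hyperparameters left at defaults.
    return {
        "objective": "binary",
        "metric": "auc",
        "verbosity": 1,
        "device": "gpu",
    }


def make_data(n_rows=200, n_features=100_000, seed=0):
    # Hypothetical stand-in for the real data: random floats, balanced 0/1 labels.
    import numpy as np

    rng = np.random.default_rng(seed)
    X = rng.random((n_rows, n_features))
    y = np.arange(n_rows) % 2
    return X, y


def run():
    # Imported here so the version is printed next to the result in the report.
    import lightgbm as lgb  # requires a GPU-enabled build

    print("lightgbm version:", lgb.__version__)
    X, y = make_data()
    train_set = lgb.Dataset(X, label=y, params={"feature_pre_filter": False})
    return lgb.cv(build_params(), train_set, nfold=5)


# On a machine with a GPU build, call run() and paste the full output.
```

Printing the installed version and keeping the data generation inside the script covers the missing items listed above in one file.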

==================

Also, I noticed that it looks like you've cross-posted this issue to Stack Overflow: https://stackoverflow.com/questions/69452606/lightgbm-error-invalid-kernel-arguments. We would have preferred to have this conversation in only one place, but now that that's been posted...if you get an answer there or some helpful suggestions, please do share them here in this issue.

nebuntu · 2021-10-06T04:13:09Z

Thank you @jameslamb for the comment!
I am willing to update my post with the missing information, but first I will try running the code without Optuna and/or the for-loops, as you advised.
I will come back soon.

nebuntu · 2021-10-18T14:38:53Z

Sorry for my late update.
Your comments would be appreciated!

To simplify, I reduced my code to a minimal reproducible example.
Please find my code and data below:
(https://drive.google.com/drive/folders/1iWD5LLelxtYjFgj5Osk8KiRhGU_S0ez7?usp=sharing)
(you can use .ipynb or .py)
Sadly, I ran into the same error:

/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py:179: UserWarning: Converting column-vector to 1d array
  _log_warning('Converting column-vector to 1d array')
[LightGBM] [Info] Number of positive: 81, number of negative: 82
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 163, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.14 MB) transferred to GPU in 0.253344 secs. 0 sparse feature groups
[LightGBM] [Info] Number of positive: 81, number of negative: 82
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 163, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.14 MB) transferred to GPU in 0.346040 secs. 0 sparse feature groups
[LightGBM] [Info] Number of positive: 82, number of negative: 81
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 163, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.14 MB) transferred to GPU in 0.401490 secs. 0 sparse feature groups
[LightGBM] [Info] Number of positive: 82, number of negative: 81
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 163, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.14 MB) transferred to GPU in 0.299658 secs. 0 sparse feature groups
[LightGBM] [Info] Number of positive: 82, number of negative: 82
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 164, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.20 MB) transferred to GPU in 0.497587 secs. 0 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.496933 -> initscore=-0.012270
[LightGBM] [Info] Start training from score -0.012270
[LightGBM] [Debug] Re-bagging, using 132 data to train
[LightGBM] [Info] Increasing preallocd_max_num_wg_ to 17908 for launching more workgroups
Traceback (most recent call last):
  File "my_lgbm.py", line 28, in <module>
    gbm = lgb.cv(param, data_train)
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py", line 640, in cv
    cvfolds.update(fobj=fobj)
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py", line 353, in handler_function
    ret.append(getattr(booster, name)(*args, **kwargs))
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py", line 3021, in update
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py", line 125, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
lightgbm.basic.LightGBMError: Invalid Kernel Arguments

For your information, I could run LightGBM's example Python code on the GPU without error (I added the 'device': 'gpu' option).
I wonder if the dimension of my data is too large for GPU processing?
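
The log above gives some indirect support for a link to the feature count. Its figures are consistent with the GPU trainer packing four features per workgroup, with one byte per binned value: ceil(71631 / 4) = 17908 matches the `preallocd_max_num_wg_` value, and 17908 × 4 bytes × 163 rows matches the reported 11.14 MB transfer. Both the packing factor of 4 and the one-byte value size are assumptions inferred from these numbers, not something confirmed in this thread, and the arithmetic does not show which kernel argument is rejected.

```python
import math

N_FEATURES = 71631        # "number of used features" from the log
FEATURES_PER_GROUP = 4    # assumption inferred from the log, not confirmed
BYTES_PER_VALUE = 1       # assumption: one byte per binned value (256 bins)


def workgroups(n_features, per_group=FEATURES_PER_GROUP):
    # Number of packed feature groups, rounding the last partial group up.
    return math.ceil(n_features / per_group)


def transfer_mib(n_rows, n_features):
    # Size of the padded dense feature matrix in MiB.
    padded = workgroups(n_features) * FEATURES_PER_GROUP
    return padded * BYTES_PER_VALUE * n_rows / 2**20


print(workgroups(N_FEATURES))                   # 17908
print(f"{transfer_mib(163, N_FEATURES):.2f}")   # 11.14
print(f"{transfer_mib(164, N_FEATURES):.2f}")   # 11.20
```

If this reading is right, the number of workgroups grows linearly with the number of features, which would fit an error that only appears on very wide datasets.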

My LGBM version: 3.3.0.99
I installed LGBM basically following the steps shown in the official docs.
Please note, I didn't install 'nvidia-375' and 'nvidia-opencl-icd-375', since my current NVIDIA driver is ver.460 (and I need it for Keras on the same machine).

sudo apt-get update
sudo apt-get install --no-install-recommends nvidia-opencl-dev opencl-headers
sudo init 6
sudo apt-get install --no-install-recommends git cmake build-essential libboost-dev libboost-system-dev libboost-filesystem-dev
git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
mkdir build
cd build
cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
make -j$(nproc)
cd ..
cd python-package/
sudo python setup.py install --precompile
cd ..

StrikerRUS · 2021-10-19T17:01:06Z

@nebuntu Thanks a lot for the repro! I can confirm this error appears on my local Windows machine with the latest LightGBM and RTX 2080.

nebuntu · 2021-10-26T12:38:29Z

Thank you @StrikerRUS for confirming!
Do you have any suggestions about how I can solve this problem?

StrikerRUS · 2021-10-26T13:57:59Z

@nebuntu Unfortunately, I don't have any yet. 🙁

grkremer · 2023-07-26T23:16:53Z

I'm having the exact same problem when trying to use datasets with 100,000 features or more. Here is the reproducible code:
https://github.com/grkremer/cuda_synthetic/blob/main/cuda_sintetic_testing.ipynb
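
To pin down where the failure starts, a bisection over the feature count is one option. The sketch below is an assumption-laden illustration, not code from the linked notebook: `smallest_failing` and `make_gpu_probe` are hypothetical helpers, the probe uses random data, and the search assumes the failure is monotonic (once a width fails, every larger width fails too). The dry run at the end uses a made-up threshold purely to show the search working without a GPU.

```python
def smallest_failing(fails, lo, hi):
    """Return the smallest n in (lo, hi] for which fails(n) is True.

    Assumes fails(lo) is False, fails(hi) is True, and failure is monotonic in n.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi


def make_gpu_probe(n_rows=200, seed=0):
    """Build a fails(n_features) predicate backed by one real GPU boosting round."""
    import numpy as np
    import lightgbm as lgb  # requires a GPU-enabled build

    def fails(n_features):
        rng = np.random.default_rng(seed)
        X = rng.random((n_rows, n_features))
        y = np.arange(n_rows) % 2
        params = {"objective": "binary", "device": "gpu", "verbosity": -1}
        try:
            lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=1)
        except lgb.basic.LightGBMError as err:
            if "Invalid Kernel Arguments" in str(err):
                return True
            raise  # a different error is not what we are searching for
        return False

    return fails


# Dry run with a stand-in predicate; the 65_536 threshold is made up for the demo.
print(smallest_failing(lambda n: n > 65_536, lo=1_000, hi=200_000))  # 65537
```

On a GPU machine, `smallest_failing(make_gpu_probe(), lo=1_000, hi=200_000)` would run about eight short training jobs, provided the two endpoints behave as assumed.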

StrikerRUS added the bug label Sep 19, 2021

StrikerRUS changed the title ~~LightGBMError: Invalid Kernel Arguments~~ [GPU] LightGBMError: Invalid Kernel Arguments Sep 19, 2021

jameslamb added the awaiting response label Oct 3, 2021

StrikerRUS removed the awaiting response label Oct 19, 2021

jameslamb mentioned this issue Apr 14, 2022: [RFC] 4.0.0 Release #5153 (Closed, 60 tasks)

wil70 mentioned this issue Dec 21, 2023: [CLI, GPU, Win x64] LightGBM GPU doesn't work for 100K+ features --> Met Exceptions: Invalid Kernel Arguments #6220 (Open)
