[GPU] LightGBMError: Invalid Kernel Arguments #4618

Open
Tracked by #5153
pn12 opened this issue Sep 19, 2021 · 9 comments

@pn12

pn12 commented Sep 19, 2021

I am running LightGBM on GPU with data of shape (400, 116379); all columns are float. The same code runs fine on another dataset, but on this one I get the error below.

Can anybody suggest what is wrong here? Below is the traceback of the error:

Thanks

/opt/conda/lib/python3.7/site-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    820                     eval_init_score=eval_init_score, eval_metric=eval_metric,
    821                     early_stopping_rounds=early_stopping_rounds, verbose=verbose, feature_name=feature_name,
--> 822                     categorical_feature=categorical_feature, callbacks=callbacks, init_model=init_model)
    823         return self
    824 

/opt/conda/lib/python3.7/site-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    686                               evals_result=evals_result, fobj=self._fobj, feval=eval_metrics_callable,
    687                               verbose_eval=verbose, feature_name=feature_name,
--> 688                               callbacks=callbacks, init_model=init_model)
    689 
    690         if evals_result:

/opt/conda/lib/python3.7/site-packages/lightgbm/engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
    247                                     evaluation_result_list=None))
    248 
--> 249         booster.update(fobj=fobj)
    250 
    251         evaluation_result_list = []

/opt/conda/lib/python3.7/site-packages/lightgbm/basic.py in update(self, train_set, fobj)
   2643             _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
   2644                 self.handle,
-> 2645                 ctypes.byref(is_finished)))
   2646             self.__is_predicted_cur_iter = [False for _ in range(self.__num_dataset)]
   2647             return is_finished.value == 1

/opt/conda/lib/python3.7/site-packages/lightgbm/basic.py in _safe_call(ret)
    108     """
    109     if ret != 0:
--> 110         raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
    111 
    112 

LightGBMError: Invalid Kernel Arguments

@StrikerRUS StrikerRUS added the bug label Sep 19, 2021
@StrikerRUS StrikerRUS changed the title LightGBMError: Invalid Kernel Arguments [GPU] LightGBMError: Invalid Kernel Arguments Sep 19, 2021
@jameslamb
Collaborator

Thanks @pn12 .

Are you able to provide a reproducible example, some completely self-contained code we could run which reproduces the problem? It will be difficult for maintainers to figure out what is happening here and help you with only a stack trace and no additional information.
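A self-contained reproduction along the lines requested could look like the sketch below. It assumes purely synthetic data: random floats with the shape from the original report (400 rows × 116,379 columns) and a made-up binary label. `make_wide_dataset` and `run_repro` are hypothetical names, and whether random data alone triggers the error has not been confirmed in this thread.

```python
import numpy as np


def make_wide_dataset(n_rows=400, n_features=116379, seed=0):
    """Random float features plus a binary label, mimicking a very wide dataset."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_rows, n_features))
    # Hypothetical label that depends only on the first column, so the task is learnable
    y = (X[:, 0] > 0.5).astype(np.int32)
    return X, y


def run_repro(n_rows=400, n_features=116379, device="gpu"):
    """Train a few rounds on the synthetic data; raises LightGBMError if the bug triggers."""
    import lightgbm as lgb  # imported here so the data helper works without LightGBM

    X, y = make_wide_dataset(n_rows, n_features)
    params = {"objective": "binary", "device": device, "verbosity": -1}
    return lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=10)


# Small demo; the full-size float64 array takes roughly 370 MB
X_demo, y_demo = make_wide_dataset(n_rows=8, n_features=5)
```

Calling `run_repro()` on a GPU build, and `run_repro(device="cpu")` for comparison, would show whether the error depends only on the data shape.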

@nebuntu

nebuntu commented Oct 5, 2021

I've also encountered the same error.
Almost the same code runs successfully on CPU (i.e., without the 'device': 'gpu' parameter).
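Since the CPU version trains successfully, one stopgap while the bug is open is to try the GPU first and fall back to CPU when LightGBM raises. A minimal sketch, assuming the training call is wrapped in a function that takes a params dict; `train_with_device_fallback` is a hypothetical helper, not part of LightGBM. With LightGBM you would pass `errors=(lgb.basic.LightGBMError,)` so that unrelated exceptions still propagate.

```python
def train_with_device_fallback(train_fn, params, devices=("gpu", "cpu"), errors=(Exception,)):
    """Call train_fn with params["device"] set to each device in turn.

    Returns (device, result) for the first device whose run does not raise one of
    `errors`; raises RuntimeError listing every failure if all devices fail.
    """
    failures = []
    for device in devices:
        try:
            # Copy params so the caller's dict is not modified
            return device, train_fn({**params, "device": device})
        except errors as err:
            failures.append((device, str(err)))
    raise RuntimeError(f"training failed on every device: {failures}")


def _fake_train(params):
    # Stand-in for lgb.train() that fails on GPU, used only for this demo
    if params["device"] == "gpu":
        raise ValueError("Invalid Kernel Arguments")
    return "booster"


demo_device, demo_model = train_with_device_fallback(_fake_train, {"objective": "binary"})

# With LightGBM, something like (not executed here):
#   device, booster = train_with_device_fallback(
#       lambda p: lgb.train(p, lgb.Dataset(X, label=y), num_boost_round=100),
#       params,
#       errors=(lgb.basic.LightGBMError,),
#   )
```

Building a fresh `lgb.Dataset` inside the lambda is a deliberate choice: it avoids reusing a Dataset object that was already constructed during the failed GPU attempt. This only sidesteps the error; it does not fix it.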

My environment:
OS: Ubuntu 18.04 LTS
GPU: NVIDIA Quadro RTX 5000
NVIDIA Driver Version: 460.32.03
CUDA Version: 11.2
Python: 3.8.8 (anaconda3-2021.05 on pyenv)

My code:

import numpy as np
import pandas as pd
import lightgbm as lgb
import optuna
from tqdm import tqdm

# subsample_ssd() and subsample_hc() are my own data-loading helpers (not shown here).
best_params = [[None] * 10 for _ in range(10)]

for k in tqdm(range(10)):
    for i in range(10):
        # Combine the two subsamples and shuffle the rows
        df_subsamp_train_tmp = pd.concat([subsample_ssd(k), subsample_hc(k, i)])
        df_subsamp_train = df_subsamp_train_tmp.sample(frac=1, random_state=k)

        # Column 0 is the label; the features start at column 2
        X_subsamp_train = df_subsamp_train.iloc[:, 2:]
        y_subsamp_train = df_subsamp_train.iloc[:, 0]
        param_dataset = {'feature_pre_filter': False}
        data_hpt_train = lgb.Dataset(X_subsamp_train.values, y_subsamp_train.values, free_raw_data=False,
                                     params=param_dataset)

        def objective(trial):
            param = {
                'objective': 'binary',
                'metric': 'auc',
                'verbosity': -1,
                'boosting_type': 'gbdt',
                'lambda_l1': trial.suggest_loguniform('lambda_l1', 1e-8, 10.0),
                'lambda_l2': trial.suggest_loguniform('lambda_l2', 1e-8, 10.0),
                'num_leaves': trial.suggest_int('num_leaves', 2, 256),
                'feature_fraction': trial.suggest_uniform('feature_fraction', 0.4, 1.0),
                'bagging_fraction': trial.suggest_uniform('bagging_fraction', 0.4, 1.0),
                'bagging_freq': trial.suggest_int('bagging_freq', 1, 7),
                'min_child_samples': trial.suggest_int('min_child_samples', 3, 20),
                'device': 'gpu'
            }

            # lgb.cv() returns a dict of per-iteration metric lists, e.g. 'auc-mean'
            eval_hist = lgb.cv(param, data_hpt_train, nfold=10, early_stopping_rounds=20)
            return np.mean(eval_hist['auc-mean'])

        study = optuna.create_study(sampler=optuna.samplers.RandomSampler(seed=k),
                                    pruner=optuna.pruners.MedianPruner(n_warmup_steps=10),
                                    direction='maximize')
        study.optimize(objective, n_trials=100)

        best_params[k][i] = study.best_trial.params
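Note that `lgb.cv` returns a dictionary of per-iteration metric lists rather than a model, and the key format has varied between releases (for example `'auc-mean'` in the 3.x series used here). A small helper that looks the key up by suffix avoids hard-coding it. `cv_metric_history` is a hypothetical name, and the example dict below is hand-written, not real output.

```python
def cv_metric_history(cv_results, metric="auc"):
    """Return the per-iteration mean of `metric` from an lgb.cv() result dict.

    Matches any key ending in "<metric>-mean" and raises KeyError if there is
    not exactly one such key (e.g. when both train and validation metrics exist).
    """
    suffix = f"{metric}-mean"
    matches = [key for key in cv_results if key.endswith(suffix)]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one key ending in {suffix!r}, got {matches}")
    return cv_results[matches[0]]


# Hand-written dict shaped like lgb.cv() output in LightGBM 3.x
example = {"auc-mean": [0.70, 0.74, 0.76], "auc-stdv": [0.05, 0.04, 0.04]}
history = cv_metric_history(example)
best_auc = history[-1]
```

As far as I know, when early stopping triggers, `lgb.cv` truncates these lists at the best iteration, so `history[-1]` is the score at that iteration; averaging over all iterations, as the objective above does, also counts the early, weaker rounds.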

The output (including the error):

  0%|          | 0/10 [00:00<?, ?it/s][I 2021-10-05 18:54:15,458] A new study created in memory with name: no-name-ab69c737-ab94-4326-8758-20c0b6fe0a41
/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py:578: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
[W 2021-10-05 18:54:39,967] Trial 0 failed because of the following error: LightGBMError('Invalid Kernel Arguments')
Traceback (most recent call last):
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "<ipython-input-10-e20e1369c52b>", line 44, in objective
    gbm = lgb.cv(param, data_hpt_train, nfold=10, early_stopping_rounds=20)
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py", line 641, in cv
    cvfolds.update(fobj=fobj)
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py", line 354, in handler_function
    ret.append(getattr(booster, name)(*args, **kwargs))
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py", line 3016, in update
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py", line 125, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
lightgbm.basic.LightGBMError: Invalid Kernel Arguments
  0%|          | 0/10 [00:52<?, ?it/s]
---------------------------------------------------------------------------
LightGBMError                             Traceback (most recent call last)
<ipython-input-10-e20e1369c52b> in <module>
     48                                     pruner=optuna.pruners.MedianPruner(n_warmup_steps=10),
     49                                     direction='maximize')
---> 50         study.optimize(objective, n_trials=100)
     51 
     52         best_params[k][i] = study.best_trial.params

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/study.py in optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
    398             )
    399 
--> 400         _optimize(
    401             study=self,
    402             func=func,

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
     64     try:
     65         if n_jobs == 1:
---> 66             _optimize_sequential(
     67                 study,
     68                 func,

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
    161 
    162         try:
--> 163             trial = _run_trial(study, func, catch)
    164         except Exception:
    165             raise

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
    262 
    263     if state == TrialState.FAIL and func_err is not None and not isinstance(func_err, catch):
--> 264         raise func_err
    265     return trial
    266 

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
    211 
    212     try:
--> 213         value_or_values = func(trial)
    214     except exceptions.TrialPruned as e:
    215         # TODO(mamu): Handle multi-objective cases.

<ipython-input-10-e20e1369c52b> in objective(trial)
     42             early_stopping_rounds = lgb.early_stopping(20)
     43             #log_eval = lgb.log_evaluation(period=50)
---> 44             gbm = lgb.cv(param, data_hpt_train, nfold=10, early_stopping_rounds=20)
     45             return np.mean(eval_hist['auc_mean'])
     46 

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py in cv(params, train_set, num_boost_round, folds, nfold, stratified, shuffle, metrics, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, fpreproc, verbose_eval, show_stdv, seed, callbacks, eval_train_metric, return_cvbooster)
    639                                     end_iteration=num_boost_round,
    640                                     evaluation_result_list=None))
--> 641         cvfolds.update(fobj=fobj)
    642         res = _agg_cv_result(cvfolds.eval_valid(feval), eval_train_metric)
    643         for _, key, mean, _, std in res:

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py in handler_function(*args, **kwargs)
    352             ret = []
    353             for booster in self.boosters:
--> 354                 ret.append(getattr(booster, name)(*args, **kwargs))
    355             return ret
    356         return handler_function

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py in update(self, train_set, fobj)
   3014             if self.__set_objective_to_none:
   3015                 raise LightGBMError('Cannot update due to null objective function.')
-> 3016             _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
   3017                 self.handle,
   3018                 ctypes.byref(is_finished)))

~/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py in _safe_call(ret)
    123     """
    124     if ret != 0:
--> 125         raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
    126 
    127 

LightGBMError: Invalid Kernel Arguments

Any help would be appreciated.
Thanks in advance!

@jameslamb
Collaborator

Thanks @nebuntu . Are you able to provide a reproducible example?

We won't be able to run that code you've provided, since it's missing:

  • import statements
  • code to define the input data (e.g. subsample_ssd, subsample_hc)
  • version of lightgbm you are using and how you installed it

Also, the code you've provided raises some additional questions.

  • are you able to provide some example code that does not involve optuna? Or do you only see this issue with optuna, and suspect that it's related to how that library works with lightgbm?
  • the provided code uses two for-loops. Are you saying that this Invalid Kernel Arguments error only shows up on some training runs, randomly? If it always happens, we'd appreciate a sample without those for-loops to narrow it down further.

Without a reproducible example, I suspect it will take a large amount of effort for maintainers to figure out the root cause of this issue. Anything you and anyone else reading this could do to help us narrow this down will increase the likelihood of resolving this issue.
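One way to answer the "does it only happen randomly?" question is to drop Optuna and the nested loops, fix a single parameter set, and train once per seed while recording which runs fail. A minimal sketch; `find_failing_runs` is a hypothetical helper, and the LightGBM call in the comment assumes an existing `dtrain` Dataset.

```python
def find_failing_runs(train_fn, seeds, errors=(Exception,)):
    """Call train_fn(seed) for each seed; return (seed, message) for every run that raises."""
    failures = []
    for seed in seeds:
        try:
            train_fn(seed)
        except errors as err:
            failures.append((seed, str(err)))
    return failures


def _flaky_train(seed):
    # Stand-in for a real training call that fails on odd seeds, demo only
    if seed % 2:
        raise ValueError("Invalid Kernel Arguments")
    return "booster"


demo_failures = find_failing_runs(_flaky_train, range(4))

# With LightGBM and one fixed parameter set (not executed here):
#   params = {"objective": "binary", "metric": "auc", "device": "gpu", "verbosity": -1}
#   failures = find_failing_runs(
#       lambda s: lgb.cv({**params, "seed": s}, dtrain, nfold=10),
#       seeds=range(10),
#       errors=(lgb.basic.LightGBMError,),
#   )
```

If every seed fails, the error is deterministic for that data and configuration, which makes it much easier to narrow down.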

==================

Also, I noticed that it looks like you've cross-posted this issue to Stack Overflow: https://stackoverflow.com/questions/69452606/lightgbm-error-invalid-kernel-arguments. We would have preferred to have this conversation in only one place, but now that that's been posted...if you get an answer there or some helpful suggestions, please do share them here in this issue.

@nebuntu

nebuntu commented Oct 6, 2021

Thank you @jameslamb for the comment!
I am willing to update my post with the missing information, but first, on your advice, I will try running the code without Optuna and/or the for-loops.
I will come back soon.

@nebuntu

nebuntu commented Oct 18, 2021

Sorry for my late update.
Your comments would be appreciated!

  1. To simplify, I reduced my code to a minimal reproducible example.
    Please find my code and data below:
    (https://drive.google.com/drive/folders/1iWD5LLelxtYjFgj5Osk8KiRhGU_S0ez7?usp=sharing)
    (either the .ipynb or the .py version can be used)
    Sadly, I ran into the same error:
/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py:179: UserWarning: Converting column-vector to 1d array
  _log_warning('Converting column-vector to 1d array')
[LightGBM] [Info] Number of positive: 81, number of negative: 82
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 163, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.14 MB) transferred to GPU in 0.253344 secs. 0 sparse feature groups
[LightGBM] [Info] Number of positive: 81, number of negative: 82
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 163, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.14 MB) transferred to GPU in 0.346040 secs. 0 sparse feature groups
[LightGBM] [Info] Number of positive: 82, number of negative: 81
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 163, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.14 MB) transferred to GPU in 0.401490 secs. 0 sparse feature groups
[LightGBM] [Info] Number of positive: 82, number of negative: 81
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 163, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.14 MB) transferred to GPU in 0.299658 secs. 0 sparse feature groups
[LightGBM] [Info] Number of positive: 82, number of negative: 82
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 4990126
[LightGBM] [Info] Number of data points in the train set: 164, number of used features: 71631
[LightGBM] [Info] Using GPU Device: Quadro RTX 5000, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 71631 dense feature groups (11.20 MB) transferred to GPU in 0.497587 secs. 0 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.496933 -> initscore=-0.012270
[LightGBM] [Info] Start training from score -0.012270
[LightGBM] [Debug] Re-bagging, using 132 data to train
[LightGBM] [Info] Increasing preallocd_max_num_wg_ to 17908 for launching more workgroups
Traceback (most recent call last):
  File "my_lgbm.py", line 28, in <module>
    gbm = lgb.cv(param, data_train)
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py", line 640, in cv
    cvfolds.update(fobj=fobj)
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/engine.py", line 353, in handler_function
    ret.append(getattr(booster, name)(*args, **kwargs))
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py", line 3021, in update
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
  File "/home/taka/.pyenv/versions/anaconda3-2021.05/lib/python3.8/site-packages/lightgbm/basic.py", line 125, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
lightgbm.basic.LightGBMError: Invalid Kernel Arguments

For your information, I could run the LightGBM Python example code on GPU without error (after adding the 'device': 'gpu' parameter).
I wonder whether my data simply has too many features for GPU processing?

  2. My LightGBM version: 3.3.0.99
    I installed LightGBM essentially following the steps in the official docs.
    Please note that I didn't install 'nvidia-375' and 'nvidia-opencl-icd-375', since my current NVIDIA driver is version 460 (and I need it for Keras on the same machine).
sudo apt-get update
sudo apt-get install --no-install-recommends nvidia-opencl-dev opencl-headers
sudo init 6  # reboot
sudo apt-get install --no-install-recommends git cmake build-essential libboost-dev libboost-system-dev libboost-filesystem-dev
git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
mkdir build
cd build
cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
make -j$(nproc)
cd ..
cd python-package/
sudo python setup.py install --precompile
cd ..
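On the question raised above of whether the data is simply too large for the GPU: the log lines `71631 dense feature groups` and `Increasing preallocd_max_num_wg_ to 17908` are consistent with a simple calculation, sketched below. It ASSUMES that the 256-bin GPU kernel packs 4 feature groups per workgroup and uses one workgroup per packed tuple on a dataset this small; that is an assumption about LightGBM internals, which happens to reproduce the logged 17908 but does not establish the cause of the error. The histogram size is likewise only a rough estimate from `Total Bins 4990126` and `Size of histogram bin entry: 8`.

```python
import math


def estimated_workgroups(num_dense_feature_groups, groups_per_workgroup=4, log2_workgroups_per_group=0):
    """Rough workgroup count for the GPU histogram kernel.

    ASSUMPTION: feature groups are packed 4 at a time (256-bin kernel) and each
    packed tuple gets 2**log2_workgroups_per_group workgroups (0 for tiny datasets).
    """
    packed = math.ceil(num_dense_feature_groups / groups_per_workgroup)
    return packed * 2 ** log2_workgroups_per_group


def histogram_bytes(total_bins, bytes_per_bin_entry=8):
    """Rough size of one full histogram if every bin holds one entry of the logged size."""
    return total_bins * bytes_per_bin_entry


# Values copied from nebuntu's log
wg_nebuntu = estimated_workgroups(71631)      # 17908, as in "Increasing preallocd_max_num_wg_ to 17908"
# If every one of the 116379 columns in the first report became its own group (an assumption)
wg_original = estimated_workgroups(116379)    # 29095
hist_mib = histogram_bytes(4990126) / 2**20   # about 38.1 MiB
```

Under these assumptions the workgroup count grows linearly with the number of features, which fits the observation that only very wide datasets hit the error, but that link remains unverified.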

@StrikerRUS
Collaborator

@nebuntu Thanks a lot for the repro! I can confirm this error appears on my local Windows machine with the latest LightGBM and RTX 2080.

@nebuntu

nebuntu commented Oct 26, 2021

Thank you @StrikerRUS for confirming!
Do you have any suggestions about how I can solve this problem?

@StrikerRUS
Collaborator

@nebuntu Unfortunately, I don't have any yet. 🙁

@jameslamb jameslamb mentioned this issue Apr 14, 2022
@grkremer

I'm having exactly the same problem with datasets of 100,000 or more features. Here is reproducible code:
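The reports in this thread put the failing runs at anywhere from 71,631 used features (nebuntu's log) to 100,000+ and 116,379 columns, so whether there is a clean feature-count threshold is not yet clear. A bisection over the number of columns would pin it down for one dataset. A minimal sketch, assuming the failure is monotone in the feature count (not established here); `smallest_failing_size` is a hypothetical helper.

```python
def smallest_failing_size(fails, lo, hi):
    """Binary search for the smallest n in (lo, hi] with fails(n) True.

    Requires fails(lo) to be False and fails(hi) to be True, and ASSUMES that
    once training fails at some size it also fails at every larger size.
    """
    if fails(lo) or not fails(hi):
        raise ValueError("need fails(lo) == False and fails(hi) == True")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # still failing: threshold is at or below mid
        else:
            lo = mid  # works: threshold is above mid
    return hi


# Demo with a stand-in predicate that "fails" from 70000 features upward
demo_threshold = smallest_failing_size(lambda n: n >= 70000, 1000, 120000)

# With LightGBM (not executed here), each probe is one short GPU training run:
#   def fails(n):
#       try:
#           lgb.train({"objective": "binary", "device": "gpu", "verbosity": -1},
#                     lgb.Dataset(X[:, :n], label=y), num_boost_round=5)
#           return False
#       except lgb.basic.LightGBMError:
#           return True
```

Each probe is a full training run, so the search costs roughly log2(hi - lo) runs (about 17 for the range in the demo).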
https://github.com/grkremer/cuda_synthetic/blob/main/cuda_sintetic_testing.ipynb


5 participants