LightGBM: training with categorical_feature set to non-existent column names, then an error when predicting after reloading the model from file #6692

Open
mangolzy opened this issue Oct 22, 2024 · 1 comment


@mangolzy

LightGBM version: 4.5.0

error: ValueError: train and valid dataset categorical_feature do not match.

setting:

    clf = lgb.train(params=params, train_set=lgb_train,
                    valid_sets=[lgb_train, lgb_test],
                    valid_names=['train', 'test'],
                    feval=ks_metric,
                    categorical_feature=cflist)

When categorical_feature is set to a list (listA) containing column names that are not among the train_set columns (listB), training and predicting in the same session work fine.
But after saving the model to a file with save_model and reloading it with lgb.Booster(), calling predict(X) on a new dataframe that has the proper feature list (listB) used in training raises the error above. The error does not go away if I add the listA columns to X.
So, is it possible to make the current model work for prediction? Is there a parameter I should add?

@jmoralez
Collaborator

Hey @mangolzy, thanks for using LightGBM.

That error usually means that the columns in your input dataframe that are expected to be categoricals are not. Can you make sure that they are? e.g. X[listB] = X[listB].astype('category').
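
A minimal sketch of the cast suggested above, assuming pandas. The helper name `cast_to_category` and the sample columns are hypothetical, made up for illustration. It skips names that are missing from the dataframe, which matters here because listA contains columns that do not exist in the data.

```python
import pandas as pd


def cast_to_category(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with the listed columns cast to 'category' dtype.

    Names in `columns` that are absent from df are skipped rather than raised on.
    """
    out = df.copy()
    for col in columns:
        if col in out.columns:
            out[col] = out[col].astype("category")
    return out


# Hypothetical data: `city` stands in for a categorical feature from listB,
# and `not_a_column` for a listA name that does not exist in the dataframe.
X = pd.DataFrame({"age": [31, 45, 27], "city": ["NYC", "SF", "NYC"]})
X_cat = cast_to_category(X, ["city", "not_a_column"])
print(X_cat.dtypes)
```

The resulting `X_cat` is what would then be passed to the reloaded booster's `predict`; the original `X` is left unchanged.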

If you're able to provide a minimal reproducible example we can provide further help.
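
While putting together a reproducible example, a quick check can help narrow things down. As far as I can tell from the Python package source (an assumption worth verifying against your installed version), this error is raised when the number of `category`-dtype columns in the prediction dataframe differs from the number of category lists stored with the model, which a reloaded `Booster` exposes as `pandas_categorical`. The sketch below only compares those two counts. The helper names and sample data are hypothetical, and the `stored` list merely stands in for `booster.pandas_categorical`.

```python
import pandas as pd


def categorical_columns(df: pd.DataFrame) -> list:
    """List the columns of df whose dtype is pandas 'category'."""
    return [
        col
        for col, dtype in zip(df.columns, df.dtypes)
        if isinstance(dtype, pd.CategoricalDtype)
    ]


def check_categorical_count(df: pd.DataFrame, stored_categories: list):
    """Compare the number of category columns in df with the stored count.

    `stored_categories` is assumed to be a list with one list of category
    values per categorical column seen at training time.
    """
    found = categorical_columns(df)
    expected = len(stored_categories)
    return found, expected, len(found) == expected


# Hypothetical stand-in for booster.pandas_categorical after reloading.
stored = [["NYC", "SF"]]

X_plain = pd.DataFrame({"age": [31, 45], "city": ["NYC", "SF"]})
X_cast = X_plain.astype({"city": "category"})

print(check_categorical_count(X_plain, stored))
print(check_categorical_count(X_cast, stored))
```

If the counts differ, the dataframe passed to `predict` does not have the same set of `category` columns that the training dataframe had, and that mismatch is the first thing to fix.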
