[Python] How should I determine categorical features #5932
Comments
Thanks for using LightGBM. Before we look into this... is it absolutely necessary that you use I strongly recommend updating to
Sure. I don't know why the version I installed is 2.1.2; maybe it comes from the mirror? I have now updated lightgbm to v3.3.5.
Ok thank you! Someone here will provide an answer using
Check the code here. It basically says that if you provide a pandas DataFrame, LightGBM does the following. If it detects categorical columns, it automatically encodes them as integers (see x.cat.codes on line 688), so you don't need to do the conversion yourself. Detection is also automatic for columns that have the pandas category dtype, so once X[CATEGORICAL_COLS] = X[CATEGORICAL_COLS].astype("category") has been applied you don't need to pass categorical_feature as well. You should use the categorical_feature parameter only if you want to set categorical features explicitly, ignoring the dtypes of the pandas DataFrame. For example, a numeric column "N" that takes 40 different values can also be interpreted as a categorical column with 40 classes. But this won't happen automatically, since pandas treats N as a numeric column. With categorical_feature, you can force N to be a categorical feature. You should use this option in such cases.
Got it, and thank you very much for this meticulous explanation!
Version: [Python 3.7+] [lightgbm 2.1.2]
Thank you for sharing this great work. I have several questions about handling categorical features which have really confused me for a long time.
Now, I pass a pandas dataframe to lightgbm by
or
And my questions are
Q1: Which of these methods is right?
Q2: Do I need to both change the dataframe column type to category AND specify categorical_feature when defining the lightgbm model at the same time?
Q3: What should I do with a categorical feature? Should I label-encode it to integers from 0 to k-1 (where k is the number of categories), or can I just pass the string column to lightgbm?
Thank you for your time.