Fix some parameter hints when loading from binary file #4701
@hzy46 Thanks for your contribution!
OK, got it.
src/io/dataset_loader.cpp
Outdated
SetHeader(filename);
if (filename != nullptr && CheckCanLoadFromBin(filename) == "") {
  // SetHeader should only be called when loading from text file
  SetHeader(filename);
The test cases are failing because SetHeader does not only handle input that comes from files. It also reads categorical feature indices from the config parameters (see the part outside the if (filename != nullptr) { ... } block). Skipping SetHeader entirely here will cause errors when we load data from numpy or pandas arrays (where filename == nullptr) and use categorical features.
So I think we should move the check filename != nullptr && CheckCanLoadFromBin(filename) == "" into SetHeader. That is, we change if (filename != nullptr) { ... } into if (filename != nullptr && CheckCanLoadFromBin(filename) == "") { ... }.
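The suggestion above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual LightGBM code: LoaderSketch, its fields, and the suffix-based body of CheckCanLoadFromBin are assumptions made for the example. Only the names SetHeader and CheckCanLoadFromBin and the shape of the condition come from the review comment.

```cpp
#include <string>

// Hypothetical stand-in for the dataset loader. Everything except the names
// SetHeader and CheckCanLoadFromBin is invented for illustration.
struct LoaderSketch {
  std::string categorical_feature;   // stand-in for the config parameter
  bool header_parsed = false;        // set when text-file-only work runs
  bool categorical_parsed = false;   // set when config-driven work runs

  // Assumption: returns "" when the input is not a binary dataset file.
  // Here a ".bin" suffix is used as a toy criterion; the real check differs.
  static std::string CheckCanLoadFromBin(const char* filename) {
    const std::string name(filename);
    const std::string suffix = ".bin";
    const bool is_bin =
        name.size() >= suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
    return is_bin ? name : std::string();
  }

  void SetHeader(const char* filename) {
    // Text-file-only part: skipped for in-memory data and for binary files.
    if (filename != nullptr && CheckCanLoadFromBin(filename) == "") {
      header_parsed = true;
    }
    // Config-driven part: still runs when filename == nullptr
    // (for example, data passed in from numpy or pandas).
    if (!categorical_feature.empty()) {
      categorical_parsed = true;
    }
  }
};
```

With this arrangement the caller can keep calling SetHeader unconditionally, and the categorical feature handling is still reached when filename is nullptr.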
@shiyu1994 @StrikerRUS Thanks for your comment. I have updated the code. Please check.
Thank you! Just one question below:
}
if (config_.ignore_column != "") {
  Log::Warning("Config ignore_column works only in case of loading data directly from text file. It will be ignored when loading from binary file.");
}
Just curious, why are there no similar checks for config_.two_round and config_.header?
I think it should be obvious that two_round and header are not useful when loading from a binary file. What do you think?
I agree with you that header is an irrelevant parameter when loading from binary, but I think two_round isn't so obvious. I'd rather list all parameters here for consistency, regardless of whether it is obvious that they are not useful.
Can we merge this PR now? If any other parameters need a similar explanation, maybe we can leave that for another PR. :)
OK.
Only header and two_round have notes in their descriptions saying that they work "only in case of loading data directly from text file", but providing them for a binary file doesn't trigger this warning.
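For illustration only, the reviewer's idea of treating every text-only parameter the same way could look something like the sketch below. It is not what this PR merged. ConfigSketch and TextOnlyParamWarnings are hypothetical names, the parameter list is taken from the PR description, and the message text reuses the ignore_column warning quoted earlier in the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical subset of the config. Field names mirror the parameters
// discussed in this thread, but this is not LightGBM's Config class.
struct ConfigSketch {
  std::string ignore_column;
  std::string label_column;
  std::string weight_column;
  std::string group_column;
  bool header = false;
  bool two_round = false;
};

// Returns one warning message per text-only parameter that the user set.
// A real loader would pass each message to its logger instead.
std::vector<std::string> TextOnlyParamWarnings(const ConfigSketch& config) {
  std::vector<std::string> set_params;
  if (!config.ignore_column.empty()) set_params.push_back("ignore_column");
  if (!config.label_column.empty()) set_params.push_back("label_column");
  if (!config.weight_column.empty()) set_params.push_back("weight_column");
  if (!config.group_column.empty()) set_params.push_back("group_column");
  if (config.header) set_params.push_back("header");
  if (config.two_round) set_params.push_back("two_round");

  std::vector<std::string> warnings;
  for (const std::string& name : set_params) {
    warnings.push_back(
        "Config " + name +
        " works only in case of loading data directly from text file."
        " It will be ignored when loading from binary file.");
  }
  return warnings;
}
```

Keeping the parameter list in one place means that adding a warning for another parameter is a one-line change, which is the consistency argument made in the review.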
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
#4657 describes that ignore_column will cause an error when loading from a binary file. I think there are two related issues in the code:
SetHeader should only be called when loading from a text file. Currently, it is also called when loading data from a binary file. This is why the error mentioned in issue 4657 happens. (LightGBM/src/io/dataset_loader.cpp, line 25 in 13ed38c)
Besides ignore_column, the parameters two_round, header, label_column, weight_column, and group_column also have no effect when loading from binary. I changed the doc. Need to confirm whether we should add warnings for these parameters.