Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix add features #2754

Merged
merged 28 commits into from
Oct 26, 2020
Merged

Fix add features #2754

merged 28 commits into from
Oct 26, 2020

Conversation

guolinke
Copy link
Collaborator

@guolinke guolinke commented Feb 8, 2020

fix #2744 and fix #2603

@StrikerRUS could you help for the test cases?

  • test cases
  • merge data sources at Python side

@StrikerRUS
Copy link
Collaborator

@guolinke

could you help for the test cases?

Sure, but #2594 removes test_add_features_* tests completely. I think we should merge it firstly.

@StrikerRUS StrikerRUS mentioned this pull request Feb 11, 2020
@guolinke
Copy link
Collaborator Author

guolinke commented Jul 7, 2020

@StrikerRUS
sorry for missing that. I will fix it when I have time.

@guolinke
Copy link
Collaborator Author

ping @StrikerRUS

src/io/dataset.cpp Outdated Show resolved Hide resolved
@guolinke
Copy link
Collaborator Author

guolinke commented Aug 5, 2020

@StrikerRUS do you think this is ready?

@StrikerRUS
Copy link
Collaborator

do you think this is ready?

No, I don't think so. Python package still lacks tests and logic for updating feature names and category fields of Dataset class (#2754 (comment)) (maybe something more). I'm sorry, I really don't have a lot of time right now to propose a thoughtful code update. PR reviews looks more urgent and takes much time, unfortunately.
I believe that this PR should not be blocking us from 3.0 release as there are no breaking changes here, only bug fixes that we can release in a next minor release.

@guolinke
Copy link
Collaborator Author

guolinke commented Aug 5, 2020

@StrikerRUS no problem, we can skip this for now.

@guolinke guolinke mentioned this pull request Aug 10, 2020
10 tasks
@StrikerRUS
Copy link
Collaborator

Also fix #3221.

@StrikerRUS
Copy link
Collaborator

@guolinke

Speaking about categorical features, it looks like it is not trivial to set them back at the Python side after calling AddFeaturesFrom(). In other dataset they can be provided by names, by indices, or even have auto value with pandas dataframe. The same cases are valid for self dataset. I don't know how to merge all possible combinations properly. Moreover, after cpp side we can have changed feature names (in case of duplicated names). I think the best option for now is just to warn users about this.

TBH, I don't want to hold this PR more...

@guolinke
Copy link
Collaborator Author

@StrikerRUS Thanks very much.
I think this is a rare use case, so no problem for the incomplete function.
As long as the numerical features could be merged, I think it is okay.

@guolinke
Copy link
Collaborator Author

@StrikerRUS I cannot approve my own PR, you can approve and merge it if you think it is ready

@guolinke guolinke mentioned this pull request Oct 26, 2020
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guolinke Thank you! I think it's ready.

@StrikerRUS StrikerRUS merged commit 53977f3 into master Oct 26, 2020
@StrikerRUS StrikerRUS deleted the fix-add-features branch October 26, 2020 23:35
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
3 participants