Dataset.feature_name.__len__() do not equal to Dataset.num_feature() after add_features_from another dataset #3221

Closed
JiaRu2016 opened this issue Jul 12, 2020 · 7 comments

@JiaRu2016

How are you using LightGBM?

  • Python package

Environment info

Operating System: Ubuntu 18.04

CPU/GPU model: GPU

C++ compiler version: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

CMake version: cmake version 3.17.1

Python version: 3.8

LightGBM version or commit hash: reproducible on both master and tag v2.3.1

Reproducible example(s)

import lightgbm as lgb
import numpy as np

ds0 = lgb.Dataset(np.arange(12).reshape(3,4), feature_name=[f'X{i}' for i in range(4)])
ds0.construct()
len(ds0.feature_name), ds0.num_feature()   # 4, 4

ds1 = lgb.Dataset(np.arange(9).reshape(3,3), feature_name=[f'Z{i}' for i in range(3)])
ds1.construct()
len(ds1.feature_name), ds1.num_feature()   # 3, 3

ds0.add_features_from(ds1)    # <--- the bug may come from here

len(ds0.feature_name), ds0.num_feature()    # 4, 7

ds0.construct()           # the problem persists even after `construct`
len(ds0.feature_name), ds0.num_feature()     # 4, 7

Steps to reproduce

  1. Open a Jupyter notebook.
  2. Run the code above; you should get the same result as I did (if reproducible).
@guolinke
Collaborator

Refer to #2754.

@JiaRu2016
Author

JiaRu2016 commented Jul 12, 2020

I merged the fix-add-features branch into the current master branch, rebuilt it, and reinstalled the Python package. However, the problem still exists (exactly the same output as before). Moreover, a warning is raised when constructing the dataset:

ds0 = lgb.Dataset(np.random.normal(size=(3,4)), label=[1,2,3], feature_name=[f'X{i}' for i in range(4)], free_raw_data=False)
ds0.construct()

[LightGBM] [Warning] There are no meaningful features, as all feature values are constant.

@guolinke
Collaborator

@JiaRu2016 I think the feature name in the Python package is not fixed yet in the PR; it needs to be retrieved from lightgbm.dll.
Ping @StrikerRUS to confirm.

@guolinke
Collaborator

@JiaRu2016 BTW, LightGBM filters features based on min_data (and min_data_per_bin).
In your test case, the number of data points is smaller than both min_data and min_data_per_bin; you should set them both to 1 to run your test.
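
As a rough illustration of the suggestion above, the sketch below passes both thresholds through the Dataset's params. It assumes that min_data and min_data_per_bin correspond to LightGBM's min_data_in_leaf and min_data_in_bin parameters; SMALL_DATA_PARAMS and make_small_dataset are hypothetical names used only here.

```python
# Assumption: these two parameters are the ones referred to above as
# min_data and min_data_per_bin.
SMALL_DATA_PARAMS = {
    "min_data_in_leaf": 1,  # allow leaves that hold a single row
    "min_data_in_bin": 1,   # allow histogram bins that hold a single row
}


def make_small_dataset(data, label, feature_name):
    """Build and construct a Dataset for a toy input with very few rows."""
    # Imported inside the function so the parameter dict above can be
    # inspected or reused without LightGBM being installed.
    import lightgbm as lgb

    ds = lgb.Dataset(
        data,
        label=label,
        feature_name=feature_name,
        params=SMALL_DATA_PARAMS,
        free_raw_data=False,
    )
    # construct() returns the Dataset itself.
    return ds.construct()
```

Calling make_small_dataset(np.random.normal(size=(3, 4)), [1, 2, 3], [f'X{i}' for i in range(4)]) should then construct the toy dataset with thresholds that three rows can satisfy, assuming the parameters behave as described above.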

@JiaRu2016
Author

@guolinke you are right. Explicitly setting ds.feature_name = ds.get_feature_name() solves this problem.
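
A minimal sketch of that workaround, wrapped in a helper. sync_feature_names and FakeDataset are hypothetical names, not part of LightGBM; the helper only assumes an object that exposes feature_name, num_feature() and get_feature_name() as used in this thread, and FakeDataset is a stand-in that mimics the reported mismatch so the snippet runs without LightGBM.

```python
def sync_feature_names(ds):
    """Refresh ds.feature_name when it disagrees with ds.num_feature()."""
    names = ds.feature_name
    # feature_name may also be a non-list placeholder (e.g. 'auto'),
    # so treat anything that is not a list as out of sync.
    if not isinstance(names, list) or len(names) != ds.num_feature():
        ds.feature_name = ds.get_feature_name()
    return ds.feature_name


class FakeDataset:
    """Hypothetical stand-in reproducing the stale-name state (4 vs 7)."""

    def __init__(self):
        # Python-side names were not updated after add_features_from ...
        self.feature_name = ["X0", "X1", "X2", "X3"]
        # ... while the underlying dataset already holds seven features.
        self._all_names = self.feature_name + ["Z0", "Z1", "Z2"]

    def num_feature(self):
        return len(self._all_names)

    def get_feature_name(self):
        return list(self._all_names)


fake = FakeDataset()
print(len(fake.feature_name), fake.num_feature())  # 4 7
sync_feature_names(fake)
print(len(fake.feature_name), fake.num_feature())  # 7 7
```

With a real dataset, the same call would be sync_feature_names(ds0) after ds0.add_features_from(ds1), assuming get_feature_name() is available in the installed version.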

@StrikerRUS mentioned this issue Oct 5, 2020
@StrikerRUS
Collaborator

Fixed via #2754.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023