[R-package] ensure boosting happens in tests on small datasets #5121

Merged
merged 1 commit into from
Apr 4, 2022

jameslamb
Collaborator

The R package contains a few tests using the mtcars dataset that comes built into R.

That dataset has only 32 observations (see ?mtcars).

data(mtcars)
dim(mtcars)
# [1] 32 11

As a result, the calls to lightgbm() and lgb.train() in tests using that dataset are not currently performing any boosting, for reasons described in #5081.

When verbosity is set high enough to emit INFO and WARNING level logs, those tests produce logs like the following:

[LightGBM] [Warning] There are no meaningful features which satisfy the provided configuration. Decreasing Dataset parameters min_data_in_bin or min_data_in_leaf and re-constructing Dataset might resolve this warning.
[LightGBM] [Info] Total Bins 0
[LightGBM] [Info] Number of data points in the train set: 32, number of used features: 0
[LightGBM] [Info] Start training from score 20.090625
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
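
The first warning names min_data_in_leaf, and a quick check suggests why 32 rows are too few. This is a sketch that assumes LightGBM's documented default of min_data_in_leaf = 20; it is one plausible reading of the logs, not a restatement of the analysis in #5081. The variable names are illustrative only.

```r
data(mtcars)
n_rows <- nrow(mtcars)

# Assumed default, taken from LightGBM's parameter documentation.
min_data_in_leaf <- 20L

# A split produces two leaves, and each leaf must hold
# at least min_data_in_leaf rows.
rows_needed_for_one_split <- 2L * min_data_in_leaf

can_split <- n_rows >= rows_needed_for_one_split
print(can_split)
# [1] FALSE
```

With 32 rows and 40 needed, no feature can produce a valid split, which is consistent with "number of used features: 0" in the logs above.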

This PR proposes setting min_data_in_bin = 1 and min_data_in_leaf = 1 in those examples, to ensure that boosting occurs. I believe this will improve the test coverage LightGBM gets from these tests, by ensuring that the tests use models that actually generate trees with splits.
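
A minimal sketch of the kind of change described above, not the exact diff from this PR. The objective, nrounds, and variable names are assumptions chosen for illustration. The same parameter list is passed to both lgb.Dataset() and lgb.train() so that the settings apply when the Dataset is constructed.

```r
library(lightgbm)

data(mtcars)
X <- as.matrix(mtcars[, -1L])  # every column except mpg
y <- mtcars$mpg

# Relax the minimums so that a 32-row dataset can still be binned and split.
params <- list(
    objective = "regression",
    min_data_in_bin = 1L,
    min_data_in_leaf = 1L,
    verbosity = -1L
)

dtrain <- lgb.Dataset(data = X, label = y, params = params)

bst <- lgb.train(
    params = params,
    data = dtrain,
    nrounds = 5L
)

# Count internal (split) nodes across all trees; rows with a
# non-missing split_index are splits, the rest are leaves.
tree_dt <- lgb.model.dt.tree(bst)
n_split_nodes <- sum(!is.na(tree_dt$split_index))
print(n_split_nodes)
```

A nonzero count here indicates that the model contains trees with real splits, which is what the tests are meant to exercise.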

Collaborator

@StrikerRUS StrikerRUS left a comment

LGTM, thanks!

@jameslamb jameslamb merged commit 3d620bf into master Apr 4, 2022
@jameslamb jameslamb deleted the r/fix-small-data-tests branch April 4, 2022 00:22
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023