[dask] merge local_predict tests into other tests (fixes #3833) #3842
Conversation
Thanks for your interest in LightGBM and your contribution! Can you please explain why you needed to add an additional _create_data() call and an additional .fit() in the classifier test? That should not be necessary.
X_1, y_1, w_1, dX_1, dy_1, dw_1 = _create_data(
    objective='classification',
    output='array'
)
Why was this necessary? You should just use the dask_classifier defined below this. With this change, you'd only be doing the local predict on arrays each time, but we want to test on all of DataFrame, Array, and sparse matrix.
Initially, I tried to reference the prediction inside test_classifier with p1_local = dask_classifier.to_local().predict(dX) for p1 of test_classifier_local_predict, but this line gave me two errors, both saying ValueError: Expected 2D array, got scalar array instead while running the tests.
I thought it was because of the difference in the output params between

X, y, w, dX, dy, dw = _create_data(
    objective='classification',
    output=output,
    centers=centers
)

in test_classifier and

X, y, w, dX, dy, dw = _create_data(
    objective='classification',
    output='array'
)

in test_classifier_local_predict.
I also tried reshaping the array with dX.reshape(1, 1), dX.reshape(1, -1), and dX.reshape(-1, 1), one at a time, but each of these caused other test cases to fail as well.
Is there any way to resolve it? Can you help me with it? @jameslamb
dask_classifier.to_local() produces a lightgbm.sklearn.LGBMClassifier. Pass it X (a non-Dask data structure) instead of dX (a Dask data structure) and it should work without needing this extra _create_data() call.
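The pattern the reviewer describes can be sketched with plain-Python stand-ins (LocalModel and DaskModel below are illustrative mocks, not the real LightGBM or Dask classes): to_local() hands back an ordinary estimator, so it must be given in-memory data, not a distributed collection.

```python
class LocalModel:
    """Stand-in for lightgbm.sklearn.LGBMClassifier (illustrative mock)."""
    def predict(self, X):
        # A real sklearn estimator validates that X is a 2D array-like;
        # a Dask collection would fail that validation.
        if not isinstance(X, list):
            raise ValueError("Expected 2D array, got scalar array instead")
        return [0 for _ in X]

class DaskModel:
    """Stand-in for lightgbm.dask.DaskLGBMClassifier (illustrative mock)."""
    def to_local(self):
        # to_local() returns a plain, non-distributed estimator
        return LocalModel()

X = [[0.1, 0.2], [0.3, 0.4]]  # local, in-memory data, like the test's X
model = DaskModel()
preds = model.to_local().predict(X)  # works: local estimator + local data
print(len(preds))  # 2
```

Passing the Dask collection (here, anything that is not an in-memory list) would raise the same ValueError the contributor saw.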
Thanks, it worked. I also renamed the variables. Can you tell me if I need any other changes?
p2_local_predict = dask_regressor.to_local().predict(X)
s2_local_predict = dask_regressor.to_local().score(X, y)
Suggested change:

- p2_local_predict = dask_regressor.to_local().predict(X)
- s2_local_predict = dask_regressor.to_local().score(X, y)
+ p2_local = dask_regressor.to_local().predict(X)
+ s2_local = dask_regressor.to_local().score(X, y)

Please simplify these names a bit (and elsewhere in the PR) by removing the "_predict" suffix.
I left a few other minor requested changes. Once you do this, I think this will be good to merge.
Really appreciate you taking the time to contribute!
p2_local = dask_regressor.to_local().predict(X)
s2_local = dask_regressor.to_local().score(X, y)
Suggested change:

- p2_local = dask_regressor.to_local().predict(X)
- s2_local = dask_regressor.to_local().score(X, y)
+ p1_local = dask_regressor.to_local().predict(X)
+ s1_local = dask_regressor.to_local().score(X, y)

Please call these p1_local and s1_local. We have a convention in these tests right now that p1 references predictions from a model trained with DaskLGBMRegressor and p2 references things from lightgbm.sklearn.LGBMRegressor. I know that's confusing and we should probably change it, but for now this test should be consistent with the existing code.
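That convention can be sketched in plain Python (the values below are made up for illustration; the comments note which real call each name would correspond to):

```python
# Illustrative sketch of the test-suite naming convention:
# "1" marks the Dask estimator (DaskLGBMRegressor), "2" marks the plain
# sklearn estimator (lightgbm.sklearn.LGBMRegressor), and "_local" marks
# values produced via to_local() on the Dask model.
p1 = [0.5, 1.5]        # would come from dask_regressor.predict(dX).compute()
p1_local = [0.5, 1.5]  # would come from dask_regressor.to_local().predict(X)
p2 = [0.5, 1.5]        # would come from LGBMRegressor().fit(X, y).predict(X)

# to_local() should reproduce the distributed model's predictions exactly,
# since it wraps the same booster.
assert p1 == p1_local
```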
@@ -142,9 +142,11 @@ def test_classifier(output, centers, client, listen_port):
    local_listen_port=listen_port,
    **params
)
please revert this unrelated whitespace change
@@ -153,12 +155,13 @@ def test_classifier(output, centers, client, listen_port):
p2 = local_classifier.predict(X)
p2_proba = local_classifier.predict_proba(X)
s2 = local_classifier.score(X, y)
please revert this unrelated whitespace change and add this line back
@@ -428,6 +382,7 @@ def test_ranker(output, client, listen_port, group):
dask_ranker = dask_ranker.fit(dX, dy, sample_weight=dw, group=dg, client=client)
rnkvec_dask = dask_ranker.predict(dX)
rnkvec_dask = rnkvec_dask.compute()
rnkvec_local_1 = dask_ranker.to_local().predict(X)
Suggested change:

- rnkvec_local_1 = dask_ranker.to_local().predict(X)
+ rnkvec_dask_local = dask_ranker.to_local().predict(X)

Please change this name so it's clearer how it differs from rnkvec_local.
# distributed and to-local scores should be the same.
assert_eq(rnkvec_dask, rnkvec_local)
assert_eq(rnkvec_dask, rnkvec_local_1)
Suggested change:

- assert_eq(rnkvec_dask, rnkvec_local_1)
+ assert_eq(rnkvec_dask, rnkvec_dask_local)
@@ -300,10 +276,12 @@ def test_regressor(output, client, listen_port):
# Scores should be the same
if output != 'dataframe':
    assert_eq(s1, s2, atol=.01)
    assert_eq(s1, s2_local)
Suggested change:

- assert_eq(s1, s2_local)
+ assert_eq(s1, s1_local)
# Predictions should be roughly the same
assert_eq(y, p1, rtol=1., atol=100.)
assert_eq(y, p2, rtol=1., atol=50.)
assert_eq(p1, p2_local)
Suggested change:

- assert_eq(p1, p2_local)
+ assert_eq(p1, p1_local)
Thanks so much! I just added a commit resolving conflicts with some recently-merged stuff on master. I'll merge this once it builds.
We hope you'll contribute more to LightGBM in the future!
Thank you @jameslamb for being so patient with me. I got to learn a lot from you just by working on this issue. I'll look forward to contributing more in the future. :)
@@ -276,12 +276,12 @@ def test_regressor(output, client, listen_port):
# Scores should be the same
if output != 'dataframe':
    assert_eq(s1, s2, atol=.01)
-    assert_eq(s1, s2_local)
+    assert_eq(s1, s1_local)
I saw that one of the tests is failing. I checked the log, and this is the line that is giving an error. Is it because I didn't pass an atol keyword argument?
Yes, I think so. I can see in the logs that the differences are very small, maybe small enough to be due to precision differences. For example, from https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=8740&view=logs&j=c28dceab-947a-5848-c21f-eef3695e5f11&t=fa158246-17e2-53d4-5936-86070edbaacf

a_original = 0.9555812336527378, b_original = 0.9555698440069504

Please add atol=0.003 here for now. That will be enough to catch major errors. We have an open issue (#3835) to come back in the future and investigate how to close these gaps.
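As a quick sanity check on the tolerance (plain Python, using the two score values quoted from the CI log above): the scores differ only around the fifth decimal place, so atol=0.003 comfortably absorbs the gap while still catching substantive regressions.

```python
# Scores quoted from the CI log linked above
a_original = 0.9555812336527378  # score from one model
b_original = 0.9555698440069504  # score from the other model

atol = 0.003                     # tolerance the reviewer suggests
diff = abs(a_original - b_original)

# The observed difference (~1.1e-5) is far below the tolerance
print(diff < atol)  # True
```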
Note that I already resolved conflicts with master on your branch, so you should pull this branch fresh before working on it:
git branch -D reduce-tests
git fetch origin reduce-tests
git checkout reduce-tests
I have made the change. It should work now.
Looks good, thank you very much for your contribution.
Thanks, @jameslamb for helping me out.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.