[dask] reduce test times #3786

jameslamb · 2021-01-18T20:37:45Z

This PR proposes a refactoring of tests/python_package_test/test_dask.py, to reduce the runtime of the Dask tests.

makes models smaller by setting n_estimators=10 and num_leaves=10 (the defaults are n_estimators=100 and num_leaves=31)
adds the predict_proba() tests for DaskLGBMClassifier into the existing test_classifier test. I don't think effectively testing predict() and predict_proba() requires separate training runs

Impact of this change

I tried running pytest tests/python_package_test/test_dask.py 5 times on master and 5 times on this branch (on my laptop). I found that the changes in this PR reduce the runtime of the Dask tests by around 35 seconds.

I think this speedup will have a meaningful impact on development speed for the Dask module.

before (avg = 137s):

============================ 29 passed, 2 warnings in 150.19s (0:02:30) =============================
============================= 29 passed, 1 warning in 140.21s (0:02:20) =============================
============================ 29 passed, 2 warnings in 132.08s (0:02:12) =============================
============================ 29 passed, 2 warnings in 132.68s (0:02:12) =============================
============================ 29 passed, 2 warnings in 133.04s (0:02:13) =============================

after combining tests and using fewer iterations (avg = 102s):

============================= 23 passed, 1 warning in 101.96s (0:01:41) =============================
============================= 23 passed, 1 warning in 103.60s (0:01:43) =============================
============================= 23 passed, 1 warning in 101.74s (0:01:41) =============================
============================= 23 passed, 1 warning in 101.34s (0:01:41) =============================
============================= 23 passed, 1 warning in 102.70s (0:01:42) =============================
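The averages and the roughly 35-second saving can be checked from the ten pytest summary lines above. This is a small sketch using only the runtimes reported in this PR; the variable names are chosen here for illustration.

```python
from statistics import mean

# Runtimes in seconds, copied from the pytest summary lines above.
before = [150.19, 140.21, 132.08, 132.68, 133.04]  # master
after = [101.96, 103.60, 101.74, 101.34, 102.70]   # this branch

mean_before = mean(before)
mean_after = mean(after)
saved = mean_before - mean_after

print(f"before: {mean_before:.2f}s")
print(f"after:  {mean_after:.2f}s")
print(f"saved:  {saved:.2f}s ({saved / mean_before:.0%} of the original runtime)")
```

With five runs per branch on a single laptop, this is a rough comparison rather than a rigorous benchmark, but the spread within each group is small relative to the gap between them.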

Notes for Reviewers

I intentionally did not change n_estimators in the test_regression test. That test requires the results of distributed and local training to look similar, which takes a lot of iterations. I found that the two produced very different results for small values of n_estimators.
If this PR is accepted, I'll add a review comment with similar suggestions on [python-package] [dask] Add DaskLGBMRanker #3708

StrikerRUS

LGTM! Thanks! I find this useful given the growing number of CI jobs and services.

github-actions · 2023-08-24T02:18:44Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jameslamb added 2 commits January 18, 2021 12:47

speed up tests

43725c4

[dask] reduce test times

f6592ce

jameslamb added the maintenance label Jan 18, 2021

jameslamb requested a review from StrikerRUS January 18, 2021 20:37

StrikerRUS approved these changes Jan 18, 2021

jameslamb merged commit c871496 into microsoft:master Jan 18, 2021

jameslamb deleted the ci/dask-tests branch January 18, 2021 23:55

jameslamb mentioned this pull request Jan 18, 2021

[python-package] [dask] Add DaskLGBMRanker #3708

Merged

StrikerRUS mentioned this pull request Jan 19, 2021

[dask] Support pred_contrib in Dask predict() methods (fixes #3713) #3774

Merged

github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
