[dask] include multiclass-classification task in tests #4048
Conversation
Thanks very much for doing this!
Please see my suggested changes. If you think it will be more than a few days until you have time to work on these suggestions, could you please submit a separate PR that only has the "put factory lookup in a dict" changes in it? Those changes don't require any review discussion since we'd already agreed to them, and a PR for those could be merged quickly. If you'll be able to get back to this PR in the next few days, I think it's totally fine to leave them together as a single pull request.
Thanks again for all your efforts on lightgbm.dask! Really, really appreciate the time and energy you've put into moving this part of the project forward.
...test_classifier sometimes fails when comparing the local vs dask probabilities, so I'm happy to address that if that's in the scope of this PR.
The test failures look specific to this PR. All of the failures look like they're coming from multiclass classification tests on test_classifier(), which are currently working on master. Given that, I think they should be fixed on this PR.
FAILED ../tests/python_package_test/test_dask.py::test_classifier[multiclass-classification-dataframe]
FAILED ../tests/python_package_test/test_dask.py::test_classifier[multiclass-classification-dataframe-with-categorical]
FAILED ../tests/python_package_test/test_dask.py::test_classifier[multiclass-classification-array]
FAILED ../tests/python_package_test/test_dask.py::test_classifier[multiclass-classification-scipy_csr_matrix]
FAILED ../tests/python_package_test/test_dask.py::test_classifier[multiclass-classification-dataframe-with-categorical]
FAILED ../tests/python_package_test/test_dask.py::test_classifier[multiclass-classification-array]
FAILED ../tests/python_package_test/test_dask.py::test_classifier[multiclass-classification-dataframe]
FAILED ../tests/python_package_test/test_dask.py::test_classifier[multiclass-classification-dataframe-with-categorical]
I have one possible solution I'd like you to try. In the multiclass classification block of _create_data(), instead of centers = 3, please try passing the value currently being passed on master. Something like this:
if objective == 'binary-classification':
    centers = [[-4, -4], [4, 4]]
elif objective == 'multiclass-classification':
    centers = [[-4, -4], [4, 4], [-4, 4]]
I know I originally suggested that just centers = 2 and centers = 3 would be sufficient, but these failing tests suggest that I was wrong to assume that. Maybe it's the case that the choice make_blobs() makes for centers=3 produces classes that are too similar to each other (based on feature values), and as a result maybe the small models we're training in tests here (n_estimators=10, num_leaves=10) are not able to effectively discriminate between classes. In general, for a fixed dataset we should expect distributed learning to take more iterations to achieve the same statistical performance as non-distributed learning.
@@ -1229,12 +1222,9 @@ def test_training_succeeds_when_data_is_dataframe_and_label_is_column_array(
     client.close(timeout=CLIENT_CLOSE_TIMEOUT)


-@pytest.mark.parametrize('task', tasks)
+@pytest.mark.parametrize('task', ['binary-classification', 'ranking', 'regression'])
I noticed that test_init_score fails for the multiclass-classification task but I removed it from the tasks in that test and can make a different PR adding it back once #4046 is solved.
This PR should include a test for init_score used for multiclass classification. I agree with you that the "pass a 1D array" interface could lead to mistakes in the multiclass classification case, but I don't consider #4046 a bug, and I expect that a PR addressing #4046 would only ADD the ability to pass an array of shape (n_samples, n_classes), not remove the current behavior of passing a 1D array. I expect that it will end up that way so that we don't break existing users' code.
Here's a minimal example showing that the current 1D-array behavior can work in the Dask interface.
from sklearn.datasets import make_blobs
import lightgbm as lgb
import numpy as np
import dask.array as da
from distributed import Client, LocalCluster
cluster = LocalCluster(n_workers=2)
client = Client(cluster)
X, y = make_blobs(n_samples=1000, n_features=50, centers=3)
dX = da.from_array(X, chunks=(100, 50))
dy = da.from_array(y, chunks=(100,))
init_scores = dy.map_blocks(
    lambda x: np.repeat(0.8, x.size * 3),
    dtype=np.float64
)
assert init_scores.npartitions == dy.npartitions
dask_model = lgb.DaskLGBMClassifier()
dask_model.fit(dX, dy, init_score=init_scores)
assert dask_model.booster_.trees_to_dataframe()['value'][0] == 0.0
So for this test, can you please just change the init_score setup to something like this?
# init_scores must be a 1D array, even for multiclass classification
# where you need to provide 1 score per class for each row in X
# https://github.com/microsoft/LightGBM/issues/4046
size_factor = 1
if task == "multiclass-classification":
    size_factor = 3

if output.startswith('dataframe'):
    init_scores = dy.map_partitions(
        lambda x: pd_Series(np.repeat(init_score, x.size * size_factor))
    )
else:
    init_scores = dy.map_blocks(
        lambda x: np.repeat(init_score, x.size * size_factor),
        dtype=np.float64
    )
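For comparison, here is a hedged sketch of what the equivalent non-distributed call could look like under the current 1D interface (illustrative only, not part of the suggested test change; the variable names are mine). With a constant score, the exact ordering of the flattened array doesn't matter.

import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=1000, n_features=50, centers=3)

# one raw score per (row, class) pair, flattened into a single 1D array
init_score = np.repeat(0.8, y.size * 3)

local_model = lgb.LGBMClassifier(n_estimators=10, num_leaves=10)
local_model.fit(X, y, init_score=init_score)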
Hi James. Thank you for your comments, I've included them and things work OK locally. However, in the CI all the Dask tests are failing, because of the Python version (3.9) I believe.
Thanks! In the future, when you mention failing tests, it would be useful to include a link to the logs and preferably a small snippet of the error message(s) or other logs you're seeing. I can see from https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=9396&view=logs&j=c28dceab-947a-5848-c21f-eef3695e5f11&t=fa158246-17e2-53d4-5936-86070edbaacf that a lot of the Dask tests are failing with an error like this:
I'm very familiar with this problem, actually! I opened dask/dask#7331 this weekend, which I think describes it. It's possible to get this error if you have mismatched versions of dask and distributed, and I can see that that's what we ended up with when the testing conda env was solved. From the same logs:
I see the same failures on the builds for a totally unrelated PR (#4053): https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=9393&view=logs&j=c28dceab-947a-5848-c21f-eef3695e5f11&t=fa158246-17e2-53d4-5936-86070edbaacf. I'll open a PR right now to try to fix this by forcing our testing conda env to use compatible versions of these two libraries.
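As a side note, a minimal sketch of the kind of compatibility guard this implies (my illustration, not the actual CI fix), assuming dask and distributed are published with matching version numbers, which is how they are normally released together:

import dask
import distributed

# dask and distributed are normally released in lockstep with matching versions;
# a mismatch can produce scheduler/worker errors like the ones in these CI logs
assert dask.__version__ == distributed.__version__, (
    f"dask=={dask.__version__} vs distributed=={distributed.__version__}"
)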
Ok @jmoralez, I think the known issues with this project's continuous integration stuff have been resolved. Could you please merge the most recent master into this branch?
Hi James. Those CI fixes look like they were quite an adventure, haha. Something strange happened with the test for Linux sdist. Truncated logs:
Ha yes, every day in CI world brings new weird stuff to figure out. I'm not sure what happened with that Azure DevOps job you linked to... from those logs, it looks like maybe... I just restarted that job manually; hopefully it will pass this time. The failure definitely seems unrelated to your changes in this PR.
I just changed the PR title, adding the phrase "in tests". We use a bot called release-drafter (https://github.com/microsoft/LightGBM/blob/master/.github/release-drafter.yml) to automatically create changelogs for releases, so each PR title becomes one point in the changelog. I want it to be clear in the changelog that this PR added tests on multiclass classification tasks, not a new feature in the Dask package. Just explaining, since I know that if someone changed the title of one of my PRs, I'd want to know why.
Haha, it's fine. I had noticed the title didn't specify that it was for tests, but I forgot to update it, so thank you. I'll make sure to write well-defined PR titles in the future with this changelog in mind.
Awesome contribution, thanks so much for increasing the test coverage of the Dask package and simplifying the test code!
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
This includes an additional task, multiclass-classification, for test_dask.py and renames classification to binary-classification. I removed the centers argument from the _create_data function and now define the centers inside the function according to the classification task. I also created task_to_dask_factory and task_to_local_factory dictionaries to simplify getting a model factory from a task, which was especially useful now that we have two classification tasks.

I noticed that test_init_score fails for the multiclass-classification task, but I removed it from the tasks in that test and can make a different PR adding it back once #4046 is solved. I also noticed that test_classifier sometimes fails when comparing the local vs Dask probabilities, so I'm happy to address that if that's in the scope of this PR.
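As an illustration of the "factory lookup in a dict" idea mentioned above, here is a minimal sketch. The dictionary names follow this description, but the exact contents in the test code may differ, so treat this as illustrative rather than the actual implementation.

import lightgbm as lgb

# map each task name to the estimator class used to build models for it
task_to_dask_factory = {
    'regression': lgb.DaskLGBMRegressor,
    'binary-classification': lgb.DaskLGBMClassifier,
    'multiclass-classification': lgb.DaskLGBMClassifier,
    'ranking': lgb.DaskLGBMRanker,
}
task_to_local_factory = {
    'regression': lgb.LGBMRegressor,
    'binary-classification': lgb.LGBMClassifier,
    'multiclass-classification': lgb.LGBMClassifier,
    'ranking': lgb.LGBMRanker,
}

# a single dict lookup replaces a chain of if/elif branches in the tests
model_factory = task_to_dask_factory['multiclass-classification']
dask_model = model_factory(n_estimators=10, num_leaves=10)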