[python-package] Correctly recognize LGBMClassifier(num_class=2, objective="multiclass") as multiclass classification #6524

RektPunk · 2024-07-06T11:46:50Z

I created a pull request addressing issue #6519, which I've also encountered.
I modified the code to use np.vstack only when result.shape looks like (num_data,), and fixed some typos.
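
A minimal sketch of that shape check, assuming NumPy. The helper name `to_two_column_proba` is hypothetical and is not a LightGBM function. A 1-D array of positive-class probabilities is expanded into two columns, while an array that already has one column per class is returned unchanged.

```python
import numpy as np


def to_two_column_proba(result: np.ndarray) -> np.ndarray:
    """Hypothetical helper, for illustration only (not part of LightGBM)."""
    if len(result.shape) == 1:
        # shape (num_data,): only P(class 1) is present, so build [P(0), P(1)]
        return np.vstack((1.0 - result, result)).transpose()
    # shape (num_data, num_class): already one column per class
    return result


binary_style = np.array([0.2, 0.9, 0.5])
multiclass_style = np.array([[0.8, 0.2], [0.1, 0.9], [0.5, 0.5]])

print(to_two_column_proba(binary_style).shape)      # (3, 2)
print(to_two_column_proba(multiclass_style).shape)  # (3, 2)
```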

jameslamb

thanks, could you please add a unit test covering this behavior, to be sure it isn't broken by future refactorings?

RektPunk · 2024-07-07T01:19:07Z

@microsoft-github-policy-service agree

jameslamb

Thanks very much.

#6519 is reporting the same root cause as #3636, and the solution you've arrived at here looks very similar to what @jmoralez recommended there (#3636 (comment)). There he recommended using result.ndim instead of len(result.shape)... I don't have a strong opinion on one of those being preferable to another. I think the len(result.shape) approach you've taken is fine.
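
For reference, a small check (assuming NumPy arrays) that the two spellings discussed here give the same answer, so the choice between them is stylistic:

```python
import numpy as np

one_dim = np.zeros(5)       # shape (5,)
two_dim = np.zeros((5, 2))  # shape (5, 2)

for arr in (one_dim, two_dim):
    # ndim is defined as the number of array dimensions, i.e. len(shape)
    assert arr.ndim == len(arr.shape)
```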

Re-reading #3636 though, I think there is one other place where self._n_classes > 2 is used to decide whether binary or multiclass classification is being performed, and which needs to be changed.

@rabyj mentioned something related to metrics at #3636 (comment), which I think is about this behavior where LGBMClassifier rewrites binary classification metrics to multiclass metrics and vice-versa:

LightGBM/python-package/lightgbm/sklearn.py

Line 1254 in a5054f7

if self._n_classes > 2:

This is making me think that maybe LGBMClassifier should stop relying on self._n_classes > 2 as a proxy for "is doing multiclass classification".

What do you think about adding a private property on LGBMClassifier that has something like this?

@property
def __is_multiclass(self) -> bool:
    multiclass_objectives = {"multiclass", "softmax", "multiclassova", "multiclass_ova", "ova", "ovr"}
    return (
        self._n_classes > 2
        or isinstance(self._objective, str) and self._objective in multiclass_objectives
    )

And then using that in place of these self._n_classes > 2 conditions, where appropriate, e.g.

if self.__is_multiclass:
    # do multiclass classification things
else:
    # do binary classification things
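
A self-contained sketch of how such a property could behave. `ToyClassifier`, its `task()` method, and the module-level constant name are hypothetical stand-ins for illustration, not LightGBM code. The set of objective names is taken from the suggestion above.

```python
from typing import Callable, Optional, Union

# assumption: the objective names listed in the review comment
_MULTICLASS_OBJECTIVES = {"multiclass", "softmax", "multiclassova", "multiclass_ova", "ova", "ovr"}


class ToyClassifier:
    """Hypothetical stand-in for illustration; not LGBMClassifier."""

    def __init__(self, n_classes: int, objective: Optional[Union[str, Callable]] = None):
        self._n_classes = n_classes
        self._objective = objective

    @property
    def __is_multiclass(self) -> bool:
        # multiclass if there are more than 2 classes, or if a multiclass
        # objective was requested explicitly (even with only 2 classes)
        return self._n_classes > 2 or (
            isinstance(self._objective, str) and self._objective in _MULTICLASS_OBJECTIVES
        )

    def task(self) -> str:
        # the double-underscore name is mangled consistently inside the class
        return "multiclass" if self.__is_multiclass else "binary"


print(ToyClassifier(2).task())                # binary
print(ToyClassifier(2, "multiclass").task())  # multiclass
print(ToyClassifier(3).task())                # multiclass
```

The `isinstance` check means a custom callable objective never matches the set, so with 2 classes it falls through to the binary branch.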

If you agree, then please also add a case within this test checking that the behavior is correct when using num_class=2, objective="multiclass":

LightGBM/tests/python_package_test/test_sklearn.py

Line 781 in a5054f7

def test_metrics():

Sorry this is a lot... there is a lot of indirection in the scikit-learn estimator, some of it imposed by scikit-learn itself, so seemingly simple things can become complicated fast 😅

Other references you might find relevant:

tests/python_package_test/test_sklearn.py

RektPunk · 2024-07-07T09:10:21Z

Thanks for the review, @jameslamb. As you suggested, I added __is_multiclass and also added tests for the eval metric (if the test you had in mind is correct 😅 )

jameslamb

New tests look great, thanks very much for your attention to detail there making them minimal and similar to the other existing tests! And for fixing typos in comments, much appreciated.

I just have a few more small suggestions.

python-package/lightgbm/sklearn.py

tests/python_package_test/test_sklearn.py

jameslamb

Thanks very much! The changes and tests look great to me.

I left one more very very very minor suggestion.

Don't worry about the failing CI job (https://github.com/microsoft/LightGBM/actions/runs/9852647887/job/27218413942?pr=6524). I think that's the result of some recent change in the numpy or pandas nightlies, not anything done in this PR.

@jmoralez could you also review? I'm not 100% confident I've thought of all the implications of this change.

python-package/lightgbm/sklearn.py

jameslamb · 2024-07-16T21:04:56Z

Thanks very much for the work @RektPunk and for the help reviewing @jmoralez !

Pretty fun when one merge closes 2 issues 😁

RektPunk added 2 commits July 6, 2024 20:01

fix typo

6bd9595

add result shape condition

6560601

RektPunk requested review from guolinke, jameslamb, shiyu1994, jmoralez and borchero as code owners July 6, 2024 11:46

RektPunk changed the title ~~[python-package] - Fix multiclass binary classification~~ [python-package] Fix multiclass binary classification Jul 6, 2024

jameslamb requested changes Jul 6, 2024

jameslamb added the fix label Jul 6, 2024

add unittest

bf62c91

jameslamb mentioned this pull request Jul 7, 2024

[python-package] ValueError when using scikit-learn API for multiclass binary classification #6519

Closed

jameslamb self-requested a review July 7, 2024 02:58

add shape assert

d560980

jameslamb requested changes Jul 7, 2024

tests/python_package_test/test_sklearn.py (resolved)

tests/python_package_test/test_sklearn.py (outdated, resolved)

jameslamb changed the title ~~[python-package] Fix multiclass binary classification~~ [python-package] Correctly recognize LGBMClassifier(num_class=2, objective="multiclass") as multiclass classification Jul 7, 2024

remove randomness and add test for predict method

66b372a

RektPunk marked this pull request as draft July 7, 2024 04:22

RektPunk added 2 commits July 7, 2024 15:52

add __is_multiclass property

f75de8c

add eval metric test

14159a6

RektPunk marked this pull request as ready for review July 7, 2024 09:03

RektPunk requested a review from jameslamb July 7, 2024 09:06

jameslamb requested changes Jul 8, 2024

python-package/lightgbm/sklearn.py (outdated, resolved)

tests/python_package_test/test_sklearn.py (outdated, resolved)

RektPunk added 3 commits July 8, 2024 12:04

fix module-level set constant

6f1da7f

fix test

bbbd5ba

fix typo in test

57e509c

RektPunk requested a review from jameslamb July 8, 2024 03:14

change concat to concatenate

aadf1fb

jameslamb and others added 2 commits July 8, 2024 23:50

Merge branch 'master' into fix-issue-6519

4a2ac5e

change the test to produce clear results

07e26bf

jameslamb approved these changes Jul 9, 2024

python-package/lightgbm/sklearn.py (resolved)

Merge branch 'master' into fix-issue-6519

67446c5

jameslamb added the awaiting review label Jul 11, 2024

Merge branch 'master' into fix-issue-6519

1d7b439

jameslamb mentioned this pull request Jul 12, 2024

release v4.5.0 #6538

Merged

27 tasks

jmoralez approved these changes Jul 15, 2024

jameslamb removed the awaiting review label Jul 16, 2024

jameslamb merged commit f8ec57b into microsoft:master Jul 16, 2024
41 checks passed

RektPunk deleted the fix-issue-6519 branch July 16, 2024 23:14
