
Add Cramer's V (Cramer's Phi) #1298

Merged: 55 commits merged into master on Nov 14, 2022

Conversation

@stancld (Contributor) commented Oct 28, 2022

What does this PR do?

Fixes #1272
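
For context, Cramér's V is derived from the chi-square statistic of the confusion matrix of two categorical variables. A minimal NumPy/SciPy sketch of the uncorrected formula is below; it is illustrative only, not the torchmetrics implementation added in this PR (reference implementations such as dython apply a bias correction by default):

```python
# Minimal sketch of (uncorrected) Cramer's V -- illustrative only,
# not the torchmetrics implementation added in this PR.
import numpy as np
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Cramer's V for two 1-D arrays of categorical labels."""
    # Build the contingency (confusion) matrix of observed counts.
    x_classes, x_idx = np.unique(x, return_inverse=True)
    y_classes, y_idx = np.unique(y, return_inverse=True)
    confmat = np.zeros((len(x_classes), len(y_classes)), dtype=np.int64)
    np.add.at(confmat, (x_idx, y_idx), 1)

    chi2, _, _, _ = chi2_contingency(confmat, correction=False)
    n = confmat.sum()
    k = min(confmat.shape) - 1  # smaller dimension minus one
    if k == 0:  # degenerate case: one variable has a single category
        return 0.0
    return float(np.sqrt(chi2 / (n * k)))
```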

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@stancld changed the title from "Add Cramer's V (Cramer's phi)" to "Add Cramer's V (Cramer's Phi)" on Oct 28, 2022
@stancld (Contributor, Author) commented Oct 28, 2022

@Borda It seems to me that CI uses cached pip packages and nothing new is installed, even though I added a new requirements file. Do you know how to force CI to install this package? (When I run pip install -e '.[test]' locally, the package is installed as it should be. I guess we need to clear the cache, but I'm not confident enough to do that.)

Edit: Oops, my fault. I forgot to define the conditional import and testing.
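
For reference, guarding an optional test dependency usually looks roughly like the sketch below; the flag name and test are illustrative, not necessarily what ended up in the repo:

```python
# Illustrative sketch of guarding an optional test dependency;
# the exact helper names used in torchmetrics may differ.
import pytest

try:
    from dython.nominal import cramers_v as dython_cramers_v

    _DYTHON_AVAILABLE = True
except ImportError:
    dython_cramers_v = None
    _DYTHON_AVAILABLE = False


@pytest.mark.skipif(not _DYTHON_AVAILABLE, reason="test requires `dython`")
def test_cramers_v_against_dython():
    ...
```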

@codecov (bot) commented Oct 28, 2022

Codecov Report

Merging #1298 (c23b0de) into master (fe55207) will decrease coverage by 54%.
The diff coverage is 98%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1298     +/-   ##
========================================
- Coverage      87%     32%    -54%     
========================================
  Files         195     200      +5     
  Lines       11369   11472    +103     
========================================
- Hits         9856    3709   -6147     
- Misses       1513    7763   +6250     

@stancld stancld added this to the v0.11 milestone Oct 28, 2022
@mergify mergify bot added ready and removed ready labels Nov 8, 2022
@mergify mergify bot added ready and removed ready labels Nov 9, 2022
@mergify mergify bot added the ready label Nov 9, 2022
@SkafteNicki (Member) commented:
@stancld, @Borda there seem to be major problems with this PR on GPU. Not only are some of the tests added in this PR failing, but so are others. Additionally, the tests run out of time (even if I increase the total runtime). I assume it has to do with the dython dependency, as it is the only thing that makes sense to me.

@mergify mergify bot removed the ready label Nov 10, 2022
@SkafteNicki (Member) commented:
@stancld I have been trying to debug this issue. On my local cluster running Ubuntu + GPU, I get the following error when using a pandas version <1.4.0 (most tests fail with this error, others hang):

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/unittests/helpers/testers.py:467: in run_class_metric_test
    _class_test(
tests/unittests/helpers/testers.py:215: in _class_test
    sk_batch_result = sk_metric(preds_, target_, **batch_kwargs_update)
tests/unittests/nominal/test_cramers.py:70: in _dython_cramers_v
    v = dython_cramers_v(
../.conda/envs/metrics/lib/python3.8/site-packages/dython/nominal.py:139: in cramers_v
    confusion_matrix = pd.crosstab(x, y)
../.conda/envs/metrics/lib/python3.8/site-packages/pandas/core/reshape/pivot.py:654: in crosstab
    df = DataFrame(data, index=common_idx)
../.conda/envs/metrics/lib/python3.8/site-packages/pandas/core/frame.py:614: in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
../.conda/envs/metrics/lib/python3.8/site-packages/pandas/core/internals/construction.py:464: in dict_to_mgr
    return arrays_to_mgr(
../.conda/envs/metrics/lib/python3.8/site-packages/pandas/core/internals/construction.py:119: in arrays_to_mgr
    index = _extract_index(arrays)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = [1.0, 0.0, 1.0, 3.0, 3.0, 1.0, ...]

    def _extract_index(data) -> Index:
        """
        Try to infer an Index from the passed data, raise ValueError on failure.
        """
        index = None
        if len(data) == 0:
            index = Index([])
        elif len(data) > 0:
            raw_lengths = []
            indexes: list[list[Hashable] | Index] = []
    
            have_raw_arrays = False
            have_series = False
            have_dicts = False
    
            for val in data:
                if isinstance(val, ABCSeries):
                    have_series = True
                    indexes.append(val.index)
                elif isinstance(val, dict):
                    have_dicts = True
                    indexes.append(list(val.keys()))
                elif is_list_like(val) and getattr(val, "ndim", 1) == 1:
                    have_raw_arrays = True
                    raw_lengths.append(len(val))
    
            if not indexes and not raw_lengths:
>               raise ValueError("If using all scalar values, you must pass an index")
E               ValueError: If using all scalar values, you must pass an index

When using pandas 1.4.0 or higher, it does not fail.
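
For what it's worth, the traceback shows pd.crosstab ending up with plain scalar values rather than 1-D arrays, which is what older pandas rejects. A hedged workaround sketch, assuming the cause is the shape/type of the inputs reaching the reference function (this is only an illustrative guess, not the fix adopted in the PR):

```python
# Illustrative guess only: flatten tensor inputs to 1-D numpy arrays before
# handing them to the dython reference, so pd.crosstab never sees scalars.
# This is NOT the fix adopted in the PR (old configurations were skipped instead).
import torch
from dython.nominal import cramers_v as dython_cramers_v


def _dython_cramers_v_reference(preds: torch.Tensor, target: torch.Tensor) -> float:
    preds_np = preds.detach().cpu().numpy().ravel()
    target_np = target.detach().cpu().numpy().ravel()
    return float(dython_cramers_v(preds_np, target_np))
```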

@stancld (Contributor, Author) commented Nov 12, 2022

@SkafteNicki Unfortunately, the higher pandas version is incompatible with the numpy version we use. Once pandas is unpinned, we can skip the tests for the oldest configuration. But I still don't know what the problem with the GPU is. Is there a way to run the tests locally on GPU (e.g. with a flag or so)?

Edit: Actually, I see the problem with the GPU tests. We use Python 3.7, and only pandas<=1.3.5 is available there 😬

Also, I can see that on GPU only the tests with nan_strategy='drop' are failing. Trying to dig out more details.

@stancld stancld self-assigned this Nov 13, 2022
@mergify mergify bot added the ready label Nov 13, 2022
@SkafteNicki (Member) commented Nov 13, 2022

> @SkafteNicki Unfortunately, the higher pandas version is incompatible with the numpy version we use. Once pandas is unpinned, we can skip the tests for the oldest configuration. But I still don't know what the problem with the GPU is. Is there a way to run the tests locally on GPU (e.g. with a flag or so)?
>
> Edit: Actually, I see the problem with the GPU tests. We use Python 3.7, and only pandas<=1.3.5 is available there 😬
>
> Also, I can see that on GPU only the tests with nan_strategy='drop' are failing. Trying to dig out more details.

Then we probably need to skip the tests if Python < 3.8.

I marked the test so that the oldest configuration and Python 3.7 are skipped. It should pass now. If you're okay with that (it is documented in the code), it should be mergeable.
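
Version-gated skipping with pytest could look roughly like this (the condition and test name below are illustrative, not necessarily the exact marker used in the PR):

```python
# Illustrative sketch of a version-gated skip; the actual guard in the PR may differ.
import sys

import pandas as pd
import pytest
from packaging.version import Version

_OLD_PANDAS = Version(pd.__version__) < Version("1.4.0")


@pytest.mark.skipif(
    sys.version_info < (3, 8) or _OLD_PANDAS,
    reason="dython comparison needs pandas>=1.4.0, which is not available on Python 3.7",
)
def test_cramers_v_class():
    ...
```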

@stancld stancld mentioned this pull request Nov 13, 2022
@SkafteNicki SkafteNicki merged commit 3636182 into master Nov 14, 2022
@SkafteNicki SkafteNicki deleted the metric/cramers-phi branch November 14, 2022 07:19