
Add Cramer's V (Cramer's Phi) #1298

Merged: 55 commits merged into master on Nov 14, 2022

Conversation

@stancld (Contributor) commented Oct 28, 2022

What does this PR do?

Fixes #1272
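
For context, Cramér's V is derived from the chi-square statistic of the confusion matrix of two categorical variables. A minimal NumPy/SciPy sketch of the uncorrected formula is below; it is illustrative only, not the torchmetrics implementation added in this PR (reference implementations such as dython apply a bias correction by default):

```python
# Minimal sketch of (uncorrected) Cramer's V -- illustrative only,
# not the torchmetrics implementation added in this PR.
import numpy as np
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Cramer's V for two 1-D arrays of categorical labels."""
    # Build the contingency (confusion) matrix of observed counts.
    x_classes, x_idx = np.unique(x, return_inverse=True)
    y_classes, y_idx = np.unique(y, return_inverse=True)
    confmat = np.zeros((len(x_classes), len(y_classes)), dtype=np.int64)
    np.add.at(confmat, (x_idx, y_idx), 1)

    chi2, _, _, _ = chi2_contingency(confmat, correction=False)
    n = confmat.sum()
    k = min(confmat.shape) - 1  # smaller dimension minus one
    if k == 0:  # degenerate case: one variable has a single category
        return 0.0
    return float(np.sqrt(chi2 / (n * k)))
```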

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@stancld changed the title from "Add Cramer's V (Cramer's phi)" to "Add Cramer's V (Cramer's Phi)" on Oct 28, 2022
@stancld (Contributor, Author) commented Oct 28, 2022

@Borda It seems to me that CI uses cached pip packages and nothing new is installed, even though I added a new requirements file. Do you know how to force CI to install this package? (When I run pip install -e '.[test]' locally, the package is installed as it should be. I guess we need to clear the cache, but I'm not confident enough to do that.)

Edit: Oops, my fault. I forgot to define the conditional import and testing.
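
For reference, guarding an optional test dependency usually looks roughly like the sketch below; the flag name and test are illustrative, not necessarily what ended up in the repo:

```python
# Illustrative sketch of guarding an optional test dependency;
# the exact helper names used in torchmetrics may differ.
import pytest

try:
    from dython.nominal import cramers_v as dython_cramers_v

    _DYTHON_AVAILABLE = True
except ImportError:
    dython_cramers_v = None
    _DYTHON_AVAILABLE = False


@pytest.mark.skipif(not _DYTHON_AVAILABLE, reason="test requires `dython`")
def test_cramers_v_against_dython():
    ...
```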

@codecov (bot) commented Oct 28, 2022

Codecov Report

Merging #1298 (c23b0de) into master (fe55207) will decrease coverage by 54%.
The diff coverage is 98%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1298     +/-   ##
========================================
- Coverage      87%     32%    -54%     
========================================
  Files         195     200      +5     
  Lines       11369   11472    +103     
========================================
- Hits         9856    3709   -6147     
- Misses       1513    7763   +6250     

@stancld stancld added this to the v0.11 milestone Oct 28, 2022
@mergify mergify bot added ready and removed ready labels Nov 8, 2022
@mergify mergify bot added ready and removed ready labels Nov 9, 2022
@mergify mergify bot added the ready label Nov 9, 2022
@SkafteNicki (Member) commented:
@stancld, @Borda there seem to be major problems with this PR on GPU. Not only are some of the tests added in this PR failing, but so are others. Additionally, the tests run out of time (even if I increase the total runtime). I assume it has to do with the dython dependency, as it is the only thing that makes sense to me.

@mergify mergify bot removed the ready label Nov 10, 2022
@SkafteNicki (Member) commented:
@stancld I have been trying to debug this issue. On my local cluster running Ubuntu + GPU, I get the following error when using a pandas version <1.4.0 (most tests fail with this error, others hang):

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/unittests/helpers/testers.py:467: in run_class_metric_test
    _class_test(
tests/unittests/helpers/testers.py:215: in _class_test
    sk_batch_result = sk_metric(preds_, target_, **batch_kwargs_update)
tests/unittests/nominal/test_cramers.py:70: in _dython_cramers_v
    v = dython_cramers_v(
../.conda/envs/metrics/lib/python3.8/site-packages/dython/nominal.py:139: in cramers_v
    confusion_matrix = pd.crosstab(x, y)
../.conda/envs/metrics/lib/python3.8/site-packages/pandas/core/reshape/pivot.py:654: in crosstab
    df = DataFrame(data, index=common_idx)
../.conda/envs/metrics/lib/python3.8/site-packages/pandas/core/frame.py:614: in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
../.conda/envs/metrics/lib/python3.8/site-packages/pandas/core/internals/construction.py:464: in dict_to_mgr
    return arrays_to_mgr(
../.conda/envs/metrics/lib/python3.8/site-packages/pandas/core/internals/construction.py:119: in arrays_to_mgr
    index = _extract_index(arrays)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = [1.0, 0.0, 1.0, 3.0, 3.0, 1.0, ...]

    def _extract_index(data) -> Index:
        """
        Try to infer an Index from the passed data, raise ValueError on failure.
        """
        index = None
        if len(data) == 0:
            index = Index([])
        elif len(data) > 0:
            raw_lengths = []
            indexes: list[list[Hashable] | Index] = []
    
            have_raw_arrays = False
            have_series = False
            have_dicts = False
    
            for val in data:
                if isinstance(val, ABCSeries):
                    have_series = True
                    indexes.append(val.index)
                elif isinstance(val, dict):
                    have_dicts = True
                    indexes.append(list(val.keys()))
                elif is_list_like(val) and getattr(val, "ndim", 1) == 1:
                    have_raw_arrays = True
                    raw_lengths.append(len(val))
    
            if not indexes and not raw_lengths:
>               raise ValueError("If using all scalar values, you must pass an index")
E               ValueError: If using all scalar values, you must pass an index

When using pandas 1.4.0 or higher, it does not fail.
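
For what it's worth, the traceback shows pd.crosstab ending up with plain scalar values rather than 1-D arrays, which is what older pandas rejects. A hedged workaround sketch, assuming the cause is the shape/type of the inputs reaching the reference function (this is only an illustrative guess, not the fix adopted in the PR):

```python
# Illustrative guess only: flatten tensor inputs to 1-D numpy arrays before
# handing them to the dython reference, so pd.crosstab never sees scalars.
# This is NOT the fix adopted in the PR (old configurations were skipped instead).
import torch
from dython.nominal import cramers_v as dython_cramers_v


def _dython_cramers_v_reference(preds: torch.Tensor, target: torch.Tensor) -> float:
    preds_np = preds.detach().cpu().numpy().ravel()
    target_np = target.detach().cpu().numpy().ravel()
    return float(dython_cramers_v(preds_np, target_np))
```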

@stancld (Contributor, Author) commented Nov 12, 2022

@SkafteNicki Unfortunately, the higher pandas version is incompatible with the numpy version we use. Once pandas is unpinned, we can skip the tests for the oldest configuration. But I still don't know what the problem with the GPU is. Is there a way to run the tests locally on GPU (e.g. with a flag or so)?

Edit: Actually, I see the problem with the GPU tests. We use Python 3.7, and only pandas<=1.3.5 is available there 😬

Also, I can see that on GPU only the tests with nan_strategy='drop' are failing. Trying to dig out more details.

@stancld stancld self-assigned this Nov 13, 2022
@mergify mergify bot added the ready label Nov 13, 2022
@SkafteNicki (Member) commented Nov 13, 2022

> @SkafteNicki Unfortunately, the higher pandas version is incompatible with the numpy version we use. Once pandas is unpinned, we can skip the tests for the oldest configuration. But I still don't know what the problem with the GPU is. Is there a way to run the tests locally on GPU (e.g. with a flag or so)?
>
> Edit: Actually, I see the problem with the GPU tests. We use Python 3.7, and only pandas<=1.3.5 is available there 😬
>
> Also, I can see that on GPU only the tests with nan_strategy='drop' are failing. Trying to dig out more details.

Then we probably need to skip the tests if Python < 3.8.

I marked the test so that the oldest configuration and Python 3.7 are skipped. It should pass now. If you're okay with that (it is documented in the code), it should be mergeable.
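
Version-gated skipping with pytest could look roughly like this (the condition and test name below are illustrative, not necessarily the exact marker used in the PR):

```python
# Illustrative sketch of a version-gated skip; the actual guard in the PR may differ.
import sys

import pandas as pd
import pytest
from packaging.version import Version

_OLD_PANDAS = Version(pd.__version__) < Version("1.4.0")


@pytest.mark.skipif(
    sys.version_info < (3, 8) or _OLD_PANDAS,
    reason="dython comparison needs pandas>=1.4.0, which is not available on Python 3.7",
)
def test_cramers_v_class():
    ...
```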

@stancld stancld mentioned this pull request Nov 13, 2022
@SkafteNicki SkafteNicki merged commit 3636182 into master Nov 14, 2022
@SkafteNicki SkafteNicki deleted the metric/cramers-phi branch November 14, 2022 07:19