
provide a error summary for assert_allclose #3847

Merged: 24 commits into pydata:master from the assert_allclose-formatting branch on Jun 13, 2020

Conversation

@keewis (Collaborator) commented Mar 8, 2020

Right now, unlike assert_identical and assert_equal, assert_allclose simply prints the mismatching data without providing a summary of what failed. This PR tries to fix that by allowing the compat parameter of formatting.diff_array_repr and formatting.diff_dataset_repr to be a callable, and by rewriting assert_allclose to have only one assert statement per type.

There are still more tests to write, but I would like to get early feedback on this approach versus adding an "allclose" compat type alongside the existing "equals" and "identical" in the formatting functions.
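As a rough illustration of what a callable compat could look like (the helper below is an illustration under assumed tolerances, not the exact code added in this PR):

    import numpy as np

    def compat_allclose(a, b, rtol=1e-05, atol=1e-08):
        # Illustrative comparison callable: True when all elements of the two
        # variables' data match within the tolerances (NaNs compare equal).
        return bool(np.isclose(a, b, rtol=rtol, atol=atol, equal_nan=True).all())

    # The formatting helpers would then accept either a compat string such as
    # "equals" / "identical" or a callable like the one above, and use it to
    # decide which variables to list as mismatching in the error summary, e.g.:
    #
    #     diff_array_repr(a, b, compat=compat_allclose)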

@max-sixty (Collaborator) commented:

Looks great!

@keewis (Collaborator, Author) commented Mar 8, 2020

The failures appear to be caused by a bug in pint and should hopefully be fixed by hgrecco/pint#1044.

@dcherian (Contributor) commented Mar 9, 2020

Let's mark those as xfail then?

@keewis (Collaborator, Author) commented Mar 9, 2020

Now only the min-all-deps and min-nep18 CI jobs fail. I suspect that is due to the pinned dependencies.

@dcherian mentioned this pull request on Mar 19, 2020
@keewis (Collaborator, Author) commented Mar 19, 2020

To me, the failing tests on the min-all-deps and min-nep18 CI look like a dask issue (in var?) that has been resolved in later versions: if we compute actual before passing it to assert_allclose, the tests don't fail. How should I resolve this?

Also, where should I put the whats-new.rst entry?

@dcherian (Contributor) commented:

Our minimum versions policy allows us to bump dask. So let's do that?

dask          2.2     (2019-08-01) 2.5     (2019-09-27) <
distributed   2.2     (2019-08-01) 2.5     (2019-09-27) <

> Also, where should I put the whats-new.rst entry?

New features? It is public API and is a very nice enhancement.

@keewis (Collaborator, Author) commented Mar 19, 2020

Unfortunately, we would need dask>=2.9.1 for that. I could try debugging a bit more to find out exactly why it fails (or someone could help me with that?), but that would take me more time than simply skipping / xfailing those tests on dask<2.9.1.
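A minimal sketch of the kind of version-gated skip/xfail being discussed (the gate, marker placement, and test name are illustrative assumptions, not the markers actually added):

    import dask
    import pytest
    from packaging.version import Version

    # Evaluate the gate once so it can be reused across the affected tests.
    dask_lt_2_9_1 = Version(dask.__version__) < Version("2.9.1")

    @pytest.mark.xfail(
        dask_lt_2_9_1,
        reason="dask < 2.9.1 mishandles dtypes when reducing object arrays",
    )
    def test_some_reduction():  # hypothetical test name
        ...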

@keewis (Collaborator, Author) commented Mar 27, 2020

The issue is with how dask<2.9.1 handles dtypes on compute (in nanvar?) when the data is an array with dtype object filled with False and some missing values represented by np.nan.

I really lack experience with dask, though, so I'm clueless as to what to do to fix that (besides calling compute before passing to allclose_or_equiv) and would appreciate help with this.

@dcherian mentioned this pull request on May 7, 2020
@keewis force-pushed the assert_allclose-formatting branch from 5df4d33 to 0220f45 on May 7, 2020
@keewis (Collaborator, Author) commented May 7, 2020

@pydata/xarray, this is really close but I'm not familiar enough with dask to get it to work with dask<2.9.1.

Once we get the py36-min-all-deps and py36-min-nep18 CI to pass, this should be ready for a final review and merging.

Edit: there seem to be a few failing tests related to pint (I will fix those) but the tests in question are in test_duck_array_ops.py and only fail with py36-min-nep18.

@dcherian (Contributor) commented May 8, 2020

I'm confused here. What did you change to trigger this error?

ALL WRONG (see the edit below): we could call arr1.compute() and arr2.compute() in

    if lazy_equiv is None:
        return bool(isclose(arr1, arr2, rtol=rtol, atol=atol, equal_nan=True).all())

At that point, we need to compare actual values. I'm not sure that dask exits all() early if one chunk evaluates to False (in which case this change would be a performance regression).

EDIT: ignore this. We can't compute because it might blow up memory.
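For context, the two quoted lines come from duck_array_ops.allclose_or_equiv; a simplified paraphrase of the surrounding function (from memory, and assuming asarray, isclose, and lazy_array_equiv are importable from xarray.core.duck_array_ops in the version at hand) looks roughly like this:

    from xarray.core.duck_array_ops import asarray, isclose, lazy_array_equiv

    def allclose_or_equiv_sketch(arr1, arr2, rtol=1e-05, atol=1e-08):
        arr1 = asarray(arr1)
        arr2 = asarray(arr2)
        # lazy_array_equiv can decide equivalence without computing, e.g. when
        # two dask arrays share the same graph; it returns None when an
        # elementwise comparison is actually needed.
        lazy_equiv = lazy_array_equiv(arr1, arr2)
        if lazy_equiv is None:
            return bool(isclose(arr1, arr2, rtol=rtol, atol=atol, equal_nan=True).all())
        return lazy_equiv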

@dcherian (Contributor) commented:

Example failure:

___________________ test_reduce[None-True-var-True-bool_-1] ____________________

dim_num = 1, dtype = <class 'numpy.bool_'>, dask = True, func = 'var'
skipna = True, aggdim = None

    @pytest.mark.parametrize("dim_num", [1, 2])
    @pytest.mark.parametrize("dtype", [float, int, np.float32, np.bool_])
    @pytest.mark.parametrize("dask", [False, True])
    @pytest.mark.parametrize("func", ["sum", "min", "max", "mean", "var"])
    # TODO test cumsum, cumprod
    @pytest.mark.parametrize("skipna", [False, True])
    @pytest.mark.parametrize("aggdim", [None, "x"])
    def test_reduce(dim_num, dtype, dask, func, skipna, aggdim):
    
        if aggdim == "y" and dim_num < 2:
            pytest.skip("dim not in this test")
    
        if dtype == np.bool_ and func == "mean":
            pytest.skip("numpy does not support this")
    
        if dask and not has_dask:
            pytest.skip("requires dask")
    
        if dask and skipna is False and dtype in [np.bool_]:
            pytest.skip("dask does not compute object-typed array")
    
        rtol = 1e-04 if dtype == np.float32 else 1e-05
    
        da = construct_dataarray(dim_num, dtype, contains_nan=True, dask=dask)
        axis = None if aggdim is None else da.get_axis_num(aggdim)
    
        # TODO: remove these after resolving
        # https://github.com/dask/dask/issues/3245
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", "Mean of empty slice")
            warnings.filterwarnings("ignore", "All-NaN slice")
            warnings.filterwarnings("ignore", "invalid value encountered in")
    
            if da.dtype.kind == "O" and skipna:
                # Numpy < 1.13 does not handle object-type array.
                try:
                    if skipna:
                        expected = getattr(np, f"nan{func}")(da.values, axis=axis)
                    else:
                        expected = getattr(np, func)(da.values, axis=axis)
    
                    actual = getattr(da, func)(skipna=skipna, dim=aggdim)
                    assert_dask_array(actual, dask)
                    assert np.allclose(
                        actual.values, np.array(expected), rtol=1.0e-4, equal_nan=True
                    )
                except (TypeError, AttributeError, ZeroDivisionError):
                    # TODO currently, numpy does not support some methods such as

[... traceback truncated; the lines below come from inside dask, where the error is raised ...]

                    "casting='same_kind'"
>                   % (funcname(function), str(dtype), str(result.dtype))
                )
E               ValueError: Inferred dtype from function 'sub' was 'int64' but got 'float64', which can't be cast using casting='same_kind'

@keewis (Collaborator, Author) commented May 13, 2020

Unfortunately, simply replacing assert np.allclose(...) with np.testing.assert_allclose(...) does not fix this. Also, I undid the swapping of arguments to allclose_or_equiv which makes the min-all-deps CI fail, too. Does anyone know why that works?

Would it make sense to wait until we can bump the dask version to 2.9?

@dcherian (Contributor) commented:

> Would it make sense to wait until we can bump the dask version to 2.9?

Sounds good to me. It's only 1.5 months away

@keewis (Collaborator, Author) commented May 27, 2020

I needed to use assert_allclose in the pint tests in #3975, so I modified it to use .data instead of .values and now the exact same tests are failing. So it seems that assert_allclose never worked with dask but always converted to numpy. Would it make sense to investigate further, or is it better to just wait until we can merge this?
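For reference, the distinction at play here: DataArray.values always coerces the underlying data to a NumPy array (computing dask arrays in the process), while .data returns the duck array as-is. A tiny illustration, assuming xarray and dask are installed:

    import dask.array as da
    import xarray as xr

    arr = xr.DataArray(da.zeros((4,), chunks=2), dims="x")
    print(type(arr.data))    # dask.array.core.Array -- still lazy
    print(type(arr.values))  # numpy.ndarray -- .values converts (and computes)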

@dcherian (Contributor) commented:

How about we call compute in assert_allclose for boolean dask arrays when dask < 2.9.1? This bit can then be removed in a couple of months.

This compute was happening in previous versions anyway (because assert_allclose was using .values) so this would not be a regression.
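A rough sketch of this kind of version-gated fallback (the helper name and gate are illustrative assumptions, not the code that was merged; per the follow-up below, np.asarray works where .compute() apparently did not):

    import dask
    import dask.array as da
    import numpy as np
    from packaging.version import Version

    DASK_TOO_OLD = Version(dask.__version__) < Version("2.9.1")

    def maybe_to_numpy(arr):
        # Hypothetical helper: on old dask, eagerly convert dask arrays to
        # NumPy, mirroring what the previous .values-based assert_allclose
        # effectively did anyway. np.asarray forces computation of the graph.
        if DASK_TOO_OLD and isinstance(arr, da.Array):
            return np.asarray(arr)
        return arr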

@keewis (Collaborator, Author) commented May 27, 2020

Let's see if that works.

@shoyer (Member) commented May 27, 2020

> So it seems that assert_allclose never worked with dask but always converted to numpy. Would it make sense to investigate further, or is it better to just wait until we can merge this?

This was intentional, I think. np.testing.assert_allclose() doesn't support dispatching -- it always converts into NumPy arrays. I'm not sure that assert_allclose would even be well defined on dask arrays otherwise, because it doesn't have any output (it only raises an error).
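A small illustration of that point (not part of the PR; assumes dask is installed): np.testing.assert_allclose is not dispatched through __array_function__, so dask inputs are coerced to NumPy, and therefore computed, before the comparison:

    import dask.array as da
    import numpy as np

    x = da.ones((4,), chunks=2)

    # No error is raised: both arguments are converted to NumPy arrays
    # internally, which forces computation of the dask graph.
    np.testing.assert_allclose(x, np.ones(4))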

@keewis (Collaborator, Author) commented May 27, 2020

Hmm... using arr.compute() fails while np.array(arr) works. For now, I'm converting all dask arrays to numpy if the dask version is not high enough (I can't do that only for bool_ arrays since they have been converted to float somewhere).

@shoyer: I'm confused, I thought that xr.testing.assert_allclose explicitly supported duck arrays (it calls duck_array_ops.allclose_or_equiv). To be clear, what I was talking about were these lines in xarray/testing.py (at e5cc19c):

    allclose = _data_allclose_or_equiv(a.values, b.values, **kwargs)

and lines 132 to 135:

    allclose = _data_allclose_or_equiv(
        a.coords[v].values, b.coords[v].values, **kwargs
    )
    assert allclose, "{}\n{}".format(a.coords[v].values, b.coords[v].values)

where .values always converts to numpy.

@shoyer (Member) commented May 27, 2020

> I'm confused, I thought that xr.testing.assert_allclose explicitly supported duck arrays (it calls duck_array_ops.allclose_or_equiv).

Right, it explicitly supports duck arrays -- by always converting them into NumPy!

@keewis (Collaborator, Author) commented May 27, 2020

Then it seems that this PR removes that intentional restriction. I'm not sure it is still needed, though: we don't use np.testing.assert_allclose internally because we want to provide our own error messages, and the actual comparison is done with

    return bool(isclose(arr1, arr2, rtol=rtol, atol=atol, equal_nan=True).all())

which should allow dispatching.
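In contrast to np.testing.assert_allclose above, np.isclose is dispatched through __array_function__, so the comparison can stay lazy until the final reduction is turned into a bool. A quick illustration (assumes dask is installed; not part of the PR):

    import dask.array as da
    import numpy as np

    x = da.ones((4,), chunks=2)
    y = x + 1e-09

    result = np.isclose(x, y, rtol=1e-05, atol=1e-08, equal_nan=True)
    print(type(result))        # dask.array.core.Array -- still lazy
    print(bool(result.all()))  # bool() triggers computation -> True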

@keewis (Collaborator, Author) commented May 27, 2020

Anyway, with this the tests finally pass 🎉, so this should be ready for review and merging.

@dcherian requested a review from @shoyer on May 27, 2020
@keewis (Collaborator, Author) commented Jun 1, 2020

Gentle ping, @shoyer.

@dcherian self-requested a review on June 12, 2020
@dcherian (Contributor) left a review:

LGTM. Thanks @keewis

@@ -1413,6 +1428,13 @@ def example_1d_objects(self):
        ]:
            yield (self.cls("x", data), data)

    # TODO: remove once pint==0.12 has been released
    @pytest.mark.xfail(
@dcherian (Contributor) commented on the diff:

Looks like pint 0.12 is out. Shall we remove these xfails now or do you want to do that later?

@keewis (Collaborator, Author) replied:

I think something went wrong with the release of 0.12, so I'd like to wait until I get a response from the maintainer.

@dcherian (Contributor) replied:

👍 Sounds good. IMO we should merge this.

@keewis (Collaborator, Author) commented Jun 12, 2020

Unless there are any objections (@shoyer?), I'm going to merge this tomorrow.

@mathause (Collaborator) commented:

Do you need to add a note to whats-new.rst that the dask version was bumped?

@keewis (Collaborator, Author) commented Jun 12, 2020

I think we could bump the version even further (at least to 2.8, maybe also 2.9), so that should probably be a separate PR.

@keewis merged commit 2ba5300 into pydata:master on Jun 13, 2020
@keewis deleted the assert_allclose-formatting branch on June 13, 2020
@dcherian added a commit to TomNicholas/xarray that referenced this pull request on Jun 24, 2020
@jthielen mentioned this pull request on Jul 1, 2020
@dcherian added a commit to raphaeldussin/xarray that referenced this pull request on Jul 1, 2020
Successfully merging this pull request may close these issues:

diff formatting for assert_allclose

5 participants