Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detailed report for testing.assert_equal and testing.assert_identical #1507

Merged
merged 11 commits into from
Jan 18, 2019

Conversation

benbovy
Copy link
Member

@benbovy benbovy commented Aug 11, 2017

  • Closes #xxxx
  • Tests added / passed
  • Passes git diff upstream/master | flake8 --diff
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

In addition to Dataset repr, the error message also shows the output of Dataset.info() for both datasets.

This may not be the most elegant solution, but it is helpful when datasets only differ by their attributes attached to coordinates or data variables (not shown in repr). I'm open to any suggestion.

The report shows the differences for dimensions, data values (Variable and DataArray), coordinates, data variables and attributes (the latter only for testing.assert_identical).

There is currently not much tests for xarray.testing functions, but I'm willing to add more if needed.

Not sure if it's worth a what's new entry (EDIT: added one).

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine adding something like this. Another option would be to add a verbose repr that prints value, attributes, and encoding. This could be tacked on to .info() or a new method. Currently, this is designed to look like the output from the ncdump command line utility.

assert a.identical(b), (
'{}\n{}\n---\n{}\n{}'
.format(a, b, _get_info_as_str(a), _get_info_as_str(b))
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is a xr.Variable instance, this will not work since it won't have the info method.

@shoyer
Copy link
Member

shoyer commented Aug 11, 2017

This is certainly welcome. In an ideal world, we might point specifically to the exact differing variables/attributes, but this is a fine start.

@benbovy
Copy link
Member Author

benbovy commented Aug 11, 2017

Indeed, exact differing variables/attributes would be much better here. I'll look into this next week.

pytest's failure reports and reports of numpy.testing functions may be good sources of inspiration.

@max-sixty
Copy link
Collaborator

Any interest in pushing this over the line @benbovy ?

@benbovy benbovy force-pushed the assert_identical_msg branch from 4298eba to 53f80b3 Compare January 15, 2019 08:16
@benbovy
Copy link
Member Author

benbovy commented Jan 15, 2019

Thanks for reminding me this PR @max-sixty ! It deserves more than no commits since 1.5 years!

The assertion error messages now look like pytest's failure reports (see test_eq_dict). I've tried to reuse as much as possible the repr formatting that we already have for the xarray objects.

As examples:

ds_a = xr.Dataset(data_vars={'var1': (('x', 'y'), [[1, 2, 3], [4, 5, 7]]),
                             'var2': ('x', [3, 4])},
                  coords={'x': ['a', 'b'], 'y': [1, 2, 3]},
                  attrs={'units': 'm', 'description': 'desc'})

ds_b = xr.Dataset(data_vars={'var1': ('x', [1, 2])},
                  coords={'x': ('x', ['a', 'c'], {'source': 0}), 'label': ('x', [1, 2])},
                  attrs={'units': 'kg'})
>>> xr.testing.assert_identical(ds_a, ds_b)
AssertionError: Left and right Dataset objects are not identical
Differing dimensions:
(x: 2, y: 3) != (x: 2)
Differing coordinates:
L * x        (x) <U1 'a' 'b'
R * x        (x) <U1 'a' 'c'
    source: 0
Left contains more coordinates:
  * y        (y) int64 1 2 3
Right contains more coordinates:
    label    (x) int64 1 2
Differing data variables:
L   var1     (x, y) int64 1 2 3 4 5 7
R   var1     (x) int64 1 2
Left contains more data variables:
    var2     (x) int64 3 4
Differing attributes:
L   units: m
R   units: kg
Left contains more attributes:
    description: desc
da_a = xr.DataArray([[1, 2, 3], [4, 5, 7]],
                    dims=('x', 'y'),
                    coords={'x': ['a', 'b'], 'y': [1, 2, 3]},
                    attrs={'units': 'm', 'description': 'desc'})

da_b = xr.DataArray([1, 2],
                    dims='x',
                    coords={'x': ['a', 'c'], 'label': ('x', [1, 2])},
                    attrs={'units': 'kg'})
>>> xr.testing.assert_equal(da_a, da_b)
AssertionError: Left and right DataArray objects are not equal
Differing values:
L
    array([[1, 2, 3],
           [4, 5, 7]])
R
    array([1, 2])
Differing coordinates:
L * x        (x) <U1 'a' 'b'
R * x        (x) <U1 'a' 'c'
Left contains more coordinates:
  * y        (y) int64 1 2 3
Right contains more coordinates:
    label    (x) int64 1 2

Those examples are probably not the most readable ones but it's for a full showcase.

A downside of this approach (i.e., report the exact differing) is that when a and b differs, full comparison is done once again for formatting the report. I don't think there is an easy way to avoid that, but I don't think it's a big deal either.

@max-sixty
Copy link
Collaborator

full comparison is done once again for formatting the report. I don't think there is an easy way to avoid that, but I don't think it's a big deal either.

i.e. there's a performance cost? Yeah that's def fine!

@max-sixty
Copy link
Collaborator

This looks amazing! I'm excited to use it. Thanks for finishing it up

@max-sixty
Copy link
Collaborator

If you want to add that example as a test, that could be a good way of documenting the function. But I don't think it's strictly needed

@benbovy
Copy link
Member Author

benbovy commented Jan 15, 2019

I'm wondering which is the best among the options below:

A.

Differing data variables:
L   var1     (x, y) int64 1 2 3 4 5 7
R   var1     (x) int64 1 2
L   var2     (x, y) int64 4 5 6 7 8 9
    description: variable 2 
R   var2     (x, y) int64 4 5 6 7 8 9

B.

Differing data variables:
L   
    var1     (x, y) int64 1 2 3 4 5 7
R
    var1     (x) int64 1 2
L
    var2     (x, y) int64 4 5 6 7 8 9
    description: variable 2 
R
    var2     (x, y) int64 4 5 6 7 8 9

C.

Differing data variables:
L   
    var1     (x, y) int64 1 2 3 4 5 7
    var2     (x, y) int64 4 5 6 7 8 9
    description: variable 2
R
    var1     (x) int64 1 2
    var2     (x, y) int64 4 5 6 7 8 9

@benbovy
Copy link
Member Author

benbovy commented Jan 15, 2019

If you want to add that example as a test, that could be a good way of documenting the function. But I don't think it's strictly needed

Yes I could use those examples for the tests. I think it might be good to write shallow tests at least for diff_array_repr and diff_dataset_repr (in formatting.py).

@max-sixty
Copy link
Collaborator

I think you should feel free to merge an MVP rather than perfecting, but check out https://docs.pytest.org/en/latest/example/simple.html#writing-well-integrated-assertion-helpers

I think that might be as simple as wrapping these in pytest.fail(...)

@max-sixty
Copy link
Collaborator

I'm wondering which is the best among the options below:

TBH I think they're all good; I don't have a strong view. I'd low-confidence rank A->C->B

@dcherian
Copy link
Contributor

I agree with @max-sixty A > C > B but only weakly.

@pep8speaks
Copy link

pep8speaks commented Jan 16, 2019

Hello @benbovy! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 16, 2019 at 13:27 Hours UTC

@max-sixty
Copy link
Collaborator

This looks v good @benbovy!

@benbovy
Copy link
Member Author

benbovy commented Jan 16, 2019

I agree with @max-sixty A > C > B but only weakly.

Same opinion.

This looks v good @benbovy!

I'm just fixing int32 vs int64 issue with tests run on appveyor, then to me it's ready to merge unless someone else has objections.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic!

Differing coordinates:
L * x (x) <U1 'a' 'b'
R * x (x) <U1 'a' 'c'
Left contains more coordinates:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Left contains more coordinates" sounds a little funny to me.

Maybe "Coordinates on the left DataArray but not the right" or "Coordinates only on the left object"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! I followed pytest's reports too blindly here.

@benbovy benbovy changed the title More detailed AssertionError message for assert_identical Detailed report for testing.assert_equal and testing.assert_identical Jan 16, 2019
@benbovy
Copy link
Member Author

benbovy commented Jan 18, 2019

If everyone is happy with this I'm going to merge it.

@shoyer
Copy link
Member

shoyer commented Jan 18, 2019

If everyone is happy with this I'm going to merge it.

Yes, please!

@benbovy benbovy merged commit 1d0a2bc into pydata:master Jan 18, 2019
@benbovy benbovy mentioned this pull request Jan 18, 2019
4 tasks
@max-sixty
Copy link
Collaborator

Thanks @benbovy!

dcherian pushed a commit to dcherian/xarray that referenced this pull request Jan 24, 2019
* master:
  stale requires a label (pydata#2701)
  Update indexing.rst (pydata#2700)
  add line break to message posted (pydata#2698)
  Config for closing stale issues (pydata#2684)
  to_dict without data (pydata#2659)
  Update asv.conf.json (pydata#2693)
  try no rasterio in py36 env (pydata#2691)
  Detailed report for testing.assert_equal and testing.assert_identical (pydata#1507)
  Hotfix for pydata#2662 (pydata#2678)
  Update README.rst (pydata#2682)
  Fix test failures with numpy=1.16 (pydata#2675)
dcherian pushed a commit to yohai/xarray that referenced this pull request Jan 24, 2019
* refactor-plot-utils: (22 commits)
  review comment.
  small rename
  stale requires a label (pydata#2701)
  Update indexing.rst (pydata#2700)
  add line break to message posted (pydata#2698)
  Config for closing stale issues (pydata#2684)
  to_dict without data (pydata#2659)
  Update asv.conf.json (pydata#2693)
  try no rasterio in py36 env (pydata#2691)
  Detailed report for testing.assert_equal and testing.assert_identical (pydata#1507)
  Hotfix for pydata#2662 (pydata#2678)
  Update README.rst (pydata#2682)
  Fix test failures with numpy=1.16 (pydata#2675)
  lint
  Back to map_dataarray_line
  Refactor out cmap_params, cbar_kwargs processing
  Refactor out colorbar making to plot.utils._add_colorbar
  flake8
  facetgrid refactor
  Refactor out utility functions.
  ...
@benbovy benbovy deleted the assert_identical_msg branch October 25, 2019 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants