-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added option to specify weights in xr.corr() and xr.cov() #8527
Conversation
…t if parameter weights is not None
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
8573afa
to
0d2f548
Compare
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…into weighted_cov_corr
This is great, thanks @lluritu ! And welcome to xarray! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely need a few tests for this too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the PR @lluritu !
One comment — we already have a .weighted
method — could we use that to make this more orthogonal?
It's not a perfect fit, since folks will need to call xr.corr(foo.weighted(bar), baz)
, rather than foo.weighted(bar).sum()
.
You're welcome!
|
Could you please help me a bit w that? e.g. pointing me to current tests of the cov, corr and _cov_corr functions would be a good starting point. Is there a way to test a previous version of a function with a modified one to make sure I'm not breaking the already existing functionality?
|
Of course 🙂 ! Tests for this kind of "general-purpose computation method" are in xarray/xarray/tests/test_computation.py Line 1591 in 7c1bb8d
I agree though this seems like a fairly strong argument against @max-sixty 's specific proposed API above, since as far as I know we have no way to associate one set of weights with two DataArrays? |
Yeah, the logic would be "use weights from one of the input" i agree it has that downside. When we first implemented this, I was originally keener on I'm fine going ahead with the |
…s same results when weights=None or weights=xr.DataArray(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good! I think all it needs is a proper test
Separately from this PR, the approach seems fundamentally inelegant, but I don't think there's a better one. The only thing I can think of is allowing .weighted
objects to do elementwise arithmetic operations on the underlying array. Then I think that xr.cov(da.weighted(foo), da2.weighted(foo))
would "just work". But also seems like that's possibly playing with fire; I'd have to think more.
The new xr.corr and xr.cov pass all the previous tests. |
Could we add a line to the whatsnew? I merged Then we can hit the button! |
Co-authored-by: Maximilian Roos <[email protected]>
Co-authored-by: Maximilian Roos <[email protected]>
Doctest and mypy produce errors.
any hint on how to fix that? |
I thought I fixed this yesterday, but actually I pushed to the wrong place! So I now pushed to this branch and it should work! I'll also merge |
* main: (26 commits) Filter null values before plotting (pydata#8535) Update concat.py (pydata#8538) Add getitem to array protocol (pydata#8406) Added option to specify weights in xr.corr() and xr.cov() (pydata#8527) Filter out doctest warning (pydata#8539) Bump actions/setup-python from 4 to 5 (pydata#8540) Point users to where in their code they should make mods for Dataset.dims (pydata#8534) Add Cumulative aggregation (pydata#8512) dev whats-new Whats-new for 2023.12.0 (pydata#8532) explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526) Add `eval` method to Dataset (pydata#7163) Deprecate ds.dims returning dict (pydata#8500) test and fix empty xindexes repr (pydata#8521) Remove PR labeler bot (pydata#8525) Hypothesis strategy for generating Variable objects (pydata#8404) Use numbagg for `rolling` methods (pydata#8493) Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514) fix RTD docs build (pydata#8519) Fix type of `.assign_coords` (pydata#8495) ...
* main: (58 commits) Adapt map_blocks to use new Coordinates API (pydata#8560) add xeofs to ecosystem.rst (pydata#8561) Offer a fixture for unifying DataArray & Dataset tests (pydata#8533) Generalize cumulative reduction (scan) to non-dask types (pydata#8019) Filter null values before plotting (pydata#8535) Update concat.py (pydata#8538) Add getitem to array protocol (pydata#8406) Added option to specify weights in xr.corr() and xr.cov() (pydata#8527) Filter out doctest warning (pydata#8539) Bump actions/setup-python from 4 to 5 (pydata#8540) Point users to where in their code they should make mods for Dataset.dims (pydata#8534) Add Cumulative aggregation (pydata#8512) dev whats-new Whats-new for 2023.12.0 (pydata#8532) explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526) Add `eval` method to Dataset (pydata#7163) Deprecate ds.dims returning dict (pydata#8500) test and fix empty xindexes repr (pydata#8521) Remove PR labeler bot (pydata#8525) Hypothesis strategy for generating Variable objects (pydata#8404) ...
commit 0a0f800 Merge: 33c8033 41d33f5 Author: Deepak Cherian <[email protected]> Date: Tue Jan 2 20:42:51 2024 -0700 Merge branch 'main' into depr-groupby-squeeze-2 commit 33c8033 Author: Deepak Cherian <[email protected]> Date: Tue Jan 2 20:40:42 2024 -0700 Don't skip for resampling commit d7be352 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed Jan 3 03:24:13 2024 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit d13fa0e Author: Deepak Cherian <[email protected]> Date: Tue Jan 2 20:23:43 2024 -0700 Apply suggestions from code review Co-authored-by: Michael Niklas <[email protected]> commit dd6ea53 Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 19:29:40 2023 -0700 Silence more warnings commit 44e5a41 Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 19:21:06 2023 -0700 minimize test mods commit 94c1c1f Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 18:55:46 2023 -0700 Add tests for pydata#8263 commit 0ab4eb6 Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 18:47:41 2023 -0700 Fix typing commit a064430 Merge: d6a3f2d 03ec3cb Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 18:47:04 2023 -0700 Merge branch 'main' into depr-groupby-squeeze-2 * main: Fix mypy type ignore (pydata#8564) Support for the new compression arguments. (pydata#7551) FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552) Adapt map_blocks to use new Coordinates API (pydata#8560) add xeofs to ecosystem.rst (pydata#8561) Offer a fixture for unifying DataArray & Dataset tests (pydata#8533) Generalize cumulative reduction (scan) to non-dask types (pydata#8019) commit d6a3f2d Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 18:46:50 2023 -0700 Fix generator for aggregations commit 97f1695 Author: Deepak Cherian <[email protected]> Date: Tue Dec 19 10:58:11 2023 -0700 Fix docs commit 5b33b98 Author: Deepak Cherian <[email protected]> Date: Sun Dec 17 20:35:53 2023 -0700 fix whats-new commit 80b2b36 Author: Deepak Cherian <[email protected]> Date: Sun Dec 17 20:26:17 2023 -0700 Reduce more warnings commit 5f6f4ea Merge: a57d4ae 2971994 Author: Deepak Cherian <[email protected]> Date: Sat Dec 16 20:33:13 2023 -0700 Merge branch 'main' into depr-groupby-squeeze-2 * main: (26 commits) Filter null values before plotting (pydata#8535) Update concat.py (pydata#8538) Add getitem to array protocol (pydata#8406) Added option to specify weights in xr.corr() and xr.cov() (pydata#8527) Filter out doctest warning (pydata#8539) Bump actions/setup-python from 4 to 5 (pydata#8540) Point users to where in their code they should make mods for Dataset.dims (pydata#8534) Add Cumulative aggregation (pydata#8512) dev whats-new Whats-new for 2023.12.0 (pydata#8532) explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526) Add `eval` method to Dataset (pydata#7163) Deprecate ds.dims returning dict (pydata#8500) test and fix empty xindexes repr (pydata#8521) Remove PR labeler bot (pydata#8525) Hypothesis strategy for generating Variable objects (pydata#8404) Use numbagg for `rolling` methods (pydata#8493) Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514) fix RTD docs build (pydata#8519) Fix type of `.assign_coords` (pydata#8495) ... commit a57d4ae Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 21:36:04 2023 -0700 Test one more warning commit bf8139d Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 21:33:45 2023 -0700 Update xarray/tests/test_groupby.py commit 4e9a063 Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 21:10:14 2023 -0700 Set squeeze=None for Dataset too commit c2e576e Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 20:54:17 2023 -0700 Fix first, last commit 6d8e822 Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 20:46:21 2023 -0700 better warning commit 62c334b Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 20:45:17 2023 -0700 silence warnings commit b7805a8 Author: dcherian <[email protected]> Date: Tue Aug 15 10:54:25 2023 -0600 Deprecate `squeeze` in GroupBy. Closes pydata#2157
This is indeed a great contribution, thank you @lluritu! Is the new version of xarray containing this capability released, or soon will be? |
Added option to specify weights in xr.corr() and xr.cov().
This functionality is useful e.g. to compute correlations of spatial fields when the data is in a lat-lon grid, because grid points closer to the poles have a smaller area and need to have less weight.
whats-new.rst
api.rst