-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-write pearsonr
to use Resolve
#5638
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5638 +/- ##
==========================================
+ Coverage 89.69% 89.71% +0.01%
==========================================
Files 90 90
Lines 22809 22812 +3
Branches 5441 5436 -5
==========================================
+ Hits 20458 20465 +7
+ Misses 1618 1617 -1
+ Partials 733 730 -3 ☔ View full report in Codecov by Sentry. |
This comment was marked as outdated.
This comment was marked as outdated.
53ef2e2
to
a53eba1
Compare
a53eba1
to
4ac142b
Compare
This comment was marked as outdated.
This comment was marked as outdated.
A couple more updates because it's generally better to not break your own use cases 👀 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to review this but I ran out of time.
I have finished reviewing every module except stats.py
. Either myself or someone else can pick this back up in the future.
These do not need real and lazy versions
Co-authored-by: Martin Yeo <[email protected]>
37d45a6
to
b3b9dde
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I don't mind admitting that I found that super difficult. Most of iris.analysis
remains over my head, even after 5 years.
I can follow what the new code is doing, and have come up with a few suggestions. I can't speak to the equivalence with the old code, but I'll rely on the tests to confirm that.
The more stuff we can get relying on Resolve
, the better, so well done for making the effort. I'm sure @bjlittle will be delighted!
Co-authored-by: Martin Yeo <[email protected]>
Performance Benchmark Report: 1808c4aPerformance shifts
Full benchmark results
Generated by GHA run |
3fe6f9d
to
e5a4bd0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @trexfeathers for your thorough review. I think I have now responded to all the comments one way or another.
I don't mind admitting that I found that super difficult.
Full disclosure: I originally started trying to do this just to see if I could! So yeah, it was a challenge for me too.
Performance Benchmark Report: a7f3e0fPerformance shifts
Full benchmark results
Generated by GHA run |
Improvements to `pearsonr` docstring
Thanks @trexfeathers! |
* upstream/main: (26 commits) Bump scitools/workflows from 2023.12.1 to 2024.01.0 (SciTools#5710) Faster trivial equality checks for coordinates and arrays (SciTools#5691) Make the Coord.cell method lazy (SciTools#5693) Re-write `pearsonr` to use `Resolve` (SciTools#5638) ruff compliance for D401. (SciTools#5687) Bump actions/cache from 3 to 4 (SciTools#5703) update rtd ubuntu and mambaforge (SciTools#5702) [pre-commit.ci] pre-commit autoupdate (SciTools#5699) ruff compliance for D205. (SciTools#5681) Added whatsnew to warnings PR (SciTools#5696) Removed use of catch_warnings (SciTools#5685) Better documentation for Cube names in curl (SciTools#5680) fix typo (SciTools#5689) Make further topics obviously different (SciTools#5684) [pre-commit.ci] pre-commit autoupdate (SciTools#5682) ruff complliance for D100. (SciTools#5678) move tests.pp to tests.__init__ (SciTools#5679) ruff compliance for D417, D106 (SciTools#5677) ruff compliance for D200. (SciTools#5675) ruff compliance for D106. (SciTools#5676) ...
🚀 Pull Request
Description
History
pearsonr
was first added to Iris at #555. To perform the calculation, the cubes' data were pulled out as numpy arrays and reshaped/transposed within the_get_calc_view
helper so the dimension to be collapsed was always the leading dimension. This ensured that the collapsed arrays would broadcast to the original, which is required by the calculation.At #1714 we re-wrote
pearsonr
to usecube.collapsed
and cube arithmetic. This simplified the implementation because those cube operations took care of getting the various dimensions lined up correctly. It also gave us broadcasting for free (previously the input cubes had to be the same shape) as well as the weights and mdi keywords which were handled by the aggregators. As the aggregators matured we also got lazy evaluation for free in most cases. However, using the cube operations did slowpearsonr
down a bit.Present Change
We can now have the best of both worlds:
Resolve
figures out the dimension handling up front and gives us cubes whose arrays are guaranteed to broadcast to each other. The keepdims keyword for variousnumpy
functions means we can collapse any dimension of an array and the result will still broadcast to the original shape. This PR takes advantage ofResolve
and keepdims to move back to doing most of the calculations directly on arrays. Advantages:True
, and I couldn't see a neat way to address that with the existing implementation.Speed comparison
This script adapted from one of the tests
With
main
:With my branch:
Testing
I modified the existing tests so that the input cubes are always lazy, and then subclassed to make non-lazy versions of all of them. Not sure if this is overkill?
Numpy min pin
Note that the keepdims keyword was only added to numpy.ma.average at v1.23,
so this PR moves the minimum pin. NEP29 says we should support v1.22 until 1st January, but I am assuming the chances of this PR making it into a release before New Year are low.dask.ma.average
got the keepdims keyword at v2022-08-0. Our dask min pin is already higher than this.Consult Iris pull request check list