Fix NaN handling in calc_wd_mean_radial #68
Codecov Report
Base: 33.18% // Head: 30.31% // Decreases project coverage by 2.87%
Additional details and impacted files
@@             Coverage Diff              @@
##           develop      #68       +/-   ##
===========================================
- Coverage    33.18%   30.31%   -2.87%
===========================================
  Files           38       40       +2
  Lines         3779     4262     +483
===========================================
+ Hits          1254     1292      +38
- Misses        2525     2970     +445
Nice find @misi9170! Perhaps we can add an option for NaN handling, similar to […]
@Bartdoekemeijer thanks for the suggestion! I had been meaning to test it for speed against scipy.stats.circmean in any case. I found scipy.stats.circmean to be slightly faster than the current implementation. Moreover, setting […] The default […]
@paulf81 @Bartdoekemeijer OK to merge?
I was just about to say, but won't this be slower? But happy to hear it's actually faster! Good to go from my end.
@paulf81 Thanks for calling that out; I was wrong. I went back and checked, and I had been comparing the times incorrectly. In fact, this proposed implementation (using scipy's […] For reference, on my laptop this operation takes about 3.5 seconds to calculate 1,000,000 means, each containing 100 points, which I think is still fast enough not to be a major holdup when dealing with SCADA records.
15% slower for the sake of simplicity seems fine to me!
Great, squashing and merging.
Thanks guys, and I think I agree: simplicity is a good goal. I was also thinking that the more we make a practice of using existing tools rather than reinventing them, the more we set ourselves up to benefit from improvements made to those tools. So in addition to simplicity, this is a good decision from the point of view of good practice: don't write functions that already exist in numpy/scipy. I know this is already merged; I was just thinking of adding this point to maybe nudge future decisions the same way. Thanks @misi9170 and @Bartdoekemeijer!
Issue
calc_wd_mean_radial() produces some inconsistent behavior. When the angles_array_deg input is a numpy array that contains NaNs, the output mean wind direction is NaN. However, when angles_array_deg is a dataframe with only a few NaN values, the output ignores the NaNs, and when angles_array_deg contains entire rows/columns of NaNs, the output is 0.
Solution
I propose the following approach:
Note that this is slightly different from the current behavior, which:
- returns NaN when some wind directions are NaN if angles_array_deg is a numpy array; and returns the mean over non-NaN values if angles_array_deg is a dataframe.
- returns NaN when all wind directions are NaN if angles_array_deg is a numpy array; and returns 0 when all wind directions are NaN if angles_array_deg is a dataframe.
Steps to recreate issue
Run the following code on the current develop branch and compare to this bugfix branch:
I could also turn the above (or something similar) into a unit test for calc_wd_mean_radial().
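For readers who want to see the NaN cases from this thread in isolation, here is a minimal, dependency-free sketch of a circular mean in degrees. It is not the calc_wd_mean_radial() implementation and it does not call scipy.stats.circmean. The function name circ_mean_deg is hypothetical, and the nan_policy argument only borrows scipy's naming to switch between ignoring NaNs and propagating them. Returning NaN for an all-NaN input is an assumption made for this illustration.

```python
import math


def circ_mean_deg(angles_deg, nan_policy="omit"):
    """Circular mean of angles given in degrees, wrapped to [0, 360).

    Hypothetical helper for illustration only (not the function in this PR).
    nan_policy="omit" ignores NaN entries; "propagate" returns NaN if any
    entry is NaN. An input with no valid entries yields NaN (assumption).
    """
    if nan_policy not in ("omit", "propagate"):
        raise ValueError("nan_policy must be 'omit' or 'propagate'")

    has_nan = any(math.isnan(a) for a in angles_deg)
    if has_nan and nan_policy == "propagate":
        return math.nan

    valid = [a for a in angles_deg if not math.isnan(a)]
    if not valid:
        # Nothing to average: return NaN rather than a misleading 0.
        return math.nan

    # Average the unit vectors, then convert the resultant back to an angle.
    sin_sum = sum(math.sin(math.radians(a)) for a in valid)
    cos_sum = sum(math.cos(math.radians(a)) for a in valid)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


if __name__ == "__main__":
    nan = float("nan")
    print(circ_mean_deg([350.0, 30.0]))                           # no NaNs
    print(circ_mean_deg([350.0, nan, 30.0]))                      # NaN ignored
    print(circ_mean_deg([350.0, nan, 30.0], nan_policy="propagate"))
    print(circ_mean_deg([nan, nan]))                              # all NaN
```

The same four cases (no NaNs, some NaNs omitted, some NaNs propagated, all NaNs) could serve as the skeleton of a unit test, with the actual function and input types substituted in.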