Fix NaN handling in calc_wd_mean_radial #68

misi9170 · 2023-02-15T21:01:14Z

Issue

calc_wd_mean_radial() produces some inconsistent behavior. When the angles_array_deg input is a numpy array that contains NaNs, the output mean wind direction is NaN. However, when angles_array_deg is a dataframe, with only a few NaN values, the output ignores the NaNs, and when angles_array_deg contains entire rows/columns of NaNs, the output is 0.

Solution

I propose the following approach:

If some, but not all, wind directions in the mean are NaNs, those values are ignored and the mean is over the non-NaN values
If all wind directions in the mean are NaNs, NaN is returned.

Note that this is slightly different than current behavior, which:

returns NaN when any NaNs are present if angles_array_deg is a numpy array; and returns the mean over non-NaN values if angles_array_deg is a dataframe.
returns NaN when all wind directions are NaN if angles_array_deg is a numpy array; and returns 0 when all wind directions are NaN if angles_array_deg is a dataframe.

Steps to recreate issue

Run the following code on the current develop branch and compare to this bugfix branch:

from flasc import circular_statistics as cs
import numpy as np
import pandas as pd

print(np.nansum(np.array([3, 2, np.nan]))) # Intuitive
print(np.nansum(np.array([np.nan, np.nan, np.nan]))) # Not particularly intuitive

# generate data centered around 0
np.random.seed(1)
X = 360 + np.random.randn(5, 3)*5

## Behavior for numpy arrays

print(X)
# Normal behavior
print(cs.calc_wd_mean_radial(X))
print(cs.calc_wd_mean_radial(X, 1))

# Behavior when NaNs present
X_nans = X.copy()
X_nans[:,1] = np.nan
X_nans[1,:] = np.nan

print(X_nans)
# All nans due to presence of at least one nan
print(cs.calc_wd_mean_radial(X_nans)) 
print(cs.calc_wd_mean_radial(X_nans, 1)) 

## Behavior when data is a pandas dataframe
df = pd.DataFrame(X)

print(df)
# Runs as expected when NaNs not present
print(cs.calc_wd_mean_radial(df)) 
print(cs.calc_wd_mean_radial(df, 1)) 

df_nans = pd.DataFrame(X_nans)
print(df_nans)
# Runs as expected when not all values are NaN, but produces zeros when all 
# values are NaN
print(cs.calc_wd_mean_radial(df_nans)) 
print(cs.calc_wd_mean_radial(df_nans, 1))

I could also turn the above (or something similar) into a unit test for calc_wd_mean_radial().

codecov-commenter · 2023-02-15T21:05:55Z

Codecov Report

Base: 33.18% // Head: 30.31% // Decreases project coverage by -2.87% ⚠️

Coverage data is based on head (b9a783d) compared to base (3794509).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop      #68      +/-   ##
===========================================
- Coverage    33.18%   30.31%   -2.87%     
===========================================
  Files           38       40       +2     
  Lines         3779     4262     +483     
===========================================
+ Hits          1254     1292      +38     
- Misses        2525     2970     +445

Impacted Files	Coverage Δ
flasc/circular_statistics.py	`97.56% <100.00%> (+0.12%)`	⬆️
flasc/energy_ratio/energy_ratio_suite.py	`8.74% <0.00%> (-1.46%)`	⬇️
...sc/energy_ratio/energy_ratio_wd_bias_estimation.py	`14.54% <0.00%> (-0.27%)`	⬇️
flasc/turbine_analysis/find_sensor_faults.py	`8.64% <0.00%> (-0.06%)`	⬇️
flasc/energy_ratio/energy_ratio_gain.py	`7.16% <0.00%> (ø)`
flasc/turbine_analysis/northing_offset.py	`9.72% <0.00%> (ø)`
flasc/turbine_analysis/ws_pow_filtering.py	`7.50% <0.00%> (+0.19%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Bartdoekemeijer · 2023-02-16T06:41:10Z

Nice find @misi9170! Perhaps we can add an option for NaN handling, similar to scipy.stats.circmean, defaulting to omit as you suggest? See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.circmean.html.

misi9170 · 2023-02-16T13:04:16Z

@Bartdoekemeijer thanks for the suggestion! I had been meaning to test it for speed against scipy.stats.circmean in any case. I found scipy.stats.circmean to be slightly faster than the current implementation. Moreover, setting nan_policy to 'omit' as you suggest provides the expected behavior (ignores NaNs when non-NaNs are present; returns NaN if all NaNs) for both numpy arrays and dataframes. I've therefore updated this function to simply wrap scipy.stats.circmean.

The default nan_policy for circmean is 'propagate', but I've set the default to 'omit' for calc_wd_mean_radial().

misi9170 · 2023-02-23T13:34:51Z

@paulf81 @Bartdoekemeijer OK to merge?

paulf81 · 2023-02-23T17:06:20Z

I was just about to say, but won't this be slower? But happy to hear it's actually faster! Good to go from my end

misi9170 · 2023-02-23T19:39:18Z

@paulf81 Thanks for calling that out---I was wrong. I went back and checked, and I had been comparing the times incorrectly. In fact, this proposed implementation (using scipy's circmean) is about 15% slower than the the current implementation. In my opinion, that's a reasonable price to pay for simplicity, but I'm also happy to revert to the explicit calculation.

For reference, on my laptop this operation takes about 3.5 seconds to calculate 1,000,000 means, each containing 100 points, which I think is still fast enough to not be a major hold up for dealing with SCADA records.

Bartdoekemeijer · 2023-02-24T08:01:12Z

15% slower for the sake of simplicity seems fine to me!

misi9170 · 2023-02-24T15:05:59Z

Great---squashing and merging.

paulf81 · 2023-02-24T20:51:37Z

Thanks guys, and I think agree, simplicity is a good goal. Also was thinking the more we have a practice of using tools versus reinventing probably then we'll also set ourselves up to benefit from improvements made to those tools, so probably in addition to simplicity, this is a good decision from the point of view of good practice, like, don't write any functions that exist already in numpy/scipy. I know this is already merged was just thinking of adding this point to maybe nudge future decisions in this same way. Thanks @misi9170 and @Bartdoekemeijer !

Bartdoekemeijer · 2023-02-27T08:25:19Z

Totally agree @paulf81! I think this ties in with #64 too. The more we can rely on OpenOA, the leaner this code becomes!

misha added 2 commits February 13, 2023 14:21

Checks to see if all values are NaNs before proceeding.

7fcc732

Mask for all-NaN rows/cols; replace values with NaNs prior to returning.

b9a783d

misi9170 requested review from Bartdoekemeijer and paulf81 February 15, 2023 21:02

Adding basic docstring.

a711146

Changing to simply a wrapper for scipy.stats.circmean

4ff5a19

paulf81 approved these changes Feb 23, 2023

View reviewed changes

Bartdoekemeijer approved these changes Feb 24, 2023

View reviewed changes

misi9170 merged commit 17da9bb into NREL:develop Feb 24, 2023

misi9170 deleted the bugfix/circ-mean-nans branch February 24, 2023 15:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix NaN handling in calc_wd_mean_radial #68

Fix NaN handling in calc_wd_mean_radial #68

misi9170 commented Feb 15, 2023

codecov-commenter commented Feb 15, 2023

Bartdoekemeijer commented Feb 16, 2023

misi9170 commented Feb 16, 2023

misi9170 commented Feb 23, 2023

paulf81 commented Feb 23, 2023

misi9170 commented Feb 23, 2023 •

edited

Loading

Bartdoekemeijer commented Feb 24, 2023

misi9170 commented Feb 24, 2023

paulf81 commented Feb 24, 2023

Bartdoekemeijer commented Feb 27, 2023

Fix NaN handling in calc_wd_mean_radial #68

Fix NaN handling in calc_wd_mean_radial #68

Conversation

misi9170 commented Feb 15, 2023

Issue

Solution

Steps to recreate issue

codecov-commenter commented Feb 15, 2023

Codecov Report

Bartdoekemeijer commented Feb 16, 2023

misi9170 commented Feb 16, 2023

misi9170 commented Feb 23, 2023

paulf81 commented Feb 23, 2023

misi9170 commented Feb 23, 2023 • edited Loading

Bartdoekemeijer commented Feb 24, 2023

misi9170 commented Feb 24, 2023

paulf81 commented Feb 24, 2023

Bartdoekemeijer commented Feb 27, 2023

misi9170 commented Feb 23, 2023 •

edited

Loading