-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement expected analysis #236
base: develop
Are you sure you want to change the base?
Conversation
@paulf81, I was going to add a test to make sure the uplift variance is correct using First, if setting The other issue is when Also, let me know if I'm wrong about any of the above. I didn't get a chance to thoroughly check this behavior yet. |
test_cols=test_cols, | ||
bin_cols_without_df_name=bin_cols_without_df_name, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking this could be a good place to call a new utility function that synchronizes nulls between df_cov and df_bin (something like "_synchronize_mean_power_cov_nulls"). Specifically, for each row, if there are any turbines in df_cov with undefined variances or covariances (because count < 2), then the mean power for those turbines would get set to Null as well. This way, we would always be able to return a standard error for the uplift by excluding turbines in a given bin with undefined covariance from both the expected uplift and uncertainty calculations. I like this because we wouldn't end up returning a NaN uncertainty value when calling this function. In most cases, though, we would probably just want to set variance_only = True
or fill_cov_with_var = True
to maximize the number of turbines that can be used to compute the uplift in each bin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I added _synchronize_mean_power_cov_nulls to the utilities and test_synchronize_mean_power_cov_nulls to the tests and maybe check the test first but I think it's doing what you describe. I agree this seems to the right approach! Just had a little time so started here, and now the tests of the full function fail but I think that's right. But let me know if this matches what you expected and then I'll got to your later comments and fix the overall result to match
|
||
# If any of the cov_cols are null, set pow_farm_var to null | ||
df_bin = df_bin.with_columns( | ||
pl.when(pl.all_horizontal([pl.col(c).is_not_null() for c in cov_cols])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connected to the comments above, I think we could get rid of this requirement that all cov_cols are defined. When computing the "pow_farm_var" column a few lines above, if my understanding is correct, the summation over cov_cols will just ignore any Null values. So, we'd be calculating the variance of the farm power considering only the turbines that are valid, in the same way that "pow_farm" will only sum the power of the turbines that are not Null. Therefore, the farm power variance will correspond to the same set of turbiens used to compute farm power.
if remove_any_null_turbine_bins: | ||
df_bin = df_bin.filter( | ||
pl.all_horizontal([pl.col(f"{c}_mean").is_not_null() for c in test_cols]) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connected to the comment above, at this point, regardless of the value of "remove_any_null_turbine_bins," we might need to remove rows where all test_cols are Null. Although rare, I think it is possible that if count < 2 for all turbines in a bin, then after synchronizing Nulls between df_cov and df_bin, we could be left with rows that are all Null that should get filtered out from the analysis.
Implement expected analysis
Feature or improvement description
This branch will implement the methods of uplift analysis described in AWC validation methodology. It also makes some changes to names of existing functions to better clarify the operation of the seperate methods for analyzing data and quantifying uplift.
Changes to be included:
EnergyRatioInput
toAnalysisInput
to make clear can be used for bothEnergyRatioInput
with deprecation warning to avoid breaking changesComputeTotalUplift
toTotalUpliftPowerRatio
to clarify the method of analysisexpected_power_analyisis.py
which will implement the methods in AWC validation methodologyexpected_power_analysis_output.py
for catching the output of functions inexpected_power_analysis
and provided post-processing operations