Enhance Gen-Ens-Prod to standardize ensemble members relative to climatology. #1918
Comments
In order to describe this slightly better, I discussed the issue with Johnna and she provided the following succinct summary:
I'm including a few snippets of Johnna's code, which is currently doing the standardizing process outside of MET in a Python script:
I think if this is placed anywhere, it makes the most sense to be in gen-ens-prod. This is for ensemble members and requires climatology data, and keeps within the scope of gen-ens-prod:
An argument could be made that this would be useful in other tools (e.g., grid-stat), but unlike regridding, which applies to numerous situations, ensemble standardization is strictly for ensembles. Users wanting this function for non-ensemble work should be able to pass the files into gen-ens-prod as a mock-ensemble file list; as long as the variable names are the same, it should behave the same.
@j-opatz this is great info! Thanks. Sounds like we need to add an option to GenEnsProd to do the following:
We could consider adding a new GenEnsProd config file option:
Add a global attribute to the NetCDF output file indicating the normalization method applied. Update GenEnsProd to compute the ensemble mean/stdev, as needed. Once this is working well in GenEnsProd, consider adding options 3 and 4 for normalize_flag to Grid-Stat, Point-Stat, Series-Analysis, and Ensemble-Stat. However, if applying it to these tools, we'd need to normalize both the forecast and observation data. This work would be a separate issue.
…d update the documentation.
Making progress. Added the normalize_flag option, updated config files, added documentation. Still need to:
@JohnHalleyGotway internal development notes:
…ore similar to the convert and censor_thresh/censor_val options.
…e names or attributes. Normalizing the input data is similar to converting it or censoring it and that information is not written to the NetCDF output files. The nc_var_str config option can be used to customize the output variable names as the user sees fit.
… in the vx_util library so that that functionality is available to other MET tools. ci-run-unit
After discussions with CPC, it has become clearer what steps need to be taken to accomplish the goal of standardizing ensemble members to the ensemble climatology. Big thanks to Johnna for the clarification, including the majority of the description that follows. Starting with new capabilities 1) and 2) from above,
What CPC wants is the ability to calculate the model anomaly and the model standardized anomaly with respect to the temporal model climatology and the temporal standard deviation. Capabilities 1) and 2) were initially discussed in terms of the ensemble mean, with no temporal aspect.

Described in a more Pythonic way, CPC's data are organized as 1 netCDF file per forecast initialization, with m ensemble members and l leads in each file. For example, for CFSv2, a file with the notation 198201 is a forecast initialized in January 1982 and includes l=12 leads and m=24 members. Each file holds an array of [lead, member, lat, lon]. We'd open these in a loop over all the available initializations. Loading t files into an array in Python gives a 4D array of raw_model_data[init, member, lat, lon]. There's no lead dimension in this array, since the current use case focuses on 1 lead (lead 0). Loading all initializations from 1982-2010 at lead 0 gives the array raw_model_data[init | 29, member | 24, lat | 180, lon | 360].

To calculate the temporal model climatology, the average is taken over the "init" dimension for, e.g., 1982-2010 (this can be any period). This gives an array of model_clim[member,lat,lon] that holds the 1982-2010 average for each member and gridpoint. Likewise, to calculate the temporal model standard deviation, the standard deviation is calculated over the "init" dimension for the same period. This gives an array of model_std_dev[member,lat,lon] that holds the 1982-2010 standard deviation for each member and gridpoint.

Then it's simply a matter of applying the necessary equations to obtain the anomalies and standardized anomalies for each ensemble member:

FCST_ANOM[member,lat,lon] = raw_model_data[init=t,member,lat,lon] - model_clim

FCST_STD_ANOM[member,lat,lon] = (raw_model_data[init=t,member,lat,lon] - model_clim) / model_std_dev
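The calculation described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not Johnna's script or MET code: the data are random stand-ins, the grid is shrunk from 180x360 to 18x36 to keep the example light, and the population standard deviation (`ddof=0`) is assumed because the description does not say which form CPC uses.

```python
import numpy as np

# Synthetic stand-in for the lead-0 data, shaped [init, member, lat, lon].
# 29 initializations (1982-2010) and 24 members follow the CFSv2 example;
# the grid is reduced from 180x360 to 18x36 purely for illustration.
n_init, n_member, n_lat, n_lon = 29, 24, 18, 36
rng = np.random.default_rng(0)
raw_model_data = rng.normal(loc=280.0, scale=5.0,
                            size=(n_init, n_member, n_lat, n_lon))

# Temporal model climatology: average over the "init" dimension (axis 0).
model_clim = raw_model_data.mean(axis=0)      # [member, lat, lon]

# Temporal model standard deviation over the same dimension.
# Assumption: population form (ddof=0); use ddof=1 for the sample form.
model_std_dev = raw_model_data.std(axis=0)    # [member, lat, lon]

# Anomaly and standardized anomaly for one initialization t.
t = 0
fcst_anom = raw_model_data[t] - model_clim
fcst_std_anom = fcst_anom / model_std_dev
```

Because `model_clim` and `model_std_dev` keep the member dimension, each member is standardized against its own temporal statistics rather than against the other members.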
This was briefly discussed in a meeting yesterday, and I think @JohnHalleyGotway correctly surmised that this is actually a two-tool solution: the temporal model climatology and temporal model standard deviation for each ensemble member should be obtained via series-analysis, and the FCST_ANOM, FCST_STD_ANOM variables could be obtained via GenEnsProd. In this way, it's actually more in line with capabilities 3) and 4),
as the temporal model climatologies and standard deviations could be fed in via the climo_mean and climo_stdev dictionaries in the configuration file. When asked if there is value in keeping the ability to calculate the ensemble mean and ensemble standard deviation as described in the original 1) and 2) capabilities, Johnna provided
If this functionality is already in place with the current work that's done, there's no reason to tear it back out. But if it requires additional work and code updates, I'm all in favor of dropping that functionality.
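For contrast with the temporal approach, the original capabilities 1) and 2) can be sketched as below, assuming they mean normalizing each member against the ensemble mean and ensemble standard deviation of a single forecast. The function name, array shape, and random data are hypothetical, and this is not the GenEnsProd implementation.

```python
import numpy as np

def normalize_by_ensemble(members):
    """Return (anomaly, standardized anomaly) relative to the ensemble itself.

    members: array shaped [member, lat, lon] for a single forecast.
    Statistics are taken over the member dimension, so there is no
    temporal aspect and no climatology input.
    """
    ens_mean = members.mean(axis=0)   # [lat, lon]
    ens_std = members.std(axis=0)     # [lat, lon]; assumes ddof=0
    ens_anom = members - ens_mean
    return ens_anom, ens_anom / ens_std

rng = np.random.default_rng(1)
members = rng.normal(size=(24, 18, 36))   # random stand-in data
ens_anom, ens_std_anom = normalize_by_ensemble(members)
```

The only structural difference from the temporal calculation is the axis being reduced: here the statistics collapse the member dimension, whereas the climatology-based version collapses the init dimension and keeps one set of statistics per member.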
@j-opatz testing revealed that additional logic is needed. One solution is to leverage the existing MET_ENS_MEMBER_ID keyword in the config file. I recommend enhancing the processing of climatology data so that any instances of MET_ENS_MEMBER_ID are replaced by the actual string for the current member. That string could appear in the VarInfo name or level strings, but it could also appear in the file_name array. The hope is that this change will simplify the application of Gen-Ens-Prod to the NOAA/CPC evaluation of these ensembles.
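The substitution idea can be illustrated with a short Python sketch. It is not MET's C++ implementation: the helper names, the dictionary layout, the file path, and the member IDs below are all hypothetical, and the keyword is treated as a plain substring for simplicity.

```python
KEYWORD = "MET_ENS_MEMBER_ID"

def expand_member_id(text, member_id):
    """Replace every occurrence of the keyword with the member's ID string."""
    return text.replace(KEYWORD, member_id)

def expand_climo_entry(entry, member_id):
    """Return a copy of a climo entry with the keyword expanded in every
    string value, including each string inside list values (file_name)."""
    expanded = {}
    for key, value in entry.items():
        if isinstance(value, list):
            expanded[key] = [expand_member_id(v, member_id) for v in value]
        else:
            expanded[key] = expand_member_id(value, member_id)
    return expanded

# Hypothetical climatology entry: one file and one variable per member.
climo_mean = {
    "file_name": ["climo/mean_member_MET_ENS_MEMBER_ID.nc"],
    "name": "TMP_MET_ENS_MEMBER_ID",
    "level": "(*,*)",
}

# Hypothetical member IDs; each member resolves to its own climo file.
for member_id in ["01", "02"]:
    print(expand_climo_entry(climo_mean, member_id))
```

Expanding the keyword once per member is what lets a single config entry point at a separate climatology file for each ensemble member.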
…s so that we can use it later, if needed, when reading climatological data which may also make use of that string.
…nt variable when reading climatology data if the ens_member_ids config option has been set and the normalizing relative to climatology has been requested.
…larify what data is being read from which climo data files.
…BER_ID to read climo data separately for each member.
* Per #1918, store the ensemble_member_id string in the EnsVarInfo class so that we can use it later, if needed, when reading climatological data which may also make use of that string.
* Per #1918, update gen_ens_prod to set the MET_ENS_MEMBER_ID environment variable when reading climatology data if the ens_member_ids config option has been set and the normalizing relative to climatology has been requested.
* Per #1918, add log messages to read_climo.cc and gen_ens_prod.cc to clarify what data is being read from which climo data files.
* Added documentation on MET_ENS_MEMBER_ID usage in climo file name
* updated usage language
* Per #1918, adding gen_ens_prod unit test to demonstrate using ENS_MEMBER_ID to read climo data separately for each member.

Co-authored-by: j-opatz <[email protected]>
Looking at the logic for EnsembleStat and GenEnsProd, I think you are right that the integer argument is not needed. I think it was needed previously while I was doing development, but the final solution does not actually require it. If ctrl_info is not set, then I believe we still need the field info of the first field instead of NULL because we need to pass that information to read the control field.
Describe the New Feature
Based on feedback from both the MetOffice and CPC, standardizing ensemble members relative to the ensemble's climatological mean and standard deviation is common practice. The most popular version subtracts the mean and divides by the standard deviation, but subtracting the mean only is also used. Other methods exist, but these two seem to be the most desired after discussion with both offices.
An improvement to EnsembleStat that allows for a user-specified standardization, perhaps through a config variable (e.g., normalize = CLIMO_MEAN), would benefit all offices that work with ensemble climatology data.
Acceptance Testing
The best dataset to use for CPC would be NMME. It can be moved to a more optimal location once development begins.
Because the subtract mean/divide by stdev standardization is currently being done in METplus via Python Embedding, there are existing results to compare the new functionality against. The subtract mean option currently has no previous results.
Time Estimate
3 days
Sub-Issues
Consider breaking the new feature down into sub-issues.
No sub-issues needed
Relevant Deadlines
The reorg of EnsembleStat
Funding Source
2799991
Define the Metadata
Assignee
Labels
Projects and Milestone
Define Related Issue(s)
Consider the impact to the other METplus components.
New Feature Checklist
See the METplus Workflow for details.
Branch name:
feature_<Issue Number>_<Description>
Pull request:
feature <Issue Number> <Description>
Select: Reviewer(s) and Linked issues
Select: Repository level development cycle Project for the next official release
Select: Milestone as the next official version