Enhance Gen-Ens-Prod to standardize ensemble members relative to climatology. #1918

j-opatz · 2021-09-14T22:22:51Z

Describe the New Feature

Based on feedback from both the MetOffice and CPC, standardizing ensemble members relative to the ensemble's climatological mean and standard deviation is common practice. The most popular version subtracts the mean and divides by the standard deviation, but subtracting the mean only is also used. Other methods exist, but these two seem the most desired after discussion with both offices.
An improvement to EnsembleStat that allows for a user-specified standardization, perhaps controlled by a configuration option (e.g. normalize = CLIMO_MEAN), would benefit all offices that work with ensemble climatology data.

Acceptance Testing

The best dataset to use for CPC would be NMME. It can be moved to a more optimal location once development begins.
Because standardizing by subtracting the mean and dividing by the standard deviation is currently done in METplus via Python Embedding, there are existing results to compare the new functionality against. The subtract-mean-only option has no previous results yet.

Time Estimate

3 days

Sub-Issues

Consider breaking the new feature down into sub-issues.
No sub-issues needed

Relevant Deadlines

The reorg of EnsembleStat

Funding Source

2799991

Define the Metadata

Assignee

Select engineer(s) or no engineer required
Select scientist(s) or no scientist required

Labels

Select component(s)
Select priority
Select requestor(s)

Projects and Milestone

Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

METplus, MET, METdatadb, METviewer, METexpress, METcalcpy, METplotpy
Defined METplus#1445, "Add support for the normalize option to the Gen-Ens-Prod wrapper," to add support for this to the gen_ens_prod wrapper.

New Feature Checklist

See the METplus Workflow for details.

j-opatz · 2022-02-10T18:29:35Z

In order to describe this slightly better, I discussed the issue with Johnna and she provided the following succinct summary:

fcst standardized anomalies are calculated in model space, e.g. fcst_std_anom = (fcst - fcst_clim)/fcst_sd

obs standardized anomalies are calculated in obs space, e.g. obs_std_anom = (obs-obs_clim)/obs_sd

I'm including a few snippets of Johnna's code, which is currently doing the standardizing process outside of MET in a Python script:

import numpy as np

def standardize(full_fcst_array):
    # Function name is illustrative; the original is an excerpt.
    # full_fcst_array: [year, lat, lon] for the lead and member of interest

    # Define climatology for the lead and member of interest
    clim = np.nanmean(full_fcst_array, axis=0)

    # Define standard deviation for the lead and member of interest
    stddev = np.nanstd(full_fcst_array, axis=0)

    # Define anomalies and standardized anomalies (perhaps unnecessary);
    # NumPy broadcasting applies clim/stddev to every year at once
    anom = full_fcst_array - clim
    std_anom = anom / stddev

    return clim, stddev, anom, std_anom

If this is placed anywhere, I think it makes the most sense in gen-ens-prod. It applies to ensemble members, requires climatology data, and stays within the scope of gen-ens-prod:

The Gen-Ens-Prod tool generates simple ensemble products (mean, spread, probability, etc.) from gridded ensemble member input files. While it processes model inputs, it does not compare them to observations or compute statistics.

An argument could be made that this would be useful in other tools (e.g. grid-stat), but unlike regridding, which applies to numerous situations, ensemble standardization is strictly for ensembles. Users who want this function for non-ensemble work should be able to pass their files to gen-ens-prod as a mock-ensemble file list; as long as the variable names are the same, it should behave the same way.

JohnHalleyGotway · 2022-02-17T21:55:01Z

@j-opatz this is great info! Thanks.

Sounds like we need to add an option to GenEnsProd to do the following:
For each ensemble member, be able to...

subtract off the ens mean (FCST_ANOM)
subtract off the ens mean and divide by the stdev (FCST_STD_ANOM)
subtract off the climo mean (CLIMO_ANOM)
subtract off the climo mean and divide by the stdev (CLIMO_STD_ANOM)

We could consider adding a new GenEnsProd config file option:

normalize_flag = NONE, FCST_ANOM, FCST_STD_ANOM, CLIMO_ANOM, CLIMO_STD_ANOM

Add a global attribute to the NetCDF output file indicating the normalization method applied.

Update GenEnsProd to compute the ensemble mean/stdev, as needed.
If required climo data isn't available, error out.
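A minimal Python sketch of how the four proposed normalize_flag options might act on an array of ensemble members, including erroring out when climatology is missing. The function name `normalize_members`, the [member, lat, lon] layout, and the use of NumPy's population standard deviation (`ddof=0`) are assumptions for illustration only; MET itself is written in C++ and its implementation may differ.

```python
import numpy as np

# Option names taken from the proposed normalize_flag values; the function
# below is a hypothetical illustration, not MET's implementation.
VALID_FLAGS = ("NONE", "FCST_ANOM", "FCST_STD_ANOM", "CLIMO_ANOM", "CLIMO_STD_ANOM")


def normalize_members(members, flag, climo_mean=None, climo_stdev=None):
    """Normalize every member of an array shaped [member, lat, lon]."""
    if flag not in VALID_FLAGS:
        raise ValueError(f"unsupported normalize_flag: {flag}")
    if flag == "NONE":
        return members.copy()

    if flag.startswith("FCST"):
        # Options 1 and 2: ensemble mean/stdev computed across members
        # (population stdev, ddof=0, is an assumption here).
        mean = np.nanmean(members, axis=0)
        stdev = np.nanstd(members, axis=0)
    else:
        # Options 3 and 4: climatology must be supplied, otherwise error out.
        if climo_mean is None or (flag == "CLIMO_STD_ANOM" and climo_stdev is None):
            raise ValueError(f"{flag} requires climatology data")
        mean, stdev = climo_mean, climo_stdev

    # Broadcasting subtracts the [lat, lon] mean from every member.
    anom = members - mean
    if flag.endswith("STD_ANOM"):
        return anom / stdev
    return anom
```

Computing the ensemble mean and stdev inside the function keeps options 1 and 2 free of any extra inputs, while options 3 and 4 fail loudly rather than silently falling back.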

Once this is working well in GenEnsProd, consider adding options 3 and 4 for normalize_flag to Grid-Stat, Point-Stat, Series-Analysis, and Ensemble-Stat. However, if applying it to these tools, we'd need to normalize both the forecast and observation data. This work would be a separate issue.

JohnHalleyGotway · 2022-02-19T01:33:15Z

Making progress. Added the normalize_flag option, updated config files, added documentation.

Still need to:

Add a new unit test to test normalizing in all supported ways.
Figure out how to differentiate between the different methods in the output... note that it can be normalized differently for each entry in the ens.field array.

JohnHalleyGotway · 2022-02-19T12:20:53Z

@JohnHalleyGotway internal development notes:

rename normalize_flag config file option to just be normalize to make it more similar to the censor and convert options
do not automatically include the normalization type in the gen_ens_prod output variable names
do not include normalization type as a NetCDF output variable attribute since we don't do that when the data's been converted or censored
update the new unit test to set nc_var_str to customize the output variable names
moved normalize_data() utility function over to data_plane_util.h/.cc since we'll likely want to call it from other MET tools and switch it to pass DataPlane pointers rather than references
update the docs accordingly
write up a new issue to extend the usage of normalize to ps, gs, es, and sa (all tools that read climo data)

…e names or attributes. Normalizing the input data is similar to converting it or censoring it and that information is not written to the NetCDF output files. The nc_var_str config option can be used to customize the output variable names as the user sees fit.

j-opatz · 2022-02-23T18:46:39Z

After discussions with CPC, it has become clearer what steps need to be taken to accomplish the goal of standardizing ensemble members relative to the ensemble climatology. Big thanks to Johnna for the clarification, including most of the description that follows.

Starting with new capabilities 1) and 2) from above,

subtract off the ens mean (FCST_ANOM)
subtract off the ens mean and divide by the stdev (FCST_STD_ANOM)

What CPC wants is the ability to calculate the model anomaly and model standardized anomaly with respect to the temporal model climatology and temporal standard deviation. Capabilities 1) and 2) were initially discussed in terms of the ensemble mean, with no temporal aspect.

Described in a more Pythonic way: CPC's data are organized as one netCDF file per forecast initialization, each containing m ensemble members and l leads. For example, for CFSv2, a file labeled 198201 is a forecast initialized in January 1982 and includes l=12 leads and m=24 members. Each file holds an array of [lead, member, lat, lon]. We'd open these in a loop over all the available initializations.

Loading t files into an array in Python gives a 4D array raw_model_data[init, member, lat, lon]. There's no lead dimension in this array, since the current use case focuses on a single lead (lead 0). Loading all initializations from 1982-2010 at lead 0, the array will be raw_model_data[init | 29, member | 24, lat | 180, lon | 360].

Finally, to calculate the temporal model climatology, the average is taken over the "init" dimension for e.g. 1982-2010 (this can be any period). This will give an array of model_clim[member,lat,lon] that holds the average over 1982-2010 for each member and gridpoint.

Likewise, to calculate the temporal model standard deviation, the standard deviation is calculated over the "init" dimension for e.g. 1982-2010 (this can be any period). This will give an array of model_std_dev[member,lat,lon] that holds the standard deviation over 1982-2010 for each member and gridpoint.

Then it's simply a matter of applying the necessary equations to get the anomalies and standardized anomalies for each ensemble member:

FCST_ANOM[member,lat,lon] = raw_model_data[init=t,member,lat,lon] - model_clim.

FCST_STD_ANOM[member,lat,lon] = (raw_model_data[init=t,member,lat,lon] - model_clim)/model_std_dev.
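The steps above can be sketched with NumPy, assuming raw_model_data has already been loaded as an [init, member, lat, lon] array for a single lead. The helper name `member_anomalies` is hypothetical, and `np.nanstd` defaults to the population standard deviation (`ddof=0`), which may not be what CPC uses.

```python
import numpy as np


def member_anomalies(raw_model_data, t):
    """Anomalies of initialization t relative to each member's own climatology.

    raw_model_data: array shaped [init, member, lat, lon] for a single lead.
    t: index along the init axis of the forecast to standardize.
    """
    # Temporal model climatology and standard deviation over the "init" axis,
    # one value per member and grid point: shape [member, lat, lon].
    model_clim = np.nanmean(raw_model_data, axis=0)
    model_std_dev = np.nanstd(raw_model_data, axis=0)

    # FCST_ANOM and FCST_STD_ANOM for initialization t, shape [member, lat, lon].
    fcst_anom = raw_model_data[t] - model_clim
    fcst_std_anom = fcst_anom / model_std_dev
    return fcst_anom, fcst_std_anom
```

For the CFSv2 example, raw_model_data would have shape (29, 24, 180, 360), and both outputs would have shape (24, 180, 360). The climatology here is computed over whichever initializations are passed in, so the choice of period is left to the caller.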

This was briefly discussed in a meeting yesterday, and I think @JohnHalleyGotway correctly surmised that this is actually a two-tool solution: the temporal model climatology and temporal model standard deviation for each ensemble member should be obtained via series-analysis, and the FCST_ANOM, FCST_STD_ANOM variables could be obtained via GenEnsProd. In this way, it's actually more in line with capabilities 3) and 4),

subtract off the climo mean (CLIMO_ANOM)
subtract off the climo mean and divide by the stdev (CLIMO_STD_ANOM)

as the temporal model climatologies and standard deviations could be fed in via the climo_mean and climo_stdev dictionaries in the configuration file.

When asked whether there is value in keeping the ability to calculate the ensemble mean and ensemble standard deviation as described in the original capabilities 1) and 2), Johnna replied:

It's still a useful metric. For example, one could see how much the members deviate from the mean (say your ensemble mean forecasts a temperature of 50C at a grid point; you could see how much the ensemble members range about that value). It would be a good contribution, though not necessarily what we're looking for in this particular instance.

If this functionality is already in place with the current work that's done, there's no reason to tear it back out. But if it requires additional work and code updates, I'm all in favor of dropping that functionality.

JohnHalleyGotway · 2022-02-25T18:12:10Z

@j-opatz testing revealed that additional logic is needed.
CPC processes the CFSv2 24-member ensemble (and also the NMME 120-member ensemble). Each of these ensemble members must be normalized relative to a 30-year average of that INDIVIDUAL MEMBER. In Gen-Ens-Prod, the climo_mean and climo_stdev dictionaries provide climo data that is applied in the same way to all ensemble member inputs. However, what we need here is to define climo data separately for each member.

One solution is leveraging the existing MET_ENS_MEMBER_ID keyword in the config file. Recommend enhancing the processing of climatology data so that any instances of MET_ENS_MEMBER_ID are replaced by the actual string for the current member. That string could appear in the VarInfo name or level strings but could also appear in the file_name array.
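A sketch of the proposed substitution, assuming the climo settings are held in a plain Python dict mirroring the file_name, name, and level entries of a climo_mean dictionary, and assuming the keyword is written as `${MET_ENS_MEMBER_ID}` in those strings. The helper names, file names, and member IDs below are hypothetical; MET performs this in C++ through the environment variable.

```python
# Assumed spelling of the keyword inside the config strings.
MEMBER_TOKEN = "${MET_ENS_MEMBER_ID}"


def expand_member_id(text, member_id, token=MEMBER_TOKEN):
    """Replace every occurrence of the member-ID keyword in one string."""
    return text.replace(token, member_id)


def climo_request(climo_dict, member_id):
    """Return a new climo settings dict for one ensemble member.

    climo_dict is a hypothetical stand-in for a climo_mean/climo_stdev entry
    with keys file_name (list of str), name (str), and level (str).
    The input dict is not modified.
    """
    return {
        "file_name": [expand_member_id(f, member_id) for f in climo_dict["file_name"]],
        "name": expand_member_id(climo_dict["name"], member_id),
        "level": expand_member_id(climo_dict["level"], member_id),
    }


if __name__ == "__main__":
    climo_mean = {
        "file_name": ["cfsv2_clim_${MET_ENS_MEMBER_ID}.nc"],  # hypothetical path
        "name": "TMP",
        "level": "(*,*)",
    }
    for member_id in ["m01", "m02"]:  # hypothetical ens_member_ids values
        print(climo_request(climo_mean, member_id)["file_name"])
```

Substituting into all three fields covers each place the comment says the member string could appear.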

The hope is that this change will simplify the application of Gen-Ens-Prod to NOAA/CPC evaluation of these ensembles.

…nt variable when reading climatology data if the ens_member_ids config option has been set and the normalizing relative to climatology has been requested.

* Per #1918, store the ensemble_member_id string in the EnsVarInfo class so that we can use it later, if needed, when reading climatological data which may also make use of that string.
* Per #1918, update gen_ens_prod to set the MET_ENS_MEMBER_ID environment variable when reading climatology data if the ens_member_ids config option has been set and the normalizing relative to climatology has been requested.
* Per #1918, add log messages to read_climo.cc and gen_ens_prod.cc to clarify what data is being read from which climo data files.
* Added documentation on MET_ENS_MEMBER_ID usage in climo file name
* updated usage language
* Per #1918, adding gen_ens_prod unit test to demonstrate using ENS_MEMBER_ID to read climo data separately for each member.
* Per #1918, adding gen_ens_prod unit test to demonstrate using ENS_MEMBER_ID to read climo data separately for each member.

Co-authored-by: j-opatz <[email protected]>

georgemccabe · 2022-03-08T19:53:04Z

> @georgemccabe, while doing development for this feature, I got confused by the usage of 'ens_info->get_ctrl(int)'.
>
> An existing call to EnsVarInfo::get_ctrl(int) can be seen on this line. And a new call that I added can be seen on this line.
>
> I'm confused about the integer argument to that function. Here's the definition of it.
>
> 1. If a control file has been specified on the command line, presumably ctrl_info is set, and that VarInfo is returned.
> 2. If ctrl_info is not set, then we return the VarInfo from the ensemble input corresponding to that index.
>
> Looking at how "get_ctrl(i_ens)" is called in ensemble_stat.cc and gen_ens_prod.cc... it's only called when reset is true, which is really only when i_ens = 0. So we'd always be using the VarInfo from the first ensemble input.
>
> I'm wondering if EnsVarInfo::get_ctrl(int) should NOT have an integer argument? If ctrl_info is set, return it, and if not, just return NULL.
>
> Or maybe I don't understand the logic here?

Looking at the logic for EnsembleStat and GenEnsProd, I think you are right that the integer argument is not needed. I think it was needed previously while I was doing development, but the final solution does not actually require it.

If ctrl_info is not set, then I believe we still need the field info of the first field instead of NULL because we need to pass that information to read the control field.

j-opatz added this to the MET 10.1.0 milestone Sep 14, 2021

j-opatz assigned JohnHalleyGotway and j-opatz Sep 14, 2021

JohnHalleyGotway added the alert: NEED MORE DEFINITION Not yet actionable, additional definition required label Sep 23, 2021

TaraJensen removed the alert: NEED ACCOUNT KEY Need to assign an account key to this issue label Feb 17, 2022

JohnHalleyGotway changed the title ~~Standardize ensemble members relative to climatology~~ Enhance Gen-Ens-Prod to standardize ensemble members relative to climatology Feb 17, 2022

JohnHalleyGotway removed the alert: NEED MORE DEFINITION Not yet actionable, additional definition required label Feb 17, 2022

JohnHalleyGotway added a commit that referenced this issue Feb 18, 2022

Per #1918, add nxy(), anomaly(), and standard_anomaly() functions to …

e72368f

…DataPlane.

JohnHalleyGotway added a commit that referenced this issue Feb 18, 2022

Per #1918, add normalize_flag option to gen_ens_prod.

0e658a4

JohnHalleyGotway added a commit that referenced this issue Feb 19, 2022

Per #1918, add the normalize_flag to other GenEnsProd config files an…

2b1d854

…d update the documentation.

JohnHalleyGotway added a commit that referenced this issue Feb 19, 2022

Per #1918, rename normalize_flag to just normalize to make its name m…

58cbb8f

…ore similar to the convert and censor_thresh/censor_val options.

JohnHalleyGotway added a commit that referenced this issue Feb 20, 2022

Per #1918, move normalize_data() from gen_ens_prod to normalize.h/.cc…

43048e3

… in the vx_util library so that that functionality is available to other MET tools. ci-run-unit

JohnHalleyGotway linked a pull request Feb 20, 2022 that will close this issue

Feature 1918 std climo #2061

Merged

14 tasks

This was referenced Feb 20, 2022

Add support for the normalize option to the Gen-Ens-Prod wrapper. dtcenter/METplus#1445

Closed

Add support for the normalize option to all MET tools that use climatology data. #2062

Open

j-opatz changed the title ~~Enhance Gen-Ens-Prod to standardize ensemble members relative to climatology~~ Enhance Gen-Ens-Prod to standardize ensemble members relative to ensemble climatology Feb 23, 2022

JohnHalleyGotway closed this as completed Feb 23, 2022

JohnHalleyGotway mentioned this issue Feb 24, 2022

Update develop-ref after #2061 #2069

Merged

JohnHalleyGotway reopened this Feb 25, 2022

JohnHalleyGotway added a commit that referenced this issue Feb 25, 2022

Per #1918, store the ensemble_member_id string in the EnsVarInfo clas…

20d9404

…s so that we can use it later, if needed, when reading climatological data which may also make use of that string.

JohnHalleyGotway linked a pull request Feb 25, 2022 that will close this issue

Feature 1918 climo_ens_member_id #2075

Merged

15 tasks

JohnHalleyGotway added a commit that referenced this issue Mar 1, 2022

Per #1918, add log messages to read_climo.cc and gen_ens_prod.cc to c…

a2bb93d

…larify what data is being read from which climo data files.

JohnHalleyGotway added a commit that referenced this issue Mar 2, 2022

Per #1918, adding gen_ens_prod unit test to demonstrate using ENS_MEM…

308b461

…BER_ID to read climo data separately for each member.

JohnHalleyGotway added a commit that referenced this issue Mar 2, 2022

Per #1918, adding gen_ens_prod unit test to demonstrate using ENS_MEM…

a397ae0

…BER_ID to read climo data separately for each member.

JohnHalleyGotway closed this as completed Mar 2, 2022

JohnHalleyGotway mentioned this issue Mar 2, 2022

Update develop-ref after #2075 #2076

Merged

JohnHalleyGotway changed the title ~~Enhance Gen-Ens-Prod to standardize ensemble members relative to ensemble climatology~~ Enhance Gen-Ens-Prod to standardize ensemble members relative to climatology. Mar 2, 2022

JohnHalleyGotway removed a link to a pull request Mar 2, 2022

Feature 1918 climo_ens_member_id #2075

Merged

15 tasks

JohnHalleyGotway linked a pull request Mar 2, 2022 that will close this issue

Update develop-ref after #2075 #2076

Merged
