Enhance Ensemble-Stat to apply the HiRA method to ensembles #1583
Comments
…d data is present in the ensemble mean.
… dictionary rather than ens.field, which fails when ens.field is an empty array.
…nsemble members when creating the output txt files... esp when HiRA is not selected. ci-run-unit
This is now working. Still need to:
Two considerations noted during development:
…HiRA as an interpolation method for Ensemble-Stat. Having the entire HiRA dictionary in Ensemble-Stat brings unnecessary complications when the only part that we actually want/need is the size and shape. This simpler solution requires fewer documentation updates and should have no upstream impacts on METplus.
… simple interpolation option.
…t. This is also run by unit_met_test_scripts.xml. Also, update the downstream STAT-Analysis job to produce output -by INTERP_MTHD,INTERP_PNTS.
@mpm-meto I have a follow-up question for you about HiRA in Ensemble-Stat. @venitahagerty, @TatianaBurek, and I were discussing loading the Ensemble-Stat HiRA output into METdatadb. @venitahagerty was surprised to see an output RHIST line containing 151 ranks. That's the result of running HiRA with a 5x5 neighborhood on an ensemble of size 6 (6 members * 25 neighborhood points + 1 rank = 151).

My question is this. Should we constrain what output line types are written when the interpolation method is set to HiRA in Ensemble-Stat? For example, we could consider skipping the variable length lines, like RELP, RHIST, and ORANK. Line types with a fixed number of columns (like ECNT, RPS, PHIST, and SSVAR) aren't problematic. But the ones that depend on the number of ensemble "members" can get VERY long VERY fast. Should we skip RELP, RHIST, and ORANK when interp.method = HIRA? Or should we just let the user configure it in whatever way they'd like?
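The rank count quoted above follows directly from the neighborhood size. As a rough sketch only (the function name `rank_histogram_bins` is hypothetical and is not part of MET), the arithmetic can be written as:

```python
def rank_histogram_bins(n_members: int, width: int) -> int:
    """Number of ranks when every grid point of a width x width
    neighborhood, from every member, counts as one ensemble value.

    Hypothetical helper for illustration; not a MET function.
    """
    if n_members < 1 or width < 1:
        raise ValueError("n_members and width must be positive")
    n_values = n_members * width * width  # expanded "ensemble" size
    return n_values + 1                   # one more rank than values


# 6 members with a 5x5 neighborhood, as in the comment above.
print(rank_histogram_bins(6, 5))  # 151
```

The same function shows how quickly the variable-length line types grow as the neighborhood width increases.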
@JohnHalleyGotway "But the ones that depend on the number of ensemble "members" can get VERY long VERY fast." Yes, we know that! We currently DO write these out, because of the way our current system works. We also use these values to compute the CRPS and its components, for example. Our output file format is binary, however. Even so, the files become VERY large, and with the output frequency they are a pain. MOGREPS-UK runs hourly with 18 members, and the largest neighbourhood used for it is 11 x 11, so there are 2178 values (18 members * 121 points) for each observing location every hour, out to t+120h! In terms of utility, as the code for deterministic and ensemble is the same in our system at the moment, we have used the rank histograms for the deterministic forecasts. So I have some questions of my own:
Given the urgency I would suggest that these are just set to default OFF for both deterministic and ensembles for now. Switching them on should come with a huge health warning in the documentation too. Though, if this "breaks" METdatadb then skipping them will be the only option at this point! We do need to revisit this in terms of how useful these line types are more generally and whether we can just switch them off for good. But I can't make that judgement call at short notice. It requires a little bit more thought. @venitahagerty @TatianaBurek
@mpm-meto thanks for the feedback. I was looking for your confirmation that the Met Office does indeed use the HiRA-derived rank histograms. I wasn't sure if that was the case or if you only used CRPS from the ECNT line type, for example. So I'll make no changes to the software. If you request RHIST output and enable HiRA, then you'll get what you get. Two details do come to mind:
@venitahagerty and @TatianaBurek we'll just need to figure out how to handle outputs with 2000+ columns.
The GitHub Actions testing workflow for METplus flagged differences in the use case output from MET. On 3/8/22, @georgemccabe and I examined those differences and determined that 2 of the 3 are the direct and expected result of recent development and fixes in MET. However, the 3rd difference, in the output from Ensemble-Stat, was unexpected and likely indicates insufficient logic in the changes for this issue. Specifically, this line of ens_stats.cc likely needs to change:
The Ensemble-Stat differences appeared in the WEST and CONUS masking regions, but not the EAST and LMV regions. Here's the TMP/Z2 ensemble mean field produced in this case:

Since the CONUS and EAST regions contain missing data values, that "skip_mean" flag was set to true, and therefore all ensemble mean-related outputs, like ME, RMSE, and SSVAR, are missing from the output of this run. Need to refine the skip_mean logic so that it only applies to HiRA outputs.
…sed statistics and SSVAR output. Instead of skipping for ANY instance of bad data in the ensemble mean, only skip when the ensemble mean is entirely bad data.
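The difference between the two skip conditions can be sketched as follows. This is an illustration under stated assumptions, not the code in ens_stats.cc: the function names are hypothetical, and the missing-data sentinel of -9999 is assumed here for the example.

```python
BAD_DATA = -9999.0  # assumed missing-data sentinel for this sketch


def skip_mean_if_any_bad(mean_values):
    """Earlier behavior as described above: a single bad value in the
    ensemble mean suppresses all mean-based output."""
    return any(v == BAD_DATA for v in mean_values)


def skip_mean_if_all_bad(mean_values):
    """Refined behavior: skip only when the ensemble mean holds no
    valid data at all (an empty field also counts as all bad)."""
    return all(v == BAD_DATA for v in mean_values)


# A masking region that is only partly covered by valid data.
partial = [280.1, BAD_DATA, 281.3]
print(skip_mean_if_any_bad(partial))  # True
print(skip_mean_if_all_bad(partial))  # False
```

With the refined condition, a region containing some missing values still produces ME, RMSE, and SSVAR from its valid points.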
Describe the New Feature
The Met Office will need the computation of the HiRA method applied to ensembles.
This task is to enhance Ensemble-Stat to incorporate the HiRA methodology.
In Point-Stat, the HiRA logic is controlled by the "hira" dictionary in the config file. It defines a whole separate set of processing logic and writes ensemble output statistics and line types.
The incorporation in Ensemble-Stat should be much simpler. Ensemble-Stat already writes ensemble output statistics and line types. The only wrinkle added by HiRA is providing more flexibility as to which ensemble member values are used. That logic is currently controlled by the "interp" dictionary. Output statistics are generated for each interpolation type defined in the config file. Right now, all of these methods compute a SINGLE forecast value for each point observation value (e.g. the NEAREST, MIN, MAX, UW_MEAN, or DW_MEAN). Instead of computing this summary over nearby forecast grid points, we just want to use ALL of the nearby grid point values. That increases the ensemble size from N ensemble members to N members * M points in the HiRA neighborhood.
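To make the contrast concrete, here is a minimal sketch of the two approaches. It assumes each member is a small 2D list indexed as `grid[row][col]`, a square neighborhood of odd width, and that points falling off the grid are simply dropped. All function names are hypothetical and do not correspond to the PairDataEnsemble interface.

```python
def neighborhood_values(grid, row, col, width):
    """All values in the width x width square centered on (row, col).
    Points outside the grid are dropped. Assumes width is odd."""
    half = width // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                values.append(grid[r][c])
    return values


def nearest_ensemble(members, row, col):
    """Existing style: ONE value per member (here, the nearest point)."""
    return [grid[row][col] for grid in members]


def hira_ensemble(members, row, col, width):
    """HiRA style: ALL neighborhood values from every member."""
    expanded = []
    for grid in members:
        expanded.extend(neighborhood_values(grid, row, col, width))
    return expanded


# Two 3x3 members, observation located at the central grid point.
member_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
member_b = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
members = [member_a, member_b]

print(len(nearest_ensemble(members, 1, 1)))  # 2  (N members)
print(len(hira_ensemble(members, 1, 1, 3)))  # 18 (N members * M points)
```

Downstream statistics would then treat the expanded list exactly like a larger ensemble.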
Rather than adding a "hira" dictionary to the Ensemble-Stat config file, recommend supporting a new "interp.type.method" option. See the existing supported methods listed here. Plan to name the new option "HIRA" to be specific about its intended use. But that name could be something else, like "ALL" for "all grid points in the neighborhood".
Most of the code changes to support this should lie in the PairDataEnsemble class, although there will likely be changes to the Ensemble-Stat application code. Also, add checks to make sure that "method = HIRA" is ONLY supported by Ensemble-Stat. All other tools should error out if the method is requested.
Note that Ensemble-Stat can be run with both point and gridded observations. If interp.type.method = HIRA, compute output when verifying against POINT observations, but do not compute output when verifying against GRIDDED analyses. Just log a message stating that HIRA only applies to points and that the gridded vx step is being skipped for that interpolation method.
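The two rules above (HIRA is rejected by every tool except Ensemble-Stat, and gridded verification is skipped with a log message) can be sketched as a single check. The function name, the tool strings, and the `"point"`/`"gridded"` labels are assumptions made for this illustration and are not MET identifiers.

```python
import logging

logger = logging.getLogger("hira_sketch")


def should_process(tool: str, method: str, obs_type: str) -> bool:
    """Return True to run verification for this interpolation method,
    False to skip it. Raise if the tool does not support HIRA.

    Hypothetical helper; tool and obs_type strings are illustrative.
    """
    if method.upper() != "HIRA":
        return True  # other interpolation methods are unaffected
    if tool != "ensemble_stat":
        raise ValueError(
            f"interp method HIRA is only supported by Ensemble-Stat, not {tool}"
        )
    if obs_type == "gridded":
        logger.info("HIRA only applies to point observations; "
                    "skipping gridded verification for this method.")
        return False
    return True


print(should_process("ensemble_stat", "HIRA", "point"))    # True
print(should_process("ensemble_stat", "HIRA", "gridded"))  # False
```

Raising an error for unsupported tools, rather than silently ignoring the option, matches the request that all other tools should error out.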
The config entry would look something like this:
@mpm-meto please review this logic and advise.
Acceptance Testing
Time Estimate
Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.
Sub-Issues
Consider breaking the new feature down into sub-issues.
Relevant Deadlines
Jun 2021
Funding Source
2799991 Met Office
Define the Metadata
Assignee
Labels
Projects and Milestone
Define Related Issue(s)
Consider the impact to the other METplus components.
New Feature Checklist
See the METplus Workflow for details.
Branch name:
feature_<Issue Number>_<Description>
Pull request:
feature <Issue Number> <Description>
Select: Reviewer(s), Project(s), Milestone, and Linked issues