Fix bug in the computation of the climatological CDF value. #1638
Good news: the bug was easy to find. Grid-Stat passes the function arguments in the wrong order.
Bad news: this won't have any impact on the computation of the Brier score and BSS that @j-opatz is investigating. The bug is confined to the contents of the Grid-Stat NetCDF matched pairs output file, and those values aren't actually used in any of the statistics computations.
@j-opatz and @bgbrowntollerud, please take a look at the following code in MET/met/src/libcode/vx_statistics/pair_base.cc (line 1029 in 76253a9):
I think this logic is not sufficient. This code means that thresholds of >=CDP67 and <=CDP67 would both result in a constant climo probability of 0.67. I do think it should be a constant climo probability, but it should depend on the threshold type. Am I correct in thinking the following?

* >CDP{N} should be a constant climo probability of 1.0 - N/100 (e.g. >CDP67 has 0.33 probability).
* Mathematically, the probability of being exactly equal to a number is 0, while the probability of being not equal is 1.0. But in practice, that isn't very useful. Should I instead error out if the user specifies CDP thresholds that are not >, >=, <, or <=?

Here's the updated logic:
All of the unit tests ran fine, but they did produce differences in the regression test output, as expected.
The changes to the RPS output do make sense, but the timestamps in the NetCDF files don't.
Need to debug these timestamps further.
Found and fixed the time differences in the Grid-Stat NetCDF matched pairs file for the climo CDF variables. Instead of initializing the output variables with the observation field, normal_cdf() now uses the climo mean field. As a result, the output timestamps are consistent for the climo mean, standard deviation, and CDF fields, which is desirable.
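A minimal sketch of that idea, assuming a hypothetical, much-simplified `Field` type rather than MET's actual DataPlane class: copying the climo mean field before filling in the CDF values means the output inherits the climo timestamps instead of the observation's. The function name `climo_cdf_field` is illustrative only, and the sketch assumes a normal climatological distribution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical, much-simplified stand-in for a gridded field with timestamps.
// MET's real DataPlane class carries more metadata than this.
struct Field {
   std::time_t init_time = 0;
   std::time_t valid_time = 0;
   std::vector<double> data;
};

// Sketch: build the climo CDF field starting from a copy of the climo mean
// field, so the output keeps the climo timestamps rather than the obs ones.
Field climo_cdf_field(const Field &obs, const Field &mean, const Field &stdev) {
   // All three fields are assumed to be on the same grid.
   assert(obs.data.size() == mean.data.size() &&
          mean.data.size() == stdev.data.size());

   Field cdf = mean;  // copies init_time and valid_time from the climo mean
   for (std::size_t i = 0; i < cdf.data.size(); ++i) {
      const double z = (obs.data[i] - mean.data[i]) / stdev.data[i];
      cdf.data[i] = 0.5 * std::erfc(-z / std::sqrt(2.0));
   }
   return cdf;
}
```

Copying the whole mean field is the simplest way to carry over every piece of metadata at once; only the data values are then overwritten.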
* Per #1638, correct the order of arguments in the call to the normal_cdf() utility function.
* Per #1638, update the logic in derive_climo_prob(). For CDP thresholds, the constant climo probability should be based on the inequality type, where less-than types match the threshold percentile value while greater-than types are 1.0 minus the threshold percentile.
* Per #1638, update normal_cdf() to initialize the output CDF field using the climo mean field instead of the observation data field. This makes the timestamps consistent for the climo mean, stdev, and cdf variables in the Grid-Stat NetCDF matched pairs output file.
* Getting rid of compiler warnings in PB2NC by replacing several instances of the NULL pointer with the nul character (\0) instead.
* Fix typo in config_options.rst.
* Feature 1408 var_name_for_grib_code (#1617)
* #1408 Added get_var_id
* #1408 Check variable name in the configuration to use the variable name instead of grib code
* #1408 Added point2grid_ascii2nc_surfrad_DW_PSP_by_name
* Feature 1580 2d time (#1616)
* #1580 Added get_grid_from_lat_lon_vars
* #1580 Added get_grid_from_lat_lon_vars and support 2D time variable
* #1580 Support int type variable without scale_factor and add_offset attributes
* #1580 Support 2D time variable. Implemented filtering by valid_time
* #1580 Bug fix: read time with dimension 0
* #1580 Support time variable with no dimension
* #1580 Initial release
* #1580 Added point2grid_2D_time
* #1580 Check project attribute for GOES
* #1580 Changed NULL to 0 to avoid compilation warning
* #1580 Added point2grid_2D_time
* #1580 Added "point2grid configuration file" section
* #1580 Changed to_grid for point2grid_NCCF_UK & point2grid_2D_time
Co-authored-by: Howard Soh
Co-authored-by: John Halley Gotway
* feature 1580 nccf (#1619)
* #1580 Correct the precision at _apply_scale_factor
* #1580 Added unit test plot_data_plane_NCCF_time
* #1580 Changed argument type to double at _apply_scale_factor(double)
* Bugfix 1618 develop pb2nc (#1623)
Co-authored-by: Howard Soh
* Feature 1624 OBS_COMMAND (#1625)
* Per #1627, add grid_data.regrid config option for PlotPointObs and update the tool to do the requested regridding. Still need to update the docs.
* Per #1627, update docs about grid_data.regrid config option for PlotPointObs.
* Per #1627, add another call to plot_point_obs to exercise the new regrid functionality.
* Feature 1624 obs_command second try (#1629)
* Per #1624, define OBS_COMMAND.
* Per #1624, unset the test-specific environment variables after completing the run.
* Per #1624, after PR #1625 merged these changes into develop, they caused 2 unexpected diffs in the NB output. These were caused by environment variables being unset after each test. Updating unit_netcdf.xml and unit_point2grid.xml to define more test-specific environment variables to reproduce previous NB output.
* Organizing NB climatology and point2grid output files into the appropriate directories rather than having them at the top-level directory.
* Update pull_request_template.md
* Update the point2grid unit tests to write their temp files to the point2grid subdirectory instead of the top-level test output directory.
* Update appendixC.rst. Split the definition of H_RATE and POD.
* Feature 1626 tc_gen (#1633)
* Per #1448, many changes for TC-Gen. Replace the oper_genesis dictionary with the oper_technique string. Add genesis_init_diff config entry. Update config_constants.h accordingly and the tc_gen_conf_info.h/.cc to parse the updated config entries.
* Per #1448, large overhaul of the tc_gen matching logic. This work is not yet complete. Still need to compute categorical MISSES, but the current version does compile.
* Per #1448, add GenesisInfoArray::has_storm_id() function and remove the unused set_dland() function.
* Per #1448, more updates. Define the best genesis events while parsing the best tracks. We need to know the best genesis events in order to count up the forecast misses.
* Per #1448, lots more changes for tc_gen. Create a PairDataGenesis class to store genesis pairs. This will be needed to write a matched pair line type.
* Per #1448, minor tweaks to log messages.
* Per #1448, update PairDataGenesis class to store the BEST track Storm ID since the forecast genesis events do not have meaningful Storm IDs.
* Per #1448, in GenesisInfoArray::add(), do NOT store multiple genesis events for the same storm, but do print a useful Debug(3) log message about it.
* Per #1448, update PairDataGenesis::has_case() logic to check the storm id and initialization time but NOT require an exact forecast hour match.
* Per #1448, update the tc_gen log messages to more concisely and consistently report the storm id.
* Per #1448, update the PairDataGenesis logic a bit to have all the misses and hits in chronological order.
* Per #1448, add genesis_init_diff entry.
* Per #1448, set the default genesis_init_diff entry to 48 hours since that's what Dan H used in his examples.
* Per #1448, work on comments and log messages.
* Per #1448, reimplement TrackInfoArray as a vector instead of managing the memory myself. This makes the implementation of TrackInfoArray::erase_storm_id() very easy. Replace n_tracks() function with n() in several places.
* Per #1448, add valid_freq and basin_file config entries. Also rename load_dland.h/.cc to load_tc_data.h/.cc and add code to read the basin file.
* Per #1448, add GenesisInfoArray::erase_storm_id().
* Per #1448, update tc_gen code to handle new config options.
* Per #1448, had my units wrong. Was processing seconds when I thought it was hours!
* Per #1448, making test TC-Gen config file consistent with the default.
* Per #1448, also track the obs valid times.
* Per #1448, switch from tech1/tech2 to dev/ops methods. Update log messages and add lots of details to the tc_gen documentation.
* Per #1430, in tc_gen enable dev_method_flag, ops_method_flag, ci_alpha, and output_flag to be specified separately for each filter. Also add nc_pairs_flag and genesis_track_points_window config options. Add config constants entries for these options and update tc_gen to handle all of these changes.
* Per #1430, consolidate the parse_grid_mask() code a bit to avoid redundancy.
* Per #1430, just cleaning up some messy comments.
* Per #1430, adding hooks for writing NetCDF output file.
* Per #1430, update DataPlane::set_size() function to take a 3rd argument to specify how the DataPlane should be initialized.
* Per #1430, update the nc_pairs_flag options and update the code to parse them.
* Per #1430, update the TrackInfo class to track and report the min/max warm core information.
* Per #1430, current state of development. Still a work in progress. I'm getting runtime segfaults when testing and I still need to NOT overcount the BEST track hits.
* Per #1430, committing changes described by #1430 (comment)
* Per #1430, forgot to rename genesis_match_window to genesis_hit_window as it is in the code.
* Per #1430, changing GenesisInfo to just inherit directly from TrackInfo. Frankly, I should have thought of this a LONG time ago.
* Per #1430, change the default desc setting from NA to ALL and add the best_unique_flag option.
* Per #1430, simplify the logic now that GenesisInfo is derived from TrackInfo. Also support the best_unique_flag config option.
* Per #1430, instead of storing 12 individual DataPlane objects, store them in a map to make writing their output more convenient.
* Per #1430, updating documentation and comments.
* Per #1430, more doc updates.
* Per #1430, update unit test to only write NetCDF counts for the AL_BASIN and not the other filters.
* Per #1430, fix parsing logic for nc_pairs_flag = TRUE.
* Per #1430, fix bug. Check the VxOpt.NcInfo before calling write_nc(), not the top-level one.
* Per #1430, the docker build of tc_gen failed.
* Per #1430, working on DockerHub compilation.
* Per #1430, getting DockerHub build working.
* One more try.
* Per #1597, add hooks for new GENMPR stat line type.
* Per #1597, add config file option and column definitions for the GENMPR line type.
* Per #1597, finish writing the GENMPR line type.
* Per #1597, change the default output grid from a global 5 degree to global 1 degree grid.
* Per #1597, change GENMPR output columns to GEN_TDIFF and INIT_TDIFF since they're reported in HHMMSS format instead of seconds. Also, tweak the config file for the tc-gen unit test.
* Per #1597, have to add GENMPR header columns for Stat-Analysis and test scripts to handle it.
* Per #1597, update Stat-Analysis to handle the GENMPR line type.
* Per #1597, user's guide updates for the GENMPR and NetCDF output file.
* Per #1597, add AGEN_INIT and AGEN_FHR columns.
* Per #1597, add AGEN_INIT and AGEN_FHR columns.
* Per #1597, remove the AGEN_TIME and BGEN_TIME columns from the GENMPR line type and instead write the genesis times to the FCST_VALID_BEG/END and OBS_VALID_BEG/END header columns.
* Remove some unused output column name definitions. They are remnants from very early versions of MET, which included the CTP, CFP, and COP line types.
* Per #1597, update config file options to use dev_hit_radius, dev_hit_window, and opt_hit_tdiff. Also update log message to switch from 'lead' to 'forecast hour'.
* Per #1626, add met_regrid_nearest() utility function since I'm calling it twice.
* Per #1626, update the basin_global_tenth_degree.nc basin definition file to include basin name abbreviations.
* Per #1626, update load_tc_data.h/.cc to also read the basin abbreviations from the NetCDF basin file.
* Per #1626, add TC-Gen config file options for init_inc, init_exc, and basin_mask. Updated the library and application code, and updated the user's guide.
* Fixing Fortify warnings for 'Poor Style: Variable Never Used' in 6 files.
* Fix Fortify warnings for 'Uninitialized variable' in tc_gen.cc and point2grid.cc.
* Fix Fortify warnings for 'Poor Style: Redundant Initialization' in plot_point_obs.cc and point2grid.cc.
* Feature 1346 valid time attr (#1634)
* #1346 get_att_value_unixtime supports yyyymmdd_hhmmss, too
* #1346 Check valid_time & init_time attributes, too
* #1346 Check valid_time & init_time attributes, too
Co-authored-by: Howard Soh
* Feature 1473 python errors (#1615)
* Added sample script to read ascii data and create an xarray.
* Disabled use_xarray exit for testing.
* Get attrs from DataArray if using xarray.
* Removed some comments.
* Revised error messages for use with both numpy and xarray.
* Removing commented out code.
Co-authored-by: David Fillmore
Co-authored-by: johnhg
* Feature 1630 zero obs (#1637)
* Per #1630, update ascii2nc to change zero observations from an error (which returns bad status) to a warning message.
* Per #1630, update point2grid to read an empty input file and write fields of 0's or bad data to the output. Change previous error message to warning. Also, update LOTS of warning and error log messages to make them consistent.
* Per #1630, need to initialize the dataplanes before the loop (for when there are no obs) and within each loop iteration (for when there are multiple fields to process).
* Bugfix 1638 develop climo cdf (#1639)
* Per #1638, correct the order of arguments in the call to the normal_cdf() utility function.
* Per #1638, update the logic in derive_climo_prob(). For CDP thresholds, the constant climo probability should be based on the inequality type, where less-than types match the threshold percentile value while greater-than types are 1.0 minus the threshold percentile.
* Per #1638, update normal_cdf() to initialize the output CDF field using the climo mean field instead of the observation data field. This makes the timestamps consistent for the climo mean, stdev, and cdf variables in the Grid-Stat NetCDF matched pairs output file.
* Update tc_gen.cc
Co-authored-by: hsoh-u
Co-authored-by: Howard Soh
Co-authored-by: John Halley Gotway
Co-authored-by: j-opatz
Co-authored-by: David Fillmore
Co-authored-by: David Fillmore
* Add debug level 4 message to list out the number of GRIB2 records inventoried. This helps debugging issues with MET potentially not reading all input GRIB2 records on WCOSS.
* Bugfix 1554 main_v9.1 ncdump (#1555)
* Bugfix 1562 main_v9.1 grid_diag (#1563)
* Per #1562, check for bad data values before adding data to the PDFs for grid_diag.
* Per #1562, removing the poly = CONUS.poly mask from GridDiagConfig_TMP. That setting masked a problem in the handling of missing data. Exercising the mask.poly option is tested in another unit test. This will change the output and break the nightly build, but that's good since we'll do more thorough testing.
* Per #1508, change the verbosity in unit_tc_gen.xml from -v 2 to -v 5 to print out some additional log messages that may help in debugging the intermittent file list failure.
* Feature 1572 v9.1.1 (#1573)
* Per #1572, delete the docs/version file as it is not needed here.
* Per #1572, update the version number to 9.1.1.
* Per #1572, list the met-9.1.1 release date as 20201119 for the docs.
* Per #1572, add release notes for the met-9.1.1 version.
* Per #1572, add release notes for met-9.1.1 version.
* Per #1572, let's try to release met-9.1.1 today 11/18.
* Correct GitHub link.
* Fix small typo in release notes.
* Bugfix 1618 main v91 pb2nc (#1622)
Co-authored-by: Howard Soh
* Update pull_request_template.md
* Per #1638, apply the same 3 fixes to the main_v9.1 branch to be included in the next bugfix release for met-9.1. (#1640)
* Per #1646, one line fix for cut-and-paste error. (#1648)
* Adding necessary files for ReadTheDocs (#1703)
* Per #1706, update aggr_ecnt_lines() to also add an entry to the skip_ba array of booleans. This is used to keep track of whether or not the point should be excluded from the statistics and is used in Ensemble-Stat. This same aggregation code in the vx_statistics library is used by Stat-Analysis. Since Stat-Analysis failed to populate that array, it led to an array parsing error in ECNTInfo::set() when trying to compute the sum of the weights. (#1707)
* Per #1694, exactly the same set of bugfix changes but applied to main_v9.1 rather than the develop branch. (#1709)
* Feature 1710 v9.1.2 (#1711)
* Per #1710, update the release notes and version numbers for the 9.1.2 release.
* Per #1710, add a release note about the move to read the docs.
* #1715 Do not combine if there is no overlap between TQZ and UV records
* #1715 Do not combine if there is no overlap between TQZ and UV records
* #1715 Do not combine if there is no overlap between TQZ and UV records
* #1715 Added blank line for Warning
* #1715 Added a blank line for Error
* Bugfix 1716 main_v9.1 perc_thresh (#1721)
* Per #1716, committing the same bugfix to the main_v9.1 branch for inclusion in the 9.1.3 bugfix release.
* Per #1716, change SFP50 example to SFP33.3 to show an example of using floating point percentile values.
* Per #1723, update the version number to 9.1.3. (#1730)
* Update pull_request_template.md
Co-authored-by: hsoh-u
Co-authored-by: Howard Soh
Co-authored-by: jprestop
Describe the Problem
When testing the verification of the climatology-based probabilities for NOAA/CPC with Grid-Stat, John Opatz discovered an apparent bug in MET's computation of the climatological cumulative distribution function values.
While the issue was identified using CPC data, it is also apparent in the existing unit test output:
climatology/grid_stat_PROB_GFS_CLIMO_1.0DEG_150000L_20120409_120000V_pairs.nc
I also found a bug in derive_climo_prob(). When the observation threshold is defined relative to climatology (e.g. >CDP67), the climatological probability value should be constant. However, the constant value should depend on the inequality type, where <CDP{N} has climo probability p=N/100 while >CDP{N} has probability 1.0-p.
This task is to fix both of these bugs.
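The CDP rule described above can be sketched as follows. The `Ineq` enum and `constant_climo_prob()` are hypothetical stand-ins, not MET's threshold classes or the actual derive_climo_prob() code. For == and !=, this sketch simply rejects the threshold, which is one of the options raised in the discussion above.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for threshold inequality types (not MET's enum).
enum class Ineq { LT, LE, GT, GE, EQ, NE };

// Sketch: constant climatological probability for a threshold defined
// relative to the climo distribution, e.g. ">CDP67" -> (Ineq::GT, 67.0).
// Less-than types use p = N/100; greater-than types use 1.0 - p.
double constant_climo_prob(Ineq op, double percentile) {
   assert(percentile >= 0.0 && percentile <= 100.0);
   const double p = percentile / 100.0;
   switch (op) {
      case Ineq::LT:
      case Ineq::LE:
         return p;
      case Ineq::GT:
      case Ineq::GE:
         return 1.0 - p;
      default:
         // == and != have no useful constant probability, so reject them.
         throw std::invalid_argument("CDP thresholds require <, <=, >, or >=");
   }
}
```

For example, <CDP67 and <=CDP67 both give 0.67, while >CDP67 and >=CDP67 both give 0.33.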
Expected Behavior
The climo CDF value should be the area (between 0 and 1) under the climatological distribution, defined by the climo mean and standard deviation, to the left of the observation value. However, MET is not computing it correctly.
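For reference, here is a minimal per-point sketch of that definition, assuming the climatology is a normal distribution described by the climo mean and standard deviation. The function name and argument order are illustrative; they are not the signature of MET's normal_cdf() utility.

```cpp
#include <cassert>
#include <cmath>

// Sketch: climatological CDF value at a single point, assuming a normal
// climo distribution. Returns the area (0 to 1) left of the observation.
double climo_cdf_value(double obs, double climo_mean, double climo_stdev) {
   assert(climo_stdev > 0.0);
   const double z = (obs - climo_mean) / climo_stdev;
   // Phi(z) = 0.5 * erfc(-z / sqrt(2))
   return 0.5 * std::erfc(-z / std::sqrt(2.0));
}
```

Using erfc rather than 1 + erf avoids loss of precision far into the lower tail. By construction, an observation equal to the climo mean gives exactly 0.5, and any observation above the mean gives a value greater than 0.5.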
Environment
Describe your runtime environment:
1. Machine: NB output on kiowa.
To Reproduce
Describe the steps to reproduce the behavior:
1. Inspect the file: climatology/grid_stat_PROB_GFS_CLIMO_1.0DEG_150000L_20120409_120000V_pairs.nc
2. Using ncview, note the values for i=8, j=77:
obs = 13.1514, climo_mean = 3.21593, climo_stdev = 7.9448, climo_cdf = 0.359584
3. Note that obs > climo_mean but climo_cdf < 0.5. When the obs value is greater than the climo mean, the CDF should be greater than 0.5.
So there's a bug in this computation!
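The values from step 2 can be checked directly. The sketch below assumes a normal climatological distribution and uses hypothetical helper names. It shows that the correct CDF at this point is about 0.894. As a purely numerical observation, evaluating the CDF at climo_mean with mean climo_stdev and standard deviation obs reproduces the reported 0.359584. That is consistent with misordered arguments, but it is not a claim about the exact call in Grid-Stat.

```cpp
#include <cassert>
#include <cmath>

// Normal CDF of x for a distribution with the given mean and stdev.
double normal_cdf_point(double x, double mean, double stdev) {
   assert(stdev > 0.0);
   return 0.5 * std::erfc(-((x - mean) / stdev) / std::sqrt(2.0));
}

// Values read at i=8, j=77 in the pairs file (step 2 above).
const double obs = 13.1514, climo_mean = 3.21593, climo_stdev = 7.9448;

// Correct ordering: obs is about 1.25 stdevs above the mean.
double expected_cdf() { return normal_cdf_point(obs, climo_mean, climo_stdev); }

// Rotated ordering (x <- climo_mean, mean <- climo_stdev, stdev <- obs).
double rotated_cdf() { return normal_cdf_point(climo_mean, climo_stdev, obs); }
```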
Relevant Deadlines
List relevant project deadlines here or state NONE.
Funding Source
2788881
Define the Metadata
Assignee
Labels
Projects and Milestone
Define Related Issue(s)
Consider the impact to the other METplus components.
Bugfix Checklist
See the METplus Workflow for details.
Branch name:
bugfix_<Issue Number>_main_<Version>_<Description>
Pull request:
bugfix <Issue Number> main_<Version> <Description>
Select: Reviewer(s), Project(s), Milestone, and Linked issues
Branch name:
bugfix_<Issue Number>_develop_<Description>
Pull request:
bugfix <Issue Number> develop <Description>