Skip to content

Commit

Permalink
Feature 1684 bss and 1685 single reference model (#1689)
Browse files Browse the repository at this point in the history
* Per #1684, move an instance of the ClimoCDFInfo class into PairBase. Also define derive_climo_vals() and derive_climo_prob() utility functions.

* Add to VxPairDataPoint and VxPairDataEnsemble functions to set the ClimoCDFInfo class.

* Per #1684, update ensemble_stat and point_stat to set the ClimoCDFInfo object based on the contents of the config file.

* Per #1684, update the vx_statistics library and stat_analysis to make calls to the new derive_climo_vals() and derive_climo_prob() functions.

* Per #1684, since cdf_info is a member of PairBase class, need to handle it in the PairDataPoint and PairDataEnsemble assignment and subsetting logic.

* Per #1684, during development, I ran across and then updated this log message.

* Per #1684, working on log messages and figured that the regridding climo data should be moved from Debug(1) to at least Debug(2).

* Per #1684 and #1685, update the logic for the derive_climo_vals() utility function. If only a single climo bin is requested, just return the climo mean. Otherwise, sample the requested number of values.

* Per #1684, just fixing the format of this log message.

* Per #1684, add a STATLine::get_offset() member function.

* Per #1684, update parse_orank_line() logic. Rather than calling NumArray::clear() call NumArray::erase() to preserve allocated memory. Also, instead of parsing ensemble member values by column name, parse them by offset number.

* Per #1684, call EnsemblePairData::extend() when parsing ORANK data to allocate one block of memory instead of bunches of litte ones.

* Per #1684 and #1685, add another call to Ensemble-Stat to test computing the CRPSCL_EMP from a single climo mean instead of using the full climo distribution.

* Per #1684 and #1685, update ensemble-stat docs about computing CRPSS_EMP relative to a single reference model.

* Per #1684, need to update Grid-Stat to store the climo cdf info in the PairDataPoint objects.

* Per #1684, remove debug print statements.

* Per #1684, need to set cdf_info when aggregating MPR lines in Stat-Analysis.

* Per #1684 and #1685, update PairDataEnsemble::compute_pair_vals() to print a log message indicating the climo data being used as reference:

For a climo distribution defined by mean and stdev:
DEBUG 3: Computing ensemble statistics relative to a 9-member climatological ensemble.

For a single deterministic reference:
DEBUG 3: Computing ensemble statistics relative to the climatological mean.
  • Loading branch information
JohnHalleyGotway authored Mar 2, 2021
1 parent 0f5366c commit 996197c
Show file tree
Hide file tree
Showing 23 changed files with 550 additions and 143 deletions.
6 changes: 5 additions & 1 deletion met/docs/Users_Guide/ensemble-stat.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,11 @@ The ranked probability score (RPS) is included in the Ranked Probability Score (
Climatology data
~~~~~~~~~~~~~~~~

The Ensemble-Stat output includes at least three statistics computed relative to external climatology data. The climatology is defined by mean and standard deviation fields, and both are required in the computation of ensemble skill score statistics. MET assumes that the climatology follows a normal distribution, defined by the mean and standard deviation at each point. When computing the CRPS skill score for (:ref:`Gneiting et al., 2004 <Gneiting-2004>`) the reference CRPS statistic is computed using the climatological mean and standard deviation directly. When computing the CRPS skill score for (:ref:`Hersbach, 2000 <Hersbach-2000>`) the reference CRPS statistic is computed by selecting equal-area-spaced values from the assumed normal climatological distribution. The number of points selected is determined by the *cdf_bins* setting in the *climo_cdf* dictionary. The reference CRPS is computed empirically from this ensemble of climatology values. The climatological distribution is also used for the RPSS. The forecast RPS statistic is computed from a probabilistic contingency table in which the probabilities are derived from the ensemble member values. In a simliar fashion, the climatogical probability for each observed value is derived from the climatological distribution. The area of the distribution to the left of the observed value is interpreted as the climatological probability. These climatological probabilities are also evaluated using a probabilistic contingency table from which the reference RPS score is computed. The skill scores are derived by comparing the forecast statistic to the reference climatology statistic.
The Ensemble-Stat output includes at least three statistics computed relative to external climatology data. The climatology is defined by mean and standard deviation fields, and typically both are required in the computation of ensemble skill score statistics. MET assumes that the climatology follows a normal distribution, defined by the mean and standard deviation at each point.

When computing the CRPS skill score for (:ref:`Gneiting et al., 2004 <Gneiting-2004>`) the reference CRPS statistic is computed using the climatological mean and standard deviation directly. When computing the CRPS skill score for (:ref:`Hersbach, 2000 <Hersbach-2000>`) the reference CRPS statistic is computed by selecting equal-area-spaced values from the assumed normal climatological distribution. The number of points selected is determined by the *cdf_bins* setting in the *climo_cdf* dictionary. The reference CRPS is computed empirically from this ensemble of climatology values. If the number bins is set to 1, the climatological CRPS is computed using only the climatological mean value. In this way, the empirical CRPSS may be computed relative to a single model rather than a climatological distribution.

The climatological distribution is also used for the RPSS. The forecast RPS statistic is computed from a probabilistic contingency table in which the probabilities are derived from the ensemble member values. In a simliar fashion, the climatogical probability for each observed value is derived from the climatological distribution. The area of the distribution to the left of the observed value is interpreted as the climatological probability. These climatological probabilities are also evaluated using a probabilistic contingency table from which the reference RPS score is computed. The skill scores are derived by comparing the forecast statistic to the reference climatology statistic.

Ensemble observation error
~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down
16 changes: 14 additions & 2 deletions met/src/libcode/vx_analysis_util/stat_line.cc
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,18 @@ bool STATLine::has(const char *col_str) const

{

return ( !is_bad_data(get_offset(col_str)) );

}


////////////////////////////////////////////////////////////////////////


int STATLine::get_offset(const char *col_str) const

{

int offset = bad_data_int;
int dim = bad_data_int;

Expand All @@ -353,10 +365,10 @@ if ( is_bad_data(offset) ) {
}

//
// Return whether a valid offset value was found
// Return the offset value
//

return ( !is_bad_data(offset) );
return ( offset );

}

Expand Down
9 changes: 5 additions & 4 deletions met/src/libcode/vx_analysis_util/stat_line.h
Original file line number Diff line number Diff line change
Expand Up @@ -61,10 +61,11 @@ class STATLine : public DataLine {
// retrieve values of the header columns
//

bool has (const char *) const;
ConcatString get (const char *, bool check_na = true) const;
const char * get_item (const char *, bool check_na = true) const;
const char * get_item (int, bool check_na = true) const;
bool has (const char *) const;
int get_offset(const char *) const;
ConcatString get (const char *, bool check_na = true) const;
const char * get_item (const char *, bool check_na = true) const;
const char * get_item (int, bool check_na = true) const;

const char * version () const;
const char * model () const;
Expand Down
3 changes: 2 additions & 1 deletion met/src/libcode/vx_statistics/compute_stats.cc
Original file line number Diff line number Diff line change
Expand Up @@ -743,7 +743,8 @@ void compute_pctinfo(const PairDataPoint &pd, bool pstd_flag,
// Use input climatological probabilities or derive them
if(cmn_flag) {
if(cprob_in) climo_prob = *cprob_in;
else climo_prob = derive_climo_prob(pd.cmn_na, pd.csd_na,
else climo_prob = derive_climo_prob(pd.cdf_info,
pd.cmn_na, pd.csd_na,
pct_info.othresh);
}

Expand Down
11 changes: 6 additions & 5 deletions met/src/libcode/vx_statistics/ens_stats.cc
Original file line number Diff line number Diff line change
Expand Up @@ -484,10 +484,10 @@ void RPSInfo::set(const PairDataEnsemble &pd) {
// Check that thresholds are actually defined
if(fthresh.n() == 0) {
mlog << Error << "\nRPSInfo::set(const PairDataEnsemble &) -> "
<< "no thresholds provided to compute the RPS line type! "
<< "Specify thresholds using the \""
<< conf_key_prob_cat_thresh
<< "\" configuration file option.\n\n";
<< "no thresholds provided to compute the RPS line type!\n"
<< "Specify thresholds using the \"" << conf_key_prob_cat_thresh
<< "\" configuration file option or by providing climatological "
<< "mean and standard deviation data.\n\n";
exit(1);
}

Expand Down Expand Up @@ -522,7 +522,8 @@ void RPSInfo::set(const PairDataEnsemble &pd) {
climo_pct.zero_out();

// Derive climatological probabilities
if(cmn_flag) climo_prob = derive_climo_prob(pd.cmn_na, pd.csd_na,
if(cmn_flag) climo_prob = derive_climo_prob(pd.cdf_info,
pd.cmn_na, pd.csd_na,
fthresh[i]);

// Loop over the observations
Expand Down
100 changes: 82 additions & 18 deletions met/src/libcode/vx_statistics/pair_base.cc
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,8 @@ void PairBase::clear() {
interp_mthd = InterpMthd_None;
interp_shape = GridTemplateFactory::GridTemplate_None;

cdf_info.clear();

o_na.clear();
x_na.clear();
y_na.clear();
Expand Down Expand Up @@ -121,6 +123,8 @@ void PairBase::erase() {
interp_mthd = InterpMthd_None;
interp_shape = GridTemplateFactory::GridTemplate_None;

cdf_info.clear();

o_na.erase();
x_na.erase();
y_na.erase();
Expand Down Expand Up @@ -267,6 +271,15 @@ void PairBase::set_interp_shape(GridTemplateFactory::GridTemplates shape) {

////////////////////////////////////////////////////////////////////////

void PairBase::set_climo_cdf_info(const ClimoCDFInfo &info) {

cdf_info = info;

return;
}

////////////////////////////////////////////////////////////////////////

void PairBase::set_fcst_ut(unixtime ut){

fcst_ut = ut;
Expand Down Expand Up @@ -1010,11 +1023,49 @@ bool set_climo_flag(const NumArray &f_na, const NumArray &c_na) {

////////////////////////////////////////////////////////////////////////

NumArray derive_climo_prob(const NumArray &mn_na, const NumArray &sd_na,
void derive_climo_vals(const ClimoCDFInfo &cdf_info,
double m, double s,
NumArray &climo_vals) {

// Initialize
climo_vals.erase();

// cdf_info.cdf_ta starts with >=0.0 and ends with >=1.0.
// The number of bins is the number of thresholds minus 1.

// Check for bad mean value
if(is_bad_data(m) || cdf_info.cdf_ta.n() < 2) {
return;
}
// Single climo bin
else if(cdf_info.cdf_ta.n() == 2) {
climo_vals.add(m);
}
// Check for bad standard deviation value
else if(is_bad_data(s)) {
return;
}
// Extract climo distribution values
else {

// Skip the first and last thresholds
for(int i=1; i<cdf_info.cdf_ta.n()-1; i++) {
climo_vals.add(
normal_cdf_inv(cdf_info.cdf_ta[i].get_value(), m, s));
}
}

return;
}

////////////////////////////////////////////////////////////////////////

NumArray derive_climo_prob(const ClimoCDFInfo &cdf_info,
const NumArray &mn_na, const NumArray &sd_na,
const SingleThresh &othresh) {
int i, n_mn, n_sd;
double prob;
NumArray climo_prob;
NumArray climo_prob, climo_vals;

// Number of valid climo mean and standard deviation
n_mn = mn_na.n_valid();
Expand Down Expand Up @@ -1045,7 +1096,7 @@ NumArray derive_climo_prob(const NumArray &mn_na, const NumArray &sd_na,
break;

default:
mlog << Error << "\n\nderive_climo_prob() -> "
mlog << Error << "\nderive_climo_prob() -> "
<< "climatological threshold \"" << othresh.get_str()
<< "\" cannot be converted to a probability!\n\n";
exit(1);
Expand All @@ -1060,23 +1111,17 @@ NumArray derive_climo_prob(const NumArray &mn_na, const NumArray &sd_na,
// threshold
else if(n_mn > 0 && n_sd > 0) {

mlog << Debug(2)
<< "Deriving normal approximation of climatological "
<< "probabilities for threshold " << othresh.get_str()
<< ".\n";
// The first (>=0.0) and last (>=1.0) climo thresholds are omitted
mlog << Debug(4)
<< "Deriving climatological probabilities for threshold "
<< othresh.get_str() << " by sampling " << cdf_info.cdf_ta.n()-2
<< " values from the normal climatological distribution.\n";

// Compute probability value for each point
// Compute the probability by sampling from the climo distribution
// and deriving the event frequency
for(i=0; i<mn_na.n(); i++) {

prob = normal_cdf(othresh.get_value(), mn_na[i], sd_na[i]);

// Handle greater-than probabilities
if(!is_bad_data(prob) &&
(othresh.get_type() == thresh_gt ||
othresh.get_type() == thresh_ge)) {
prob = 1.0 - prob;
}
climo_prob.add(prob);
derive_climo_vals(cdf_info, mn_na[i], sd_na[i], climo_vals);
climo_prob.add(derive_prob(climo_vals, othresh));
}
}
// If only climatological mean was provided, it should already
Expand Down Expand Up @@ -1107,3 +1152,22 @@ NumArray derive_climo_prob(const NumArray &mn_na, const NumArray &sd_na,
}

////////////////////////////////////////////////////////////////////////

double derive_prob(const NumArray &na, const SingleThresh &st) {
int i, n_vld, n_event;
double prob;

// Count up the number of events
for(i=0,n_vld=0,n_event=0; i<na.n(); i++) {
if(is_bad_data(na[i])) continue;
n_vld++;
if(st.check(na[i])) n_event++;
}

if(n_vld == 0) prob = bad_data_double;
else prob = (double) n_event / n_vld;

return(prob);
}

////////////////////////////////////////////////////////////////////////
48 changes: 29 additions & 19 deletions met/src/libcode/vx_statistics/pair_base.h
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,9 @@ class PairBase {
int interp_wdth;
GridTemplateFactory::GridTemplates interp_shape;

// Climo distribution info
ClimoCDFInfo cdf_info;

// Point and Grid Observation Information
NumArray o_na; // Observation value [n_obs]
NumArray x_na; // X [n_obs]
Expand Down Expand Up @@ -131,6 +134,8 @@ class PairBase {
void set_interp_wdth(int);
void set_interp_shape(GridTemplateFactory::GridTemplates);

void set_climo_cdf_info(const ClimoCDFInfo &);

void set_fcst_ut(unixtime ut);
void set_check_unique(bool check);
void set_obs_summary(ObsSummary s);
Expand Down Expand Up @@ -159,8 +164,7 @@ class PairBase {

void add_grid_obs(double, double, double,
double, double, double);



void add_climo(double, double, double);
void set_climo(int, double, double, double);
void add_climo_cdf();
Expand All @@ -187,29 +191,35 @@ extern void find_vert_lvl(const DataPlaneArray &, const double,
int &, int &);

extern double compute_interp(const DataPlaneArray &dpa,
const double obs_x, const double obs_y,
const double obs_v, const double cmn, const double csd,
const InterpMthd method, const int width,
const GridTemplateFactory::GridTemplates shape,
const double thresh,
const bool spfh_flag, const LevelType lvl_typ,
const double to_lvl, const int i_blw, const int i_abv,
const SingleThresh *cat_thresh = 0);
const double obs_x, const double obs_y,
const double obs_v, const double cmn, const double csd,
const InterpMthd method, const int width,
const GridTemplateFactory::GridTemplates shape,
const double thresh,
const bool spfh_flag, const LevelType lvl_typ,
const double to_lvl, const int i_blw, const int i_abv,
const SingleThresh *cat_thresh = 0);

extern void get_interp_points(const DataPlaneArray &dpa,
const double obs_x, const double obs_y,
const InterpMthd method, const int width,
const GridTemplateFactory::GridTemplates shape,
const double thresh,
const bool spfh_flag, const LevelType lvl_typ,
const double to_lvl, const int i_blw, const int i_abv,
NumArray &interp_points);
const double obs_x, const double obs_y,
const InterpMthd method, const int width,
const GridTemplateFactory::GridTemplates shape,
const double thresh,
const bool spfh_flag, const LevelType lvl_typ,
const double to_lvl, const int i_blw, const int i_abv,
NumArray &interp_points);

extern bool set_climo_flag(const NumArray &, const NumArray &);
extern bool set_climo_flag(const NumArray &, const NumArray &);

extern NumArray derive_climo_prob(const NumArray &, const NumArray &,
extern void derive_climo_vals(const ClimoCDFInfo &,
double, double, NumArray &);

extern NumArray derive_climo_prob(const ClimoCDFInfo &,
const NumArray &, const NumArray &,
const SingleThresh &);

extern double derive_prob(const NumArray &, const SingleThresh &);

////////////////////////////////////////////////////////////////////////

#endif // __PAIR_BASE_H__
Expand Down
Loading

0 comments on commit 996197c

Please sign in to comment.