Feature 1905 ens_ctrl (#1955)

Co-authored-by: j-opatz <[email protected]>
dtcenter · Nov 3, 2021 · d60924c · d60924c
1 parent 8c446ff
commit d60924c
Show file tree

Hide file tree

Showing 13 changed files with 394 additions and 273 deletions.
diff --git a/met/docs/Users_Guide/ensemble-stat.rst b/met/docs/Users_Guide/ensemble-stat.rst
@@ -69,6 +69,7 @@ The usage statement for the Ensemble Stat tool is shown below:
          [-grid_obs file]
          [-point_obs file]
          [-ens_mean file]
+         [-ctrl file]
          [-obs_valid_beg time]
          [-obs_valid_end time]
          [-outdir path]
@@ -92,23 +93,23 @@ Optional arguments for ensemble_stat
 
 4. To produce ensemble statistics using gridded observations, use the **-grid_obs file** option to specify a gridded observation file. This option may be used multiple times if your observations are in several files.
 
+5. To produce ensemble statistics using point observations, use the **-point_obs file** option to specify a NetCDF point observation file. This option may be used multiple times if your observations are in several files.
 
-5. To produce ensemble statistics using point observations, use the **-point_obs file** to specify a NetCDF point observation file. This option may be used multiple times if your observations are in several files.
+6. To override the simple ensemble mean value of the input ensemble members for the ECNT, SSVAR, and ORANK line types, the **-ens_mean file** option specifies an ensemble mean model data file. This option replaces the **-ssvar_mean file** option from earlier versions of MET.
 
+7. The **-ctrl file** option specifies an ensemble control member data file. The control member is included in the computation of the ensemble mean but excluded from the spread.
 
-6. To override the simple ensemble mean value of the input ensemble members for the ECNT, SSVAR, and ORANK line types, the **-ens_mean file** specifies an ensemble mean model data file. This option replaces the **-ssvar_mean file** from earlier versions of MET.
+8. To filter point observations by time, use **-obs_valid_beg time** in YYYYMMDD[_HH[MMSS]] format to set the beginning of the matching observation time window.
 
-7. To filter point observations by time, use **-obs_valid_beg time** in YYYYMMDD[_HH[MMSS]] format to set the beginning of the matching observation time window.
+9. As above, use **-obs_valid_end time** in YYYYMMDD[_HH[MMSS]] format to set the end of the matching observation time window.
 
-8. As above, use **-obs_valid_end time** in YYYYMMDD[_HH[MMSS]] format to set the end of the matching observation time window.
+10. Specify the **-outdir path** option to override the default output directory (./).
 
-9. Specify the **-outdir path** option to override the default output directory (./).
+11. The **-log** file outputs log messages to the specified file.
 
-10. The **-log** file outputs log messages to the specified file.
+12. The **-v level** option indicates the desired level of verbosity. The value of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity will increase the amount of logging.
 
-11. The **-v level** option indicates the desired level of verbosity. The value of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity will increase the amount of logging.
-
-12. The **-compress level** option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression.
+13. The **-compress level** option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression.
 
 An example of the ensemble_stat calling sequence is shown below:
 

diff --git a/met/docs/Users_Guide/reformat_grid.rst b/met/docs/Users_Guide/reformat_grid.rst
@@ -74,7 +74,7 @@ Required arguments for the pcp_combine
 Optional arguments for pcp_combine
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-3. The **-field string** option defines the data to be extracted from the input files. Use this option when processing fields other than **APCP** or non-GRIB files. This option may be used multiple times and output will be created for each.
+3. The **-field string** option defines the data to be extracted from the input files. Use this option when processing fields other than **APCP** or non-GRIB files. It can be used multiple times and output will be created for each. In general, the field string should include the **name** and **level** of the requested data and be enclosed in single quotes. It is processed as an inline configuration file and may also include data filtering, censoring, and conversion options. For example, use **-field ‘name=”ACPCP”; level=”A6”; convert(x)=x/25.4;’** to read 6-hourly accumulated convective precipitation from a GRIB file and convert from millimeters to inches. 
 
 4. The **-name list** option is a comma-separated list of output variable names which override the default choices. If specified, the number of names must match the number of variables to be written to the output file.
 

diff --git a/met/src/basic/vx_util/num_array.cc b/met/src/basic/vx_util/num_array.cc
@@ -1157,6 +1157,59 @@ double NumArray::wmean_fisher(const NumArray &wgt) const
 
 }
 
+
+////////////////////////////////////////////////////////////////////////
+
+
+double NumArray::variance(int skip_index) const
+
+{
+
+   if(n() == 0)  return ( bad_data_double );
+
+   int j, count;
+   double s, s_sq, var;
+
+   s = s_sq = 0.0;
+   count = 0;
+
+   for(j=0; j<n(); j++) {
+      if(is_bad_data(e[j]) || j == skip_index) continue;
+      s    += e[j];
+      s_sq += e[j]*e[j];
+      count++;
+   }
+
+   // Check for slightly negative precision error
+   if(count > 1) {
+      var = (s_sq - s*s/(double) count)/((double) (count - 1));
+      if(is_eq(var, 0.0)) var = 0.0;
+   }
+   else {
+      var = bad_data_double;
+   }
+
+   return(var);
+
+}
+
+
+////////////////////////////////////////////////////////////////////////
+
+
+double NumArray::stdev(int skip_index) const
+
+{
+
+   double v = variance(skip_index);
+
+   if ( !is_bad_data(v) )  v = sqrt(v);
+
+   return(v);
+
+}
+
+
 ////////////////////////////////////////////////////////////////////////
 
 //

diff --git a/met/src/basic/vx_util/num_array.h b/met/src/basic/vx_util/num_array.h
@@ -21,6 +21,7 @@
 #include <iostream>
 
 #include "concat_string.h"
+#include "is_bad_data.h"
 
 
 ////////////////////////////////////////////////////////////////////////
@@ -107,6 +108,9 @@ class NumArray {
       double mean_sqrt() const;
       double mean_fisher() const;
 
+      double variance(int skip_index = bad_data_int) const;
+      double stdev(int skip_index = bad_data_int) const;
+
       double wmean(const NumArray &) const;
       double wmean_sqrt(const NumArray &) const;
       double wmean_fisher(const NumArray &) const;

diff --git a/met/src/libcode/vx_statistics/pair_data_ensemble.cc b/met/src/libcode/vx_statistics/pair_data_ensemble.cc
@@ -108,6 +108,7 @@ void PairDataEnsemble::clear() {
 
    n_ens = 0;
    n_pair = 0;
+   ctrl_index = bad_data_int;
    skip_const = false;
    skip_ba.clear();
 
@@ -121,6 +122,7 @@ void PairDataEnsemble::clear() {
 
    esum_na.clear();
    esumsq_na.clear();
+   esumn_na.clear();
 
    mn_na.clear();
    mn_oerr_na.clear();
@@ -170,6 +172,7 @@ void PairDataEnsemble::extend(int n) {
    var_plus_oerr_na.extend   (n);
    esum_na.extend            (n);
    esumsq_na.extend          (n);
+   esumn_na.extend           (n);
    mn_na.extend              (n);
    mn_oerr_na.extend         (n);
 
@@ -222,6 +225,7 @@ void PairDataEnsemble::assign(const PairDataEnsemble &pd) {
    ign_na         = pd.ign_na;
    pit_na         = pd.pit_na;
    n_pair         = pd.n_pair;
+   ctrl_index     = pd.ctrl_index;
    skip_const     = pd.skip_const;
    skip_ba        = pd.skip_ba;
 
@@ -231,6 +235,7 @@ void PairDataEnsemble::assign(const PairDataEnsemble &pd) {
 
    esum_na        = pd.esum_na;
    esumsq_na      = pd.esumsq_na;
+   esumn_na       = pd.esumn_na;
 
    mn_na          = pd.mn_na;
    mn_oerr_na     = pd.mn_oerr_na;
@@ -290,12 +295,14 @@ void PairDataEnsemble::add_ens_var_sums(int i_obs, double v) {
    if(i_obs >= esum_na.n()) {
       esum_na.add(0.0);
       esumsq_na.add(0.0);
+      esumn_na.add(0.0);
    }
 
    // Track sums of the raw ensemble member values
    if(!is_bad_data(v)) {
       esum_na.inc(i_obs, v);
       esumsq_na.inc(i_obs, v*v);
+      esumn_na.inc(i_obs, 1);
    }
 
    return;
@@ -416,17 +423,17 @@ void PairDataEnsemble::compute_pair_vals(const gsl_rng *rng_ptr) {
       else {
 
          // Compute the variance of the unperturbed ensemble members
-         var_unperturbed = compute_variance(esum_na[i], esumsq_na[i], v_na[i]);
+         var_unperturbed = compute_variance(esum_na[i], esumsq_na[i], esumn_na[i]);
          var_na.add(var_unperturbed);
 
          // Process the observation error information.
          ObsErrorEntry * e = (has_obs_error() ? obs_error_entry[i] : 0);
          if(e) {
 
-            // Compute perturbed ensemble mean and variance 
-            cur_ens.compute_mean_variance(mean, var_perturbed);
-            mn_oerr_na.add(mean);
-            var_oerr_na.add(var_perturbed);
+            // Compute perturbed ensemble mean and variance
+            // Exclude the control member from the variance
+            mn_oerr_na.add(cur_ens.mean());
+            var_oerr_na.add(cur_ens.variance(ctrl_index));
 
             // Compute the variance plus observation error variance.
             var_plus_oerr_na.add(var_unperturbed +
@@ -467,8 +474,12 @@ void PairDataEnsemble::compute_pair_vals(const gsl_rng *rng_ptr) {
          crps_emp_na.add(compute_crps_emp(o_na[i], cur_ens));
          crpscl_emp_na.add(compute_crps_emp(o_na[i], cur_clm));
 
+         // Ensemble mean and standard deviation
+         // Exclude the control member from the standard deviation
+         mean  = cur_ens.mean();
+         stdev = cur_ens.stdev(ctrl_index);
+
          // Store Gaussian CRPS stats
-         cur_ens.compute_mean_stdev(mean, stdev);
          crps_gaus_na.add(compute_crps_gaus(o_na[i], mean, stdev));
          crpscl_gaus_na.add(compute_crps_gaus(o_na[i], cmn_na[i], csd_na[i]));
          ign_na.add(compute_ens_ign(o_na[i], mean, stdev));
@@ -613,7 +624,7 @@ struct ssvar_bin_comp {
 
 void PairDataEnsemble::compute_ssvar() {
    int i, j;
-   double mn, var;
+   double var;
    ssvar_bin_map bins;
    NumArray cur;
 
@@ -644,21 +655,22 @@ void PairDataEnsemble::compute_ssvar() {
       if(skip_ba[i]) continue;
 
       // Store ensemble values for the current observation
-      for(j=0, cur.erase(); j<n_ens; j++) cur.add(e_na[j][i]);
-
-      // Compute standard deviation
-      cur.compute_mean_variance(mn, var);
+      // Exclude the control member from the variance
+      for(j=0, cur.erase(); j<n_ens; j++) {
+         if(j == ctrl_index) continue;
+         cur.add(e_na[j][i]);
+      }
 
       // Build a variance point
       ens_ssvar_pt pt;
-      pt.var = var;
+      pt.var = cur.variance();
       pt.f   = mn_na[i];
       pt.o   = o_na[i];
       pt.w   = wgt_na[i];
 
       // Determine the bin for the current point and add it to the list
       // Bins are defined starting at 0 and are left-closed, right-open
-      j = floor(var/ssvar_bin_size);
+      j = floor(pt.var/ssvar_bin_size);
       string ssvar_min = str_format("%.5e", j*ssvar_bin_size).contents();
       if( !bins.count(ssvar_min) ){
          ssvar_pt_list pts;
@@ -801,7 +813,7 @@ PairDataEnsemble PairDataEnsemble::subset_pairs_obs_thresh(const SingleThresh &o
       //
       // Exclude from subset:
       //   sid_sa, lat_na, lon_na, x_na, y_na, vld_ta, lvl_ta, elv_ta,
-      //   o_qc_sa, esum_na, esumsq_na
+      //   o_qc_sa, esum_na, esumsq_na, esumn_na
 
       pd.wgt_na.add(wgt_na[i]);
       pd.o_na.add(o_na[i]);
@@ -1667,7 +1679,10 @@ void VxPairDataEnsemble::add_ens(int member, bool mn, Grid &gr) {
                else {
 
                   // Track unperturbed ensemble variance sums
-                  pd[i][j][k].add_ens_var_sums(l, fcst_v);
+                  // Exclude the control member from the variance
+                  if(member != pd[i][j][k].ctrl_index) {
+                     pd[i][j][k].add_ens_var_sums(l, fcst_v);
+                  }
 
                   // Apply observation error perturbation, if requested
                   if(obs_error_info->flag) {
@@ -1781,6 +1796,21 @@ void VxPairDataEnsemble::calc_obs_summary() {
 
 ////////////////////////////////////////////////////////////////////////
 
+void VxPairDataEnsemble::set_ctrl_index(int index) {
+
+   for(int i=0; i < n_msg_typ; i++){
+      for(int j=0; j < n_mask; j++){
+         for(int k=0; k < n_interp; k++){
+            pd[i][j][k].ctrl_index = index;
+         }
+      }
+   }
+
+   return;
+}
+
+////////////////////////////////////////////////////////////////////////
+
 void VxPairDataEnsemble::set_skip_const(bool tf) {
 
    for(int i=0; i < n_msg_typ; i++){

diff --git a/met/src/libcode/vx_statistics/pair_data_ensemble.h b/met/src/libcode/vx_statistics/pair_data_ensemble.h
@@ -90,6 +90,7 @@ class PairDataEnsemble : public PairBase {
 
       int        n_ens;           // Number of ensemble members
       int        n_pair;          // Number of valid pairs, n_obs - sum(skip_ba)
+      int        ctrl_index;      // Index of the control member
       bool       skip_const;      // Skip cases where the observation and
                                   // all ensemble members are constant
       BoolArray  skip_ba;         // Flag for each observation [n_obs]
@@ -106,6 +107,7 @@ class PairDataEnsemble : public PairBase {
 
       NumArray   esum_na;         // Sum of unperturbed ensemble values [n_obs]
       NumArray   esumsq_na;       // Sum of unperturbed ensemble squared values [n_obs]
+      NumArray   esumn_na;        // Count of ensemble values [n_obs]
 
       NumArray   mn_na;           // Ensemble mean value [n_obs]
       NumArray   mn_oerr_na;      // Mean of perturbed members [n_obs]
@@ -293,6 +295,8 @@ class VxPairDataEnsemble {
 
       void calc_obs_summary();
 
+      void set_ctrl_index(int);
+
       void set_skip_const(bool);
 };