generate multi year means for annual-only climdex indices #73

Closed
corviday opened this issue May 17, 2019 · 8 comments

@corviday
Contributor

There are 480 climdex datasets that are annual-only non-climatology datasets in active use by Climate Explorer.

Historically, we did not support annual-only climatologies, but we do now. Our analysis tools are much nicer for climatologies than for non-climatologies, so it makes sense to generate climatologies and replace the non-MYM datasets with them in the database.

SELECT DISTINCT 
  data_files.unique_id
FROM 
  ce_meta.ensemble_data_file_variables, 
  ce_meta.ensembles, 
  ce_meta.data_files, 
  ce_meta.time_sets, 
  ce_meta.data_file_variables
WHERE 
  ensemble_data_file_variables.ensemble_id = ensembles.ensemble_id AND
  ensemble_data_file_variables.data_file_variable_id = data_file_variables.data_file_variable_id AND
  data_files.time_set_id = time_sets.time_set_id AND
  data_file_variables.data_file_id = data_files.data_file_id AND
  ensembles.ensemble_id > 13 AND 
  time_sets.multi_year_mean = FALSE;
@corviday
Contributor Author

corviday commented May 23, 2019

The following variables are currently displayed as nominal annual data in PCEX, and could be straightforwardly replaced with annual-resolution climatologies:

  • altcddETCCDI
  • altcsdiETCCDI
  • altcwdETCCDI
  • altwsdiETCCDI
  • cddETCCDI
  • csdiETCCDI
  • cwdETCCDI
  • gslETCCDI
  • idETCCDI
  • r1mmETCCDI
  • suETCCDI
  • trETCCDI
  • wsdiETCCDI

These variables have more complicated situations and are already present in the database in both climatological and nonclimatological forms, so they'd require special handling:

  • dtrETCCDI
  • prcptot
  • r10mmETCCDI
  • r20mmETCCDI
  • r95pETCCDI
  • r99pETCCDI
  • rx1dayETCCDI
  • rx5dayETCCDI
  • sdiiETCCDI
  • tn10pETCCDI
  • tn90pETCCDI
  • tnnETCCDI
  • tnxETCCDI
  • tx10pETCCDI
  • txnETCCDI
  • txxETCCDI

Those need more investigation, but I think I recall:

  1. we had a very small number of climos for the precipitation extremes, which are used in the extreme precip portal. That seems solvable.
  2. there are some climdex indices that are generated in both annual and monthly forms, and not strictly analogous. That is more thorny.

@corviday
Contributor Author

corviday commented Jul 9, 2019

272 non-MYM datasets, corresponding to 16 variables, remain in the ce_files ensemble. They're all affected by data issues.

11 of those are affected by pacificclimate/climate-explorer-frontend#224 - the data was generated separately in monthly and annual forms that cannot be trivially reconciled and will take some discussion with the scientists to resolve.
Variables in this class: dtr, rx1day, rx5day, tn10p, tn90p, tnn, tnx, tx10p, tx90p, txn, txx

The remaining 5 variables have the mild complication of having had some climatologies generated already for use in the extreme precipitation portal, which makes them subject to #60, as frequency naming standards have changed since they were generated. The rest of the climatologies can be generated, but to avoid duplicate climatologies as described in that issue, we should generate them only for the models not yet represented.
Variables in this class: cwd, r10mm, r20mm, r95p, r99p, sdii
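
A minimal sketch of the "only the models not represented yet" selection, assuming the model names for each variable are available as simple lists. The function name and the model names are hypothetical and are not taken from the database:

```python
def models_needing_climatologies(annual_models, existing_climo_models):
    """Return the models that have annual data but no climatology yet.

    Hypothetical helper for illustration; both arguments are iterables
    of model names for a single variable.
    """
    return sorted(set(annual_models) - set(existing_climo_models))


# Made-up model names; real names would come from the metadata database.
annual = ["model_a", "model_b", "model_c"]
existing = ["model_b"]

todo = models_needing_climatologies(annual, existing)
```

Generating climatologies only for `todo` would leave the existing extreme precipitation portal climatologies in place and avoid creating duplicates.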

@corviday
Contributor Author

corviday commented Aug 12, 2019

I've finished generating and adding the datasets affected by the change in climatology naming conventions.

176 non-MYM datasets representing 11 variables remain in the ce_files ensemble. All of them are maximums or minimums of something; most exist as separate "monthly maximum" and "annual maximum" variables with different descriptions.

In order to generate climatologies for these data sets, we need to update the generate_climos script to take the maximum/minimum within a year and the mean between years (#50), and probably edit the variable long name on the resulting data sets.
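
A minimal sketch of the aggregation described above, written in plain Python rather than taken from the actual generate_climos code: take the maximum (or minimum) within each year, then the mean of those annual values across years. The function name and sample values are hypothetical.

```python
from statistics import mean


def climatological_annual_extreme(monthly_by_year, extreme=max):
    """Reduce monthly values to a multi-year mean of annual extremes.

    monthly_by_year maps a year to its list of monthly values.
    extreme is max or min, applied within each year.
    Hypothetical helper, not part of generate_climos.
    """
    annual = [extreme(values) for values in monthly_by_year.values()]
    return mean(annual)


# Made-up monthly values, shortened to three months per year for brevity.
sample = {
    1981: [10.0, 14.0, 12.0],
    1982: [11.0, 13.0, 16.0],
}

annual_max_climo = climatological_annual_extreme(sample, max)
annual_min_climo = climatological_annual_extreme(sample, min)
```

The order of operations matters: for this sample, averaging each month across years first and then taking the maximum gives 14.0, while taking the maximum within each year first gives 15.0.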

@corviday
Contributor Author

Looks like dtr, tn10p, tn90p, and tx10p were affected by the md5 hash bug, pacificclimate/modelmeta#65. This bug has both a known workaround (https://github.com/pacificclimate/data-prep-actions/pull/1/files) and an actual solution: newer versions of get_climos produce files without this issue, though we did not set out to solve it specifically.

dtr appears to have had the workaround applied; the resulting files just were not associated; I've fixed that. Not sure whether to regenerate the climatologies or just apply the workaround to the other four variables. They're the last missing climatologies.

@corviday
Contributor Author

corviday commented Sep 5, 2019

Remaining variables: tn10p, tn90p, tx10p, tx90p

Two questions:

  1. Can these variables be aggregated? Doesn't seem like it:

tn10p

  2. Why are these variables sometimes given in days, and sometimes in percentages? Which is correct? Does a "full set" of whichever unit is correct exist?

I've emailed Trevor to set up an appointment and ask.

corviday self-assigned this Sep 5, 2019
@corviday
Contributor Author

After meeting with Trevor:

  • Percentage is the correct unit
  • A "full set" of percentage data exists
  • These variables can be aggregated normally, i.e., by calculating seasonal values as the average of the three months in that season.
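
A minimal sketch of that seasonal aggregation, assuming the conventional meteorological seasons (DJF, MAM, JJA, SON). The season definition, function name, and sample values are assumptions for illustration, not taken from the data preparation scripts:

```python
from statistics import mean

# Conventional meteorological seasons keyed by month number. This is an
# assumption; the thread does not say which season definition is used.
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def seasonal_means(monthly):
    """Average the three monthly values that belong to each season.

    monthly maps a month number (1-12) to a percentage value.
    """
    return {
        season: mean(monthly[m] for m in months)
        for season, months in SEASONS.items()
    }


# Made-up monthly percentages for one climatological year.
monthly_tn10p = {m: float(m) for m in range(1, 13)}
seasonal = seasonal_means(monthly_tn10p)
```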

@corviday
Contributor Author

corviday commented Sep 13, 2019

  • tx90p

@corviday
Contributor Author

SELECT COUNT(*) 
FROM 
  ce_meta.data_file_variables, 
  ce_meta.data_files, 
  ce_meta.ensemble_data_file_variables, 
  ce_meta.ensembles, 
  ce_meta.time_sets
WHERE 
  data_file_variables.data_file_id = data_files.data_file_id AND
  data_files.time_set_id = time_sets.time_set_id AND
  ensemble_data_file_variables.data_file_variable_id = data_file_variables.data_file_variable_id AND
  ensembles.ensemble_id = ensemble_data_file_variables.ensemble_id AND
  time_sets.multi_year_mean = False AND 
  ensembles.ensemble_name = 'ce_files';

=> 0
