Support for different dataset versions #3452
Comments
I thought about this as well while updating an existing cmorizer and agree that supporting different versions of datasets would be a nice enhancement for the ESMValTool. Because of the version parameter in filenames and recipes, diagnostics can already support multiple versions. In #3430 I suggested adding versioning to downloaders and formatters via an However, I'm not sure whether we should keep all references and details in the documentation up to date for every version.
On ESGF, for example in the obs4MIPs project, multiple versions are supported by making the version part of the dataset name. For example, version 1.0, 2.0, and 2.1 of the AIRS dataset are represented as:
and the actual See here for an overview of the available dataset names and versions. Since we are hoping that the CMORized data can be published on ESGF at some point, we may want to align our dataset naming with this practice. It would be good to check with the obs4MIPs / CREATE-IP folks whether they are happy with this way of versioning data. Any input on this @gleckler1 @glpotter? I would highly recommend that we sanitize the dataset and source version names for unwanted characters though (as opposed to how it's done in some cases now). We should only allow upper and lower case characters A to Z, numbers 0 to 9, and a
@bouweandela thanks for checking in. It makes sense to me that you want to improve how dataset versions are defined for ESMValTool. Following CMIP, obs4MIPs constructs a source_id as below (please see the draft ODS2.5 document, due to be finalized at the end of this month). It would be great for the community if over time ESMValTool and obs4MIPs datasets could be further aligned. source_id = <source_label>-<source_version_number>, with "-" substituted for certain forbidden characters (including ".", "_", "(", ")", "/", and " ")
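To make the source_id rule above concrete, here is a minimal Python sketch. The function name `make_source_id` and the constant `FORBIDDEN_CHARACTERS` are hypothetical and not part of ESMValTool or obs4MIPs. The character set contains only the characters listed in the comment above, which says "including", so the real list may be longer. Whether repeated hyphens should be collapsed is not stated, so this sketch leaves them as they are.

```python
# Hypothetical helper, not an ESMValTool or obs4MIPs API.
# Assumption: only the characters named in the comment are replaced;
# the actual specification may forbid more.
FORBIDDEN_CHARACTERS = frozenset({".", "_", "(", ")", "/", " "})


def make_source_id(source_label: str, source_version_number: str) -> str:
    """Build <source_label>-<source_version_number> with forbidden characters replaced by '-'."""
    raw = f"{source_label}-{source_version_number}"
    # Replace each forbidden character one-for-one; hyphens are not collapsed.
    return "".join("-" if char in FORBIDDEN_CHARACTERS else char for char in raw)


if __name__ == "__main__":
    # Version 2.1 of the AIRS dataset mentioned earlier in the thread.
    print(make_source_id("AIRS", "2.1"))  # prints AIRS-2-1
```

A one-for-one replacement keeps the mapping easy to reason about, although distinct inputs such as "2.1" and "2_1" would map to the same identifier.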
After several discussions with @schlunma and @bouweandela, I propose the following strategy to add support for different dataset versions to our downloading and formatting scripts for observational datasets. If there are no strong opinions or vetoes, we could try to start implementing this soon. From my point of view, an important aim is to remain as backward compatible as possible.
With new versions of observational datasets becoming available, new science studies might want to use the latest versions. Older versions remain of interest, however, for reproducing published work or for comparing different versions of a dataset.
This issue is meant to collect ideas on how to add support for different versions of a dataset to the CMORizers (config file, downloader, formatter) and to come up with a strategy for technical implementation.
@ESMValGroup/esmvaltool-coreteam please add your thoughts here.