Allow data with extra dimension to pass the CMOR checks #871

Open
sloosvel opened this issue Nov 20, 2020 · 6 comments · May be fixed by #872
Labels
cmor Related to the CMOR standard enhancement New feature or request

Comments

@sloosvel
Contributor

Is your feature request related to a problem? Please describe.
Some seasonal data is stored with an extra ensemble dimension. The only issue this causes is that the data does not pass the CMOR check on the rank. I discussed with @jvegasbsc the possibility of adding a parameter called extra_dimensions for projects in the config developer. If set to true, the check on the rank will be skipped, with a debug message informing of this.
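
A minimal sketch of the proposed behaviour, written in plain Python rather than against the actual checker code. The function `check_rank` and its arguments are hypothetical and only illustrate the idea; the only name taken from the proposal is `extra_dimensions`.

```python
import logging

logger = logging.getLogger(__name__)


def check_rank(data_shape, expected_rank, extra_dimensions=False):
    """Return True if the data passes this (hypothetical) rank check.

    data_shape: tuple with the shape of the loaded data.
    expected_rank: number of dimensions the CMOR table expects.
    extra_dimensions: project-level flag, as proposed in this issue.
    """
    rank = len(data_shape)
    if extra_dimensions:
        # The project allows extra dimensions: skip the check, but say so.
        logger.debug(
            "Rank check skipped (data rank %d, table rank %d)",
            rank,
            expected_rank,
        )
        return True
    return rank == expected_rank
```

With the flag unset the behaviour is unchanged, so projects that do not opt in would still get the strict check.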

Would you be able to help out?
Yes

@sloosvel sloosvel added enhancement New feature or request cmor Related to the CMOR standard labels Nov 20, 2020
@sloosvel sloosvel linked a pull request Nov 20, 2020 that will close this issue
9 tasks
@bouweandela
Member

Do you really need the extra dimensions? Wouldn't it be simpler to change the CMORizer scripts so they remove any extra dimensions that are not compliant?

@jvegreg
Contributor

jvegreg commented Dec 22, 2020

> Do you really need the extra dimensions? Wouldn't it be simpler to change the CMORizer scripts so they remove any extra dimensions that are not compliant?

Yes. The example we are working on is based on seasonal predictions, which usually have an extra dimension for the ensemble instead of each member being treated as a single dataset, as in CMIP data. It would be very interesting to keep them and add a couple of preprocessors that can compute ensemble means, medians, percentiles and such.
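
As a rough illustration of what such preprocessors would compute, here is a NumPy sketch on toy data. The axis order (ensemble, time, lat, lon) and the array sizes are assumptions made for the example, not something fixed by the data described here.

```python
import numpy as np

# Toy data with dimensions (ensemble, time, lat, lon); this order is assumed.
rng = np.random.default_rng(0)
data = rng.normal(size=(25, 7, 4, 8))

ENSEMBLE_AXIS = 0

# Each statistic collapses the ensemble axis and keeps (time, lat, lon).
ens_mean = data.mean(axis=ENSEMBLE_AXIS)
ens_median = np.median(data, axis=ENSEMBLE_AXIS)
ens_p10, ens_p90 = np.percentile(data, [10, 90], axis=ENSEMBLE_AXIS)
```

Keeping the members in one array means each statistic is a single reduction along one axis.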

@bouweandela
Member

> Yes. The example we are working on is based on seasonal predictions, which usually have an extra dimension for the ensemble instead of each member being treated as a single dataset, as in CMIP data.

Is this data available publicly? Wouldn't it make more sense to use a CMOR table that actually describes the data correctly, rather than adding extra code and settings to work around the data not matching the CMIP6 CMOR table?

> It would be very interesting to keep them and add a couple of preprocessors that can compute ensemble means, medians, percentiles and such.

I think it is important that preprocessor functions work with CMIP data. Do you foresee adding some way of making that work too?

@jvegreg
Contributor

jvegreg commented Jan 4, 2021

> Is this data available publicly?

Kind of: the full data is restricted (see https://www.ecmwf.int/en/forecasts/accessing-forecasts for example) but a subset is available through the CDS: https://cds.climate.copernicus.eu/cdsapp#!/dataset/seasonal-original-single-levels?tab=form

> Wouldn't it make more sense to use a CMOR table that actually describes the data correctly, rather than adding extra code and settings to work around the data not matching the CMIP6 CMOR table?

As far as I know, this table does not exist, so I think we should treat this kind of dataset like we treat observations. At BSC we chose to make them CMOR-like while keeping them in one file per variable and startdate, so we need this extra dimension.

> I think it is important that preprocessor functions work with CMIP data. Do you foresee adding some way of making that work too?

Yes, @sloosvel is experimenting with a way to create this extra dimension by merging datasets that only differ in the ensemble key. This is really important for DCPP experiments (the decadal ones), but lots of other diagnostics can benefit from it. CMIP6 experiments usually have many more members than CMIP5, so having a way to compute the ensemble mean, median and percentiles for each dataset, to reduce the amount of data the diagnostic has to manage, is key.
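
To make the merging idea concrete, here is a small sketch that groups datasets by every metadata key except `ensemble` and stacks each group along a new leading axis. It is not the implementation being experimented with: `merge_ensembles` is a hypothetical helper, the metadata is assumed to be a flat dict of hashable values, and all members of a group are assumed to share the same shape.

```python
from collections import defaultdict

import numpy as np


def merge_ensembles(datasets):
    """Stack arrays whose metadata differ only in the 'ensemble' key.

    datasets: iterable of (metadata dict, numpy array) pairs.
    Returns a list of (common metadata, member names, stacked array).
    """
    groups = defaultdict(list)
    for metadata, array in datasets:
        # Everything except the ensemble key identifies the group.
        key = tuple(
            sorted((k, v) for k, v in metadata.items() if k != "ensemble")
        )
        groups[key].append((metadata["ensemble"], array))

    merged = []
    for key, members in groups.items():
        # Sort by member name so the new axis has a reproducible order.
        members.sort(key=lambda item: item[0])
        names = [name for name, _ in members]
        stacked = np.stack([array for _, array in members], axis=0)
        merged.append((dict(key), names, stacked))
    return merged


datasets = [
    ({"dataset": "A", "ensemble": "r2i1p1f1"}, np.ones((7, 2))),
    ({"dataset": "A", "ensemble": "r1i1p1f1"}, np.zeros((7, 2))),
    ({"dataset": "B", "ensemble": "r1i1p1f1"}, np.ones((7, 2))),
]
merged = merge_ensembles(datasets)
```

Here the two members of dataset A end up in one array with a leading ensemble axis of length 2, while dataset B keeps a length-1 ensemble axis.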

@bouweandela
Member

> As far as I know, this table does not exist, so I think we should treat this kind of dataset like we treat observations. At BSC we chose to make them CMOR-like while keeping them in one file per variable and startdate, so we need this extra dimension.

How big is this dataset and do you have a CMORization script for this or is it like this when you download it? Because if you need to CMORize, you might as well split the files into one file per ensemble. Or do you have thousands of ensemble members?

@jvegreg
Contributor

jvegreg commented Jan 4, 2021

> How big is this dataset and do you have a CMORization script for this or is it like this when you download it? Because if you need to CMORize, you might as well split the files into one file per ensemble. Or do you have thousands of ensemble members?

Usually, seasonal datasets have around 25-50 members per startdate, with one startdate per month and the usual runs being about 7 months long. The size depends on the time resolution, and it is not uncommon to work with hourly data.

Just to make my point clear, this is not needed to be able to load the data: we can split the files in the CMORization process or using fixes, as for ERA5. It is merely convenient, as members in seasonal forecasts are never used alone and the split-merge process will usually be a waste of resources.
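
For illustration, the split-merge round trip mentioned above looks roughly like this in NumPy. The helper name `split_members` and the position of the ensemble axis are assumptions for the sketch; real files would add I/O on both sides of it.

```python
import numpy as np


def split_members(data, ensemble_axis=0):
    """Return one array per ensemble member, dropping the ensemble axis."""
    n_members = data.shape[ensemble_axis]
    return [np.take(data, i, axis=ensemble_axis) for i in range(n_members)]


# Toy forecast with dimensions (ensemble, time, lat, lon).
forecast = np.arange(3 * 7 * 2 * 2, dtype=float).reshape(3, 7, 2, 2)

# Split into single-member arrays, as a CMORizer or fix could do ...
members = split_members(forecast)

# ... and stack them again before computing any ensemble statistic.
restacked = np.stack(members, axis=0)
```

The round trip reproduces the original array, which is why keeping the ensemble dimension in the file saves work when the members are always used together.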

Anyway, I think it would be better to organize a telco with several of us so we can discuss the seasonal-decadal requirements and the best way to meet them, maybe shortly after the February release.


3 participants