Support per dataset DRS in native6 project #970

Closed


Peter9192
Contributor

@Peter9192 Peter9192 commented Jan 29, 2021

Description

This extends the data finder for the native6 project so that each dataset can have its own DRS entry. This makes it possible to have, for example:

config-user.yml:

rootpath:
  native6: ~/data
drs:
  native6: default

config-developer.yml:

native6:
  input_dir:
    default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
    MSWEP: '{project}/{dataset}_{version}/{area}_{frequency}_{grid}'
    EMAC: '...'
  input_file:
    default: '*.nc'
    MSWEP: '*.nc'
    EMAC: '*.nc'

The data finder will then use the dataset name to look up the corresponding entry in the config-developer file.
So, given a recipe containing

datasets:
  - {project: native6, dataset: MSWEP, version: V220, ...}
  - {project: native6, dataset: ERA5, version: 1, tier: 3, ...}
  - {project: native6, dataset: EMAC, ...}

it will find the corresponding data stored in:
MSWEP: ~/data/native6/MSWEP_V220/...
ERA5: ~/data/native6/Tier3/ERA5/1/... (it falls back to default because there is no ERA5-specific entry in config-developer.yml)
EMAC: ~/data/native6/EMAC/...
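
The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `select_drs_pattern` is a hypothetical helper, the config is inlined as a plain dict instead of being read from config-developer.yml, and the EMAC entry is left out because its pattern is elided above.

```python
# Hypothetical sketch: select_drs_pattern is not an ESMValCore function.
CONFIG_DEVELOPER = {
    "native6": {
        "input_dir": {
            "default": "Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}",
            "MSWEP": "{project}/{dataset}_{version}/{area}_{frequency}_{grid}",
        },
        "input_file": {"default": "*.nc", "MSWEP": "*.nc"},
    }
}


def select_drs_pattern(project_cfg, key, dataset, drs="default"):
    """Return the pattern registered for `dataset`, else the one for `drs`."""
    patterns = project_cfg[key]
    if isinstance(patterns, str):
        # A plain string means there is only one pattern for this project.
        return patterns
    if dataset in patterns:
        # Dataset-specific entry takes precedence.
        return patterns[dataset]
    # Fall back to the DRS selected in config-user.yml.
    return patterns[drs]


cfg = CONFIG_DEVELOPER["native6"]
mswep_dir = select_drs_pattern(cfg, "input_dir", "MSWEP")
era5_dir = select_drs_pattern(cfg, "input_dir", "ERA5")
print(mswep_dir)
print(era5_dir)
```

Here MSWEP picks up its own entry, while ERA5 has none and therefore gets the `default` pattern.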

Before I continue, @bouweandela is this what you had in mind for #494?
@stefsmeets @bsolino relevant for MSWEP and EMAC.

TODO: Currently it only works for the input_dir, not the filename pattern


@bouweandela
Member

> Before I continue, @bouweandela is this what you had in mind for #494?

It should work for any project, not just native6. We try to have as little project-specific code as possible, so that the tool can still be used with your own projects and CMOR tables. As you know from EUCP, many people have data that is close to, but does not quite follow, the CMIP data requests.

@bsolino
Contributor

bsolino commented Apr 23, 2021

I was thinking about this issue just now, and I'm unsure whether that's the right structure for the config-developer.yml file. Currently those keys are supposed to represent different file structures for different centers/machines, and I feel like those concepts shouldn't be mixed.

Perhaps something like this could be possible?

native6:
  default:
    input_dir:
      default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
      BSC: '{project}/{dataset}_{version}/{area}_{frequency}_{grid}'
      RCAST: '...'
    input_file:
      default: '*.nc'
      BSC: '*.nc'
      RCAST: '*.nc'
  ICON:
    input_dir:
      default: '{model_version}_{model_component}_{experiment}_{grid}_{id}'
    input_file:
      default: '{model_version}_{model_component}_{experiment}_{grid}_{id}_{var_type}*.nc'
  EMAC:
    ...

I'm not sure what the best ordering is: datasets first with centers/machines on the inner level (as depicted), or vice versa.
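
A lookup against the nested layout suggested above might look roughly like this. It is a sketch under assumptions, not an existing ESMValCore API: `lookup_pattern` is a hypothetical name, the fallback to the inner `default` key when the selected DRS is not listed is an assumption, and the RCAST and EMAC entries are omitted because their patterns are elided.

```python
# Hypothetical sketch of the dataset-first, DRS-second layout.
NATIVE6 = {
    "default": {
        "input_dir": {
            "default": "Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}",
            "BSC": "{project}/{dataset}_{version}/{area}_{frequency}_{grid}",
        },
        "input_file": {"default": "*.nc", "BSC": "*.nc"},
    },
    "ICON": {
        "input_dir": {
            "default": "{model_version}_{model_component}_{experiment}_{grid}_{id}",
        },
        "input_file": {
            "default": "{model_version}_{model_component}_{experiment}_{grid}_{id}_{var_type}*.nc",
        },
    },
}


def lookup_pattern(project_cfg, dataset, key, drs="default"):
    """Resolve a pattern by dataset first, then by DRS (center/machine)."""
    # Outer level: dataset-specific block, else the project-wide default block.
    dataset_cfg = project_cfg.get(dataset, project_cfg["default"])
    patterns = dataset_cfg[key]
    # Inner level: the selected DRS, else that block's own default (assumed).
    return patterns.get(drs, patterns["default"])


icon_dir = lookup_pattern(NATIVE6, "ICON", "input_dir", drs="BSC")
era5_dir = lookup_pattern(NATIVE6, "ERA5", "input_dir", drs="BSC")
print(icon_dir)
print(era5_dir)
```

With this ordering, a dataset such as ICON keeps its own layout on every machine unless a machine-specific entry is added, while datasets without a block share the per-machine patterns under `default`.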

@bsolino bsolino mentioned this pull request Apr 23, 2021
@bouweandela
Member

I like the suggestion by @bsolino

@zklaus

zklaus commented May 17, 2021

This looks promising. @Peter9192, are you still working on this? Do you think it might find its way into 2.3.0?

@Peter9192
Contributor Author

No, I haven't looked at this recently. My original idea was quite simple, and I'm not sure it still works with the new requirements discussed since. I guess I'd better close this. Perhaps @bsolino wants to have a go at implementing his suggestion?

@Peter9192 Peter9192 closed this May 17, 2021
Successfully merging this pull request may close these issues.

Support per dataset DRS