Recipe to create pool of obs4MIPs data #3317

Closed


rbeucher
Contributor

@rbeucher rbeucher commented Aug 9, 2023

Description

This is an attempt at a recipe leveraging the ESGF download capability to create an obs4MIPs data pool at NCI.
I am testing downloads from ESGF using corrected names.
See the discussion in #2974.


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

New or updated recipe/diagnostic

New or updated data reformatting script



@rbeucher rbeucher marked this pull request as draft August 9, 2023 02:36
@rbeucher
Contributor Author

rbeucher commented Aug 9, 2023

Hi @bouweandela

So I have encountered several issues with the download from ESGF.

For datasets MODIS-1-0, AIRS-2-1, AIRS-2-0
The code adds the time_frequency facet to the ESGF search query.
For some reason the API fails to return any results in that case:

The following script reproduces the problem:

#!/usr/bin/env python
"""Show how adding the time_frequency facet changes obs4MIPs hit counts."""
from pyesgf.search import SearchConnection

conn = SearchConnection('https://esgf-node.llnl.gov/esg-search', distrib=True)
facets = 'project,source_id,variable,time_frequency'

# MODIS-1-0: narrow the query step by step and print the hit count each time.
ctx = conn.new_context(project='obs4MIPs', source_id='MODIS-1-0', facets=facets)
print(ctx.hit_count)
ctx = conn.new_context(project='obs4MIPs', source_id='MODIS-1-0', variable='clt', facets=facets)
print(ctx.hit_count)
# Adding time_frequency is the step that returns no results.
ctx = conn.new_context(project='obs4MIPs', source_id='MODIS-1-0', variable='clt', time_frequency='mon', facets=facets)
print(ctx.hit_count)

# AIRS-2-1: same sequence of queries for the hus variable.
ctx = conn.new_context(project='obs4MIPs', source_id='AIRS-2-1', facets=facets)
print(ctx.hit_count)
ctx = conn.new_context(project='obs4MIPs', source_id='AIRS-2-1', variable='hus', facets=facets)
print(ctx.hit_count)
ctx = conn.new_context(project='obs4MIPs', source_id='AIRS-2-1', variable='hus', time_frequency='mon', facets=facets)
print(ctx.hit_count)

@rbeucher
Contributor Author

rbeucher commented Aug 9, 2023

I can't find RSS-v7 on ESGF

@rbeucher
Contributor Author

rbeucher commented Aug 9, 2023

Finally, question regarding derived variables like rsnstcsnorm.
That one depends on CERES-EBAF and CERES-EBAF_Surface
How should I handle this?

@rbeucher
Contributor Author

rbeucher commented Aug 9, 2023

Note that I am using a dummy diagnostic. Is there another way?

@bouweandela
Member

> This is an attempt at a recipe leveraging the ESGF download capability to create an obs4MIPs data pool at NCI.

Note that this will not only download the data, but also create CMORized copies. If this is not needed, I would recommend writing a small script that uses the esmvalcore.esgf module to find and download the required data instead.
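
A minimal sketch of such a script, assuming `esmvalcore.esgf.find_files` takes the keyword arguments used elsewhere in this thread and that `esmvalcore.esgf.download(files, dest_folder)` is available in the installed ESMValCore version. The `REQUESTS` table, the helper names, and the destination folder are hypothetical placeholders. The ESMValCore import is deferred so the query-building helper can be checked without the package installed.

```python
from pathlib import Path

# Hypothetical selection: dataset name -> variables to fetch.
REQUESTS = {
    "CERES-EBAF": ["rlut", "rlutcs"],
    "RSS-v7": ["prw"],
}


def build_queries(requests, project="obs4MIPs"):
    """Expand the dataset/variable table into one facet dict per search."""
    return [
        {"project": project, "dataset": dataset, "short_name": short_name}
        for dataset, variables in requests.items()
        for short_name in variables
    ]


def fetch(requests, dest_folder):
    """Search ESGF for every query and download the files that were found."""
    # Imported here so that build_queries() works without ESMValCore.
    from esmvalcore.esgf import download, find_files

    files = []
    for query in build_queries(requests):
        found = find_files(**query)
        if not found:
            print(f"No files found for {query}")
        files.extend(found)
    # Assumption: download() accepts a list of ESGFFile objects and a folder.
    download(files, Path(dest_folder).expanduser())
    return files
```

Calling `fetch(REQUESTS, "~/obs4mips_pool")` (the path is illustrative) would then search and download without running any preprocessing.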

@bouweandela
Member

> For datasets MODIS-1-0, AIRS-2-1, AIRS-2-0
> The code adds the time_frequency facet to the ESGF search query.
> For some reason the API fails to return any results in that case:

It appears that some obs4MIPs datasets use frequency as a facet while others use time_frequency (see here for an overview of all available facet values). Therefore automatic translation of facets is failing. The obs4MIPs specification says that it should be frequency, but it appears that the reality on ESGF is different.

I'm not entirely sure how to solve this. We could consider removing this line from our code, but that may cause trouble for datasets that provide the same variable in multiple frequencies.
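
One possible alternative, sketched here as an assumption rather than as existing ESMValCore behaviour, is to try the `frequency` facet from the specification first and fall back to `time_frequency` when the search returns nothing. The `search` callable is a hypothetical stand-in for whatever performs the ESGF query, so the fallback logic can be exercised offline.

```python
# Candidate ESGF facet names for the sampling frequency, in order of preference.
FREQUENCY_FACETS = ("frequency", "time_frequency")


def search_with_frequency_fallback(search, facets, frequency):
    """Return (facet_name, results) for the first facet name that yields hits.

    `search` is any callable that accepts facets as keyword arguments and
    returns a list of results; it is a stand-in for a real ESGF query.
    """
    for name in FREQUENCY_FACETS:
        results = search(**facets, **{name: frequency})
        if results:
            return name, results
    # Neither facet name matched anything.
    return None, []
```

Unlike dropping the constraint, this keeps the frequency in the query, at the cost of a second request for datasets that only match the second facet name.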

@bouweandela
Member

> I can't find RSS-v7 on ESGF

Maybe it is related to the issue with frequency? When I run the following Python code I can find it:

In [1]: from esmvalcore.esgf import find_files

In [2]: find_files(project='obs4MIPs', dataset='RSS-v7', short_name='*')
Out[2]: 
[ESGFFile:obs4MIPs/RSS-v7/v20180305/prw_mon_RSS-v7_BE_gn_198801_201512.nc on hosts ['aims3.llnl.gov'],
 ESGFFile:obs4MIPs/RSS-v7/v20180305/sfcWind_mon_RSS-v7_BE_gn_198801_201512.nc on hosts ['aims3.llnl.gov'],
 ESGFFile:obs4MIPs/RSS-v7/v20180305/tos_mon_RSS-v7_BE_gn_200206-201012.nc on hosts ['aims3.llnl.gov']]

or see here for some results in JSON format.

@bouweandela
Member

> Finally, question regarding derived variables like rsnstcsnorm.
> That one depends on CERES-EBAF and CERES-EBAF_Surface
> How should I handle this?

I suspect that mixing different datasets is not supported by the tool. You could try using dataset: [CERES-EBAF, CERES-EBAF_Surface] in the recipe, but I'm not sure if that works.

@bouweandela
Member

> Note that I am using a dummy diagnostic. Is there another way?

Yes, use diagnostics: null in the recipe if you only want to preprocess the data.

@remi-kazeroni
Contributor

Thanks for initiating this, @rbeucher. I support the idea of having a recipe to check the completeness of a shared obs4MIPs data pool on clusters. I was wondering whether it would be simpler to replicate what we do for CMORized data in recipe_check_obs.yml, but for the obs4MIPs data used in our recipes, e.g. in a recipe_check_obs4mips.yml. This recipe could be run to check whether the data are available locally and, if not, to download them automatically with the Core's auto-download capability for ESGF data.
In this respect, I think it could be simpler to organize the recipe first by dataset (used as diagnostic names) and then by variable. This would allow users to run the recipe for a single dataset if needed.

Here is an example:

  CERES-EBAF:
    description: CERES-EBAF check
    variables:
      rlut:
      rlutcs:
    additional_datasets:
      - {dataset: CERES-EBAF, project: obs4MIPs, mip: Amon, tier: 1,
         start_year: 2000, end_year: 2014}
    scripts: null

When run with esmvaltool run recipe_check_obs4mips.yml --diagnostics=CERES-EBAF, it downloaded:

├── obs4MIPs
│   └── CERES-EBAF
│       └── v20160610
│           ├── rlut_CERES-EBAF_L3B_Ed2-8_200003-201404.nc
│           └── rlutcs_CERES-EBAF_L3B_Ed2-8_200003-201404.nc
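
With many datasets, the per-dataset entries could be generated rather than written by hand. The sketch below builds diagnostics in the layout of the CERES-EBAF example above. The function names and the `POOL` table are hypothetical, and the defaults `mip="Amon"` and `tier=1` are copied from that single example, so they are assumptions for any other dataset. The result is printed as JSON on the assumption that the recipe's YAML parser accepts JSON-style syntax.

```python
import json


def check_diagnostic(dataset, variables, start_year, end_year,
                     mip="Amon", tier=1):
    """Build one download-only diagnostic for a single dataset."""
    return {
        "description": f"{dataset} check",
        # Variables carry no extra settings, so each maps to an empty value.
        "variables": {short_name: None for short_name in variables},
        "additional_datasets": [{
            "dataset": dataset,
            "project": "obs4MIPs",
            "mip": mip,
            "tier": tier,
            "start_year": start_year,
            "end_year": end_year,
        }],
        "scripts": None,  # no diagnostic script attached
    }


def build_diagnostics(pool):
    """Map each dataset name to its diagnostic, one entry per dataset."""
    return {
        dataset: check_diagnostic(dataset, **settings)
        for dataset, settings in pool.items()
    }


# Hypothetical pool description; the CERES-EBAF values come from the example.
POOL = {
    "CERES-EBAF": {"variables": ["rlut", "rlutcs"],
                   "start_year": 2000, "end_year": 2014},
}

# The printed mapping would go under the recipe's `diagnostics:` key.
print(json.dumps(build_diagnostics(POOL), indent=2))
```

Keeping one diagnostic per dataset preserves the ability to select a single dataset with the --diagnostics option.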

@rbeucher rbeucher closed this Feb 20, 2024