Merge pull request #139 from ImperialCollegeLondon/feature/climate_data_doc

data_preprocessing ERA5 and CDS examples
vgro authored Jan 10, 2023
2 parents 9b2fb9b + d924bf8 commit 40983f8
Showing 6 changed files with 819 additions and 477 deletions.
165 changes: 165 additions & 0 deletions docs/source/data_recipes/CDS_toolbox_template.md
@@ -0,0 +1,165 @@
---
jupytext:
cell_metadata_filter: -all
formats: md:myst
main_language: python
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.13.8
kernelspec:
display_name: vr_python3
language: python
name: vr_python3
---

# Climate data download from the Copernicus Climate Data Store and CDS toolbox

The atmospheric variables from regional climate models or observations are typically
provided in spatial and temporal resolutions that are different from the requirements
of the Virtual Rainforest. This document describes how to download climate data from
the Copernicus [Climate Data Store](https://cds.climate.copernicus.eu/) (CDS) and
outlines the basic pre-processing options available in the
[CDS toolbox](https://cds.climate.copernicus.eu/cdsapp#!/toolbox).
At present, the pre-processing does not include scaling or topographic adjustment.

NOTE: You need to create a user account to access all data and functionalities.

## Climate input variables

The abiotic module of the Virtual Rainforest requires the following climate input
variables (or derivatives) at each time step (default: monthly means):

* Air temperature (typically 2m; mean, minimum, and maximum)
* Air humidity (typically 2m; relative or specific humidity)
* Air pressure (typically mean sea level or surface pressure)
* Wind speed (typically 10m)
* Precipitation
* Top of atmosphere short-wave downward radiation
* CO2 concentration (for future projections)
* Optional: soil temperature and soil moisture for initialisation

## Recommended data sets

We recommend the following data sets to force the Virtual Rainforest microclimate
simulations:

* ERA5

ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for
the past 4 to 7 decades. This reanalysis dataset combines model data with
observations from across the world into a globally complete and consistent dataset
using the laws of physics. The data is available in hourly and monthly averaged time
steps at a spatial resolution of 0.25 x 0.25 deg. The data set starts in 1950 and is
updated regularly.

The full documentation and download link can be accessed
[here for hourly data](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview)
and [here for monthly data](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-monthly-means?tab=overview).

* WFDE5

This global dataset provides a bias-corrected reconstruction of near-surface
meteorological variables derived from the fifth generation of the European Centre for
Medium-Range Weather Forecasts (ECMWF) atmospheric reanalyses (ERA5). The output is
available in hourly and daily time steps for the period 1979-2019 at a resolution of
0.5 x 0.5 deg.

The full documentation and download link can be accessed [here](https://cds.climate.copernicus.eu/cdsapp#!/dataset/derived-near-surface-meteorological-variables?tab=overview).

* CORDEX-SEA

This data set was created with regional climate models (RCM) as part of the
Coordinated Regional Climate Downscaling Experiment (CORDEX). The spatial
resolution is 0.22 x 0.22 deg and the spatial extent is 15°S to 27°N and 89°E to 146°E.
The temporal resolution depends on the selected period:
* historical data (1950-2005) is available at an hourly time step
* scenario data (2006-2100; RCP 2.6, 4.5 and 8.5) is available at a daily time step

The full documentation and download link can be accessed [here](https://cds.climate.copernicus.eu/cdsapp#!/dataset/projections-cordex-domains-single-levels?tab=overview).

* Atmospheric CO2

Observed global CO2 levels (Mauna Loa, NOAA/GML) are available in monthly or annual
resolution (1958 - present) [here](https://gml.noaa.gov/ccgg/trends/graph.html).
Monthly data derived from satellite observation (2002 - present) is available
[here](https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-carbon-dioxide?tab=overview).
Alternatively, reconstructed gridded monthly CO2 data for the historical period
(1850 - 2013) and future CMIP6 scenarios (2015 - 2150) can be downloaded
[here](https://zenodo.org/record/5021361) {cite:p}`cheng_wei_2021`.

## Step-by-step example

Follow one of the links above to access overview information about the data set. You
can find detailed documentation of the data set in the 'Documentation' section. To
select data, navigate to the 'Download Data' tab.

### Selection

This is an example selection of options to download historical '2m air temperature'
from CORDEX-SEA:

* Domain (here: 'South-East Asia')
* Experiment (here: 'historical'; RCP scenarios are also available)
* Horizontal resolution (here: '0.22 degree x 0.22 degree')
* Temporal resolution (here: 'daily mean')
* Variables (here: '2m_air_temperature')
* Global climate model (here: 'mohc_hadgem2_es')
* Regional climate model (here: 'gerics_remo2015')
* Ensemble member (here: 'r1i1p1')
* Start year and End year (here: 2001-2005)

Once you have selected the data, you can either download the dataset for further
processing (see [this example](./ERA5_preprocessing_example.md) of how to manipulate
ERA5 data using xarray), or click on 'show Toolbox request' at the bottom of the page,
copy the code, and open the CDS toolbox editor.

### Toolbox template CORDEX-SEA

The template below describes how to request a data set, reproject the data onto a regular
grid (note that the projection name is not changed!), select the area of interest,
calculate the monthly means, and download the product. For illustration, the routine
also plots the mean value. Adjust the 'data' lines to match your data request. You can
find the full documentation of the CDS toolbox [here](https://cds.climate.copernicus.eu/toolbox/doc/index.html).

```{code-block} ipython
# EXAMPLE CODE to preprocess CORDEX-SEA with CDS toolbox
import cdstoolbox as ct


@ct.application(title='Download data')
@ct.output.download()
@ct.output.figure()
def download_application():
    # Request the data; adjust these entries to match your own selection
    data = ct.catalogue.retrieve(
        'projections-cordex-domains-single-levels',
        {
            'domain': 'south_east_asia',
            'experiment': 'historical',
            'horizontal_resolution': '0_22_degree_x_0_22_degree',
            'temporal_resolution': 'daily_mean',
            'variable': '2m_air_temperature',
            'gcm_model': 'mohc_hadgem2_es',
            'rcm_model': 'gerics_remo2015',
            'ensemble_member': 'r1i1p1',
            'start_year': '2001',
            'end_year': '2005',
        }
    )
    # Reproject the data onto a regular grid
    regular = ct.geo.make_regular(data, xref='rlon', yref='rlat')
    # Select the area of interest
    sel_extent = ct.cube.select(regular, extent=[116., 118, 4., 6.])
    # Calculate the monthly means
    monthly_mean = ct.climate.monthly_mean(sel_extent)
    # Average over time and plot the mean value for illustration
    average = ct.cube.average(monthly_mean, dim='time')
    fig = ct.cdsplot.geomap(average)
    return monthly_mean, fig
```
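
If the data is downloaded rather than processed in the toolbox, the area selection and
monthly averaging steps of the template can be approximated offline with xarray. The
sketch below is only an illustration on synthetic daily data: the variable name `tas`,
the coordinate names `lat` and `lon`, the regular 0.5 degree grid, and the reading of
the toolbox extent as [west, east, south, north] are assumptions and are not taken from
the CDS documentation.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily 2m air temperature on an assumed regular grid (hypothetical data)
time = pd.date_range("2001-01-01", "2001-03-31", freq="D")
lat = np.arange(3.0, 7.5, 0.5)
lon = np.arange(115.0, 119.5, 0.5)
rng = np.random.default_rng(0)
tas = xr.DataArray(
    300.0 + rng.normal(size=(time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)

# Select the area of interest (label-based slices include both end points)
subset = tas.sel(lon=slice(116.0, 118.0), lat=slice(4.0, 6.0))

# Average the daily values within each calendar month
monthly_mean = subset.resample(time="MS").mean()
```

Real model output may store latitude in descending order, in which case the latitude
slice bounds would need to be reversed.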

The data handling for simulations is managed by the {mod}`~virtual_rainforest.core.data`
module and the {class}`~virtual_rainforest.core.data.Data` class, which provides the
data loading and storage functions for the Virtual Rainforest. The data system is
extendable to provide support for different file formats and axis validation but that is
beyond the scope of this document.
108 changes: 108 additions & 0 deletions docs/source/data_recipes/ERA5_preprocessing_example.md
@@ -0,0 +1,108 @@
---
jupytext:
cell_metadata_filter: -all
formats: md:myst
main_language: python
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.13.8
kernelspec:
display_name: vr_python3
language: python
name: vr_python3
---

# Simple climate data pre-processing example for dummy module

This section illustrates how to perform simple manipulations to adjust ERA5 data for use
in the Virtual Rainforest. This includes reading climate data from netcdf, converting
the data into an input format that is suitable for the abiotic module (e.g. Kelvin to
Celsius conversion), and writing the output to a new netcdf file. This does not include
scaling or topographic adjustment.

## Dummy data set

Example file: [dummy_climate_data.nc](./dummy_climate_data.nc)

### Metadata

- Reference: Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz
Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A.,
Soci, C., Dee, D., Thépaut, J-N. (2019): ERA5 monthly averaged data on single levels
from 1959 to present. Copernicus Climate Change Service (C3S) Climate Data Store
(CDS). (Accessed on \< DD-MMM-YYYY >), 10.24381/cds.f17050d7

- Product type: Monthly averaged reanalysis

- Variable: 10m wind speed, 2m dewpoint temperature, 2m temperature, Soil temperature
level 1, Surface pressure, TOA incident solar radiation, Total cloud cover, Total
precipitation, Volumetric soil water layer 1

- Year: 2013, 2014

- Month: January, February, March, April, May, June, July, August, September, October,
November, December

- Time: 00:00

- Sub-region extraction: North 6°, West 116°, South 4°, East 118°

- Format: NetCDF (experimental)
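
The metadata above corresponds to a CDS download request. As a rough sketch, a similar
request could be sent with the `cdsapi` Python client, assuming the package is installed
and CDS credentials are configured. The dataset name is taken from the monthly ERA5 link
in the download guide; the request keys and variable identifiers are assumptions modelled
on the CDS web form and should be checked against the API request shown on the
download page.

```python
def build_era5_request(years=("2013", "2014")):
    """Build a request dictionary mirroring the metadata listed above."""
    return {
        "product_type": "monthly_averaged_reanalysis",
        "variable": [
            "10m_wind_speed",
            "2m_dewpoint_temperature",
            "2m_temperature",
            "soil_temperature_level_1",
            "surface_pressure",
            "toa_incident_solar_radiation",
            "total_cloud_cover",
            "total_precipitation",
            "volumetric_soil_water_layer_1",
        ],
        "year": list(years),
        "month": [f"{month:02d}" for month in range(1, 13)],
        "time": "00:00",
        "area": [6, 116, 4, 118],  # North, West, South, East
        "format": "netcdf",
    }


def download_era5(target="dummy_climate_data.nc"):
    """Send the request to the CDS (needs cdsapi and a CDS account)."""
    import cdsapi  # imported here so the request can be inspected without it

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels-monthly-means",
        build_era5_request(),
        target,
    )


# Inspect the request without contacting the CDS
request = build_era5_request()
```

Calling `download_era5()` would start the actual download; building the request
separately makes it easy to review the selection first.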

## Code example

### 1. Load the data

```{code-cell} ipython3
import xarray as xr
import numpy as np
dataset = xr.open_dataset("./dummy_climate_data.nc")
dataset
```

### 2. Convert temperatures

The standard output unit of ERA5 temperatures is Kelvin, which we need to convert into
degrees Celsius for the Virtual Rainforest. This includes 2m air temperature, 2m dewpoint
temperature (used to calculate relative humidity in the next step), and topsoil temperature.

```{code-cell} ipython3
dataset["t2m_C"] = dataset["t2m"]-273.15 # 2m air temperature
dataset["d2m_C"] = dataset["d2m"]-273.15 # 2m dewpoint temperature
dataset["stl1_C"] = dataset["stl1"]-273.15 # top soil temperature
```
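
Depending on the xarray version and options, arithmetic may not carry the `units`
attribute over to the result, so it can be safer to set it explicitly after the
conversion. The helper below is a hypothetical sketch and not part of the Virtual
Rainforest; the function name `kelvin_to_celsius`, the unit label `degC`, and the
synthetic example values are assumptions.

```python
import numpy as np
import xarray as xr


def kelvin_to_celsius(da):
    """Return a copy of `da` in degrees Celsius with an explicit units attribute."""
    converted = da - 273.15
    # Keep the original metadata but record the new unit
    converted.attrs = {**da.attrs, "units": "degC"}
    return converted


# Synthetic stand-in for an ERA5 temperature variable
t2m = xr.DataArray(
    np.array([273.15, 300.15]),
    dims="time",
    attrs={"units": "K", "long_name": "2 metre temperature"},
)
t2m_C = kelvin_to_celsius(t2m)
```

The same helper could be applied to `d2m` and `stl1` in place of the three separate
subtractions.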

### 3. Calculate relative humidity

Relative humidity (RH) is not a standard output from ERA5 but can be calculated from 2m
dewpoint temperature (DPT) and 2m temperature (T), both in degrees Celsius, as follows:

$$ RH = 100 \cdot \frac{\exp\left(\frac{17.625 \cdot DPT}{243.04+DPT}\right)}
{\exp\left(\frac{17.625 \cdot T}{243.04+T}\right)}
$$

```{code-cell} ipython3
dataset["rh2m"] = (
    100.0
    * (
        np.exp(17.625 * dataset["d2m_C"] / (243.04 + dataset["d2m_C"]))
        / np.exp(17.625 * dataset["t2m_C"] / (243.04 + dataset["t2m_C"]))
    )
)
```
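
As a quick plausibility check, the same formula can be evaluated for single values with
the standard library. The function name and the example temperatures below are
illustrative assumptions: when the dewpoint equals the air temperature the air is
saturated and the result should be 100 %, and a dewpoint of 20 °C at an air temperature
of 25 °C gives about 74 %.

```python
import math


def relative_humidity(dewpoint_c, temperature_c):
    """Relative humidity in percent from dewpoint and air temperature in deg C."""

    def saturation_term(temp_c):
        # Exponential term used in both the numerator and the denominator
        return math.exp(17.625 * temp_c / (243.04 + temp_c))

    return 100.0 * saturation_term(dewpoint_c) / saturation_term(temperature_c)


rh_saturated = relative_humidity(20.0, 20.0)  # dewpoint equals air temperature
rh_example = relative_humidity(20.0, 25.0)  # dewpoint below air temperature
```

Values above 100 % in the processed dataset would indicate that the dewpoint and air
temperature variables were swapped or not converted consistently.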

### 4. Clean dataset and save netcdf

```{code-cell} ipython3
dataset_cleaned = dataset.drop_vars(["d2m","t2m","stl1"])
dataset_cleaned
```

Once you have confirmed that your dataset is complete and your calculations are correct,
save it as a new netcdf file. This can then be fed into the core data loading system, see
{mod}`~virtual_rainforest.core.data`.

```{code-block} ipython3
dataset_cleaned.to_netcdf("./dummy_climate_data_processed.nc")
```
Binary file added docs/source/data_recipes/dummy_climate_data.nc
Binary file not shown.
10 changes: 10 additions & 0 deletions docs/source/index.md
@@ -102,6 +102,16 @@ team.
development/design/core.md
```

```{eval-rst}
.. toctree::
  :maxdepth: 4
  :caption: Climate data pre-processing
  :hidden:

  Download Copernicus data <data_recipes/CDS_toolbox_template.md>
  Preprocess Copernicus data <data_recipes/ERA5_preprocessing_example.md>
```

```{eval-rst}
.. toctree::
:maxdepth: 0
46 changes: 46 additions & 0 deletions docs/source/refs.bib
@@ -180,4 +180,50 @@ @article{Metcalfe2015
pages = {155-172},
issn = {1364-8152},
doi = {https://doi.org/10.1016/j.envsoft.2015.06.010}
}

@article{cheng_wei_2021,
author = {Cheng, Wei and
Dan, Li and
Deng, Xiangzheng and
Feng, Jinming and
Wang, Yongli and
Peng, Jing and
Tian, Jing and
Qi, Wei and
Liu, Zhu and
Zheng, Xinqi and
Zhou, Demin and
Jiang, Sijian and
Zhao, Haipeng and
Wang, Xiaoyu},
title = {{Global monthly distributions of atmospheric CO2
concentrations under the historical and future
scenarios}},
month = jun,
year = 2021,
note = {{The data records include 1 file Network Common
Data Form (NetCDF) format for CO2 distributions in
historical period named
CO2\_1deg\_month\_1850-2013.nc, and 8 files NetCDF
format with the naming convention
CO2\_SSP{XYY}\_2015\_2150.nc, where X and YY are the
shared socioeconomic pathway and radiative forcing
level at 2100, respectively, for CO2 distributions
in the future scenarios. Each NetCDF file includes
3 dimensions: time (month of the year expressed as
days since the first day of 1850, n = 1968 and
1632 for the historical and the future,
respectively); latitude (Degrees North of the
equator [cell centres], n = 180); longitude
(Degrees East of the Prime Meridian [cell
centres], n = 360). Each NetCDF file contains a
monthly variable representing mole fraction of
carbon dioxide in air (variable name: value in the
historical file and CO2 in the future scenario
files) with the unit ppm and the 1º × 1º
resolution.}},
publisher = {Zenodo},
doi = {10.5281/zenodo.5021361},
url = {https://doi.org/10.5281/zenodo.5021361}
}