Added notebook for hydrological impacts of climate change #414

Merged: 5 commits, Dec 15, 2021
389 changes: 389 additions & 0 deletions docs/source/notebooks/09_Hydrological_impacts_of_climate_change.ipynb
@@ -0,0 +1,389 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 09_Hydrological_impacts_of_climate_change.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Performing bias correction on climate model data to perform climate change impact studies on hydrology\n",
"\n",
"This notebook will guide you on how to conduct bias correction of climate model outputs that will be fed as inputs to the hydrological model `Raven` to perform climate change impact studies on hydrology. \n",
"\n",
"In this tutorial, we will be using the shapefile or GeoJSON file for watershed contours as generated in the tutorial notebooks #03 (03_Extract_geographical_watershed_properties.ipynb) and #04 (04_Delineating_watersheds.ipynb). The file can be uploaded to your workspace here and used directly in the cells below. The workflow here is a bit longer than previous ones because we integrate the climate change scenario bias-correction and the hydrological modelling that follows into a single coherent workflow. We will mainly use the GR4JCN hydrological model, but towards the end of this notebook, we also show how to use other models. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"'''\n",
"Import the required packages\n",
"'''\n",
"import datetime as dt\n",
"import gcsfs # For the CMIP6 data on PanGEO\n",
"import s3fs # For the ERA5 data on AmazonS3 / Wasabi\n",
"import xarray as xr\n",
"import intake # Must include intake-xarray as a plugin\n",
"import zarr\n",
"import fsspec\n",
"import os\n",
"import xclim.sdba as sdba\n",
"import json\n",
"from clisops.core import subset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get the CMIP6 climate data that we need, for both the reference and future period.\n",
"\n",
"To begin, we need to access climate change data. We prefer the more recent CMIP6 project model scenarios, so we will use those hosted on Google Cloud Services and made available by Pangeo.\n",
"We will prepare the access point and extract minimum, maximum temperatures as well as precipitation for a 10-year period. Typically we should use a 30 year period for climate change projections but we are shortening it to 10 years to make computation time faster for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Here we are going to use real data. It will take a bit of time as \n",
"# the data is huge and we need to access it and download it locally.\n",
"# It will be longer for longer time periods and larger catchments!\n",
"\n",
"'''\n",
"Providing the information we want to use for our analysis.\n",
"'''\n",
"# Define climate model to use. See the complete list in the \"07_Bias_correction_of_CMIP6_data.ipynb\" notebook.\n",
"climate_model = 'MIROC6'\n",
"\n",
"# The catchment boundaries file. Can be generated using notebook \"04_Delineating watersheds\", use the one for your own catchment!\n",
"basin_contour = 'input.geojson' \n",
"\n",
"# The start and end dates for the reference and future time periods.\n",
"reference_start_day = dt.datetime(1980, 12, 30)\n",
"reference_stop_day = dt.datetime(1991, 1, 2) # Notice we are using 2 days before and 2 days after the desired period of 1981-01-01 to 1990-12-31. This is to account for any UTC shifts that might require getting data in a previous or later time.\n",
"future_start_day = dt.datetime(2080, 12, 30)\n",
"future_stop_day = dt.datetime(2091, 1, 2) # Notice we are using 2 days before and 2 days after the desired period of 2081-01-01 to 2090-12-31. This is to account for any UTC shifts that might require getting data in a previous or later time.\n",
"\n",
"\n",
"'''\n",
"Prepare the filesystem that allows reading CMIP6 data. Data is read on the Google Cloud Services, \n",
"which host a copy of the CMIP6 (and other) data. \n",
"'''\n",
"fsCMIP = gcsfs.GCSFileSystem(token='anon', access='read_only')\n",
"\n",
"\n",
"'''\n",
"Get the catalog info from the pangeo dataset, which basically is a list of links to the various products.\n",
"'''\n",
"col = intake.open_esm_datastore(\"https://storage.googleapis.com/cmip6/pangeo-cmip6.json\")\n",
"\n",
"\n",
"'''\n",
"Load the files from the PanGEO catalogs, for reference and future variables of temperature and precipitation.\n",
"'''\n",
"# Reference period (named 'historical' in CMIP6)\n",
"query = dict(experiment_id=['historical'],table_id='day',variable_id=['tasmin', 'tasmax', 'pr'],member_id='r1i1p1f1',source_id=climate_model)\n",
"col_subset = col.search(require_all_on=['source_id', 'variable_id'], **query)\n",
"dd = col_subset.to_dataset_dict(zarr_kwargs={\"consolidated\":True})\n",
"ds_reference = subset.subset_shape(dd[list(dd.keys())[0]].sel(time=slice(reference_start_day,reference_stop_day)),'input.geojson').mean({'lat','lon'}).chunk(-1)\n",
"ds_reference=ds_reference.isel(member_id=0).drop({'member_id','height'})\n",
"\n",
"# Get the variables we need from the full dataset\n",
"historical_pr = ds_reference['pr']\n",
"historical_tasmin = ds_reference['tasmin']\n",
"historical_tasmax = ds_reference['tasmax']\n",
"\n",
"\n",
"# Future period\n",
"query = dict(experiment_id=['ssp585'],table_id='day',variable_id=['tasmin', 'tasmax', 'pr'],member_id='r1i1p1f1',source_id=climate_model)\n",
"col_subset = col.search(require_all_on=['source_id', 'variable_id'], **query)\n",
"dd = col_subset.to_dataset_dict(zarr_kwargs={\"consolidated\":True})\n",
"ds_future = subset.subset_shape(dd[list(dd.keys())[0]].sel(time=slice(future_start_day,future_stop_day)),'input.geojson').mean({'lat','lon'}).chunk(-1)\n",
"ds_future = ds_future.isel(member_id=0).drop({'member_id','height'})\n",
"\n",
"# Get the variables we need from the full dataset\n",
"future_pr = ds_future['pr']\n",
"future_tasmin = ds_future['tasmin']\n",
"future_tasmax = ds_future['tasmax']\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now that we have the climate scenarios, we need a historical dataset or observations to perform the bias correction.\n",
"In this tutorial, we will use ERA5 realaysis data to make things easier and ensure data is available for the entire world with no missing data. If you have your own dataset, you can provide it instead, as long as it has the precipitation, maximum temperatures and minimum temperatures at the daily time step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the ERA5 data from the Wasabi/Amazon S3 server. \n",
"# Will eventually be replaced by the more efficient direct call with auto-updating timesteps.\n",
"# Future code:\n",
"\n",
"# catalog_name = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/atmosphere.yaml'\n",
"# cat=intake.open_catalog(catalog_name)\n",
"# ds=cat.era5_hourly_reanalysis_single_levels_ts.to_dask()\n",
"\n",
"# For now, let's use this method:\n",
"''' \n",
"Configuration keys. Boilerplate, should not be changed.\n",
"'''\n",
"CLIENT_KWARGS = {'endpoint_url': 'https://s3.wasabisys.com','region_name': 'us-east-1'}\n",
"CONFIG_KWARGS = {'max_pool_connections': 100}\n",
"STORAGE_OPTIONS = {'anon': True,'client_kwargs': CLIENT_KWARGS,'config_kwargs': CONFIG_KWARGS}\n",
"\n",
"'''\n",
"Prepare the filesystem and mapper that points to the data itself on the AmazonS3 directory\n",
"'''\n",
"fsERA5 = fsspec.filesystem('s3', **STORAGE_OPTIONS)\n",
"mapper = fsERA5.get_mapper('s3://era5/world/reanalysis/single-levels/zarr-temporal/2021-06-30')\n",
"\n",
"'''\n",
"Get the ERA5 data (for the same reference period)\n",
"'''\n",
"ds = xr.open_zarr(mapper, consolidated=True).sel(time=slice(reference_start_day,reference_stop_day))\n",
"\n",
"# Extract the data for the region of interest only, and transform the hourly data to daily\n",
"# Also rechunk to make it faster to read.\n",
"sub=subset.subset_shape(ds,basin_contour).mean({'latitude','longitude'})\n",
"ERA5_tmin=sub['t2m'].resample(time='1D').min().chunk(-1,-1,-1)\n",
"ERA5_tmax=sub['t2m'].resample(time='1D').max().chunk(-1,-1,-1)\n",
"ERA5_pr = sub['tp'].resample(time='1D').mean().chunk(-1,-1,-1)"
]
},
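{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optional: using your own observations instead of ERA5\n",
"\n",
"As mentioned above, any daily dataset with precipitation and minimum/maximum temperature can replace ERA5 in the bias correction. The sketch below is hypothetical: the file name `my_observations.nc` and its variable names (`pr`, `tmin`, `tmax`) are assumptions, so adapt them to your own file before uncommenting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch: replace the ERA5 series with your own daily observations.\n",
"# The file name and variable names below are assumptions; adapt them to your data.\n",
"\n",
"# my_obs = xr.open_dataset('my_observations.nc').sel(time=slice(reference_start_day, reference_stop_day))\n",
"# ERA5_pr = my_obs['pr'].chunk(-1)\n",
"# ERA5_tmin = my_obs['tmin'].chunk(-1)\n",
"# ERA5_tmax = my_obs['tmax'].chunk(-1)"
]
},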
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start the bias-correction!\n",
"\n",
"Here we use the `xclim` utilities to get the data by month. See `xclim` documentation for more options! (https://xclim.readthedocs.io/en/stable/notebooks/sdba.html)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"'''\n",
"Load the model to use for the bias correction - the `detrended quantile mapping`.\n",
"'''\n",
"\n",
"# Use xclim utilities (sbda)\n",
"group_month_window = sdba.utils.Grouper(\"time.dayofyear\", window=15)\n",
"\n",
"# This is an adjusting function. It build the tool that will perform the corrections. See 'xclim\" documentation for many more options!\n",
"Adjustment = sdba.DetrendedQuantileMapping(\n",
" nquantiles=50, \n",
" kind=\"+\", \n",
" group=group_month_window\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model is now initialized and ready to find correction factors between the reference dataset (observations) and historical dataset (climate model outputs for the same time period). The correction factors obtained are then applied to both reference and future climate outputs to correct them. This step is called the bias correction step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train the model to find the correction factors for the precipitation (pr) data\n",
"Adjustment.train(ERA5_pr,historical_pr)\n",
"\n",
"# Apply the correction factors on the reference period\n",
"corrected_ref_precip = Adjustment.adjust(historical_pr, interp=\"linear\")\n",
"\n",
"# Apply the correction factors on the future period\n",
"corrected_fut_precip = Adjustment.adjust(future_pr, interp=\"linear\")\n",
"\n",
"# Ensure that the precipitation is non-negative, which can happen with some climate models\n",
"corrected_ref_precip[corrected_ref_precip < 0] = 0\n",
"corrected_fut_precip[corrected_fut_precip < 0] = 0\n",
"\n",
"# Train the model to find the correction factors for the maximum temperature (tasmax) data\n",
"Adjustment.train(ERA5_tmax,historical_tasmax)\n",
"\n",
"# Apply the correction factors on the reference period\n",
"corrected_ref_tasmax = Adjustment.adjust(historical_tasmax, interp=\"linear\")\n",
"\n",
"# Apply the correction factors on the future period\n",
"corrected_fut_tasmax = Adjustment.adjust(future_tasmax, interp=\"linear\")\n",
"\n",
"# Train the model to find the correction factors for the minimum temperature (tasmin) data\n",
"Adjustment.train(ERA5_tmin,historical_tasmin)\n",
"\n",
"# Apply the correction factors on the reference period\n",
"corrected_ref_tasmin = Adjustment.adjust(historical_tasmin, interp=\"linear\")\n",
"\n",
"# Apply the correction factors on the future period\n",
"corrected_fut_tasmin = Adjustment.adjust(future_tasmin, interp=\"linear\")"
]
},
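{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optional: visualize the effect of the bias correction\n",
"\n",
"This quick check is not required for the workflow, but plotting a raw series against its corrected counterpart is an easy way to confirm the adjustment behaved sensibly. A minimal sketch follows, using the future minimum temperature; any of the corrected variables could be inspected the same way."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Compare the raw and bias-corrected future minimum temperature (both in K).\n",
"future_tasmin.plot(label='raw CMIP6', alpha=0.7)\n",
"corrected_fut_tasmin.plot(label='bias-corrected', alpha=0.7)\n",
"plt.legend()"
]
},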
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The corrected reference and future data are then converted to netCDF files to be used by our 'Raven' hydrological models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Convert the reference corrected data into netCDF file\n",
"ref_dataset = xr.merge([\n",
" corrected_ref_precip.to_dataset(name=\"pr\"),\n",
" corrected_ref_tasmax.to_dataset(name=\"tasmax\"), \n",
" corrected_ref_tasmin.to_dataset(name=\"tasmin\")\n",
"])\n",
"ref_dataset.to_netcdf(\"reference_dataset.nc\")\n",
"\n",
"# Convert the future corrected data into netCDF file\n",
"fut_dataset = xr.merge([\n",
" corrected_fut_precip.to_dataset(name=\"pr\"),\n",
" corrected_fut_tasmax.to_dataset(name=\"tasmax\"), \n",
" corrected_fut_tasmin.to_dataset(name=\"tasmin\")\n",
"])\n",
"fut_dataset.to_netcdf(\"future_dataset.nc\")"
]
},
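{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, we can reopen one of the files to confirm it contains the three expected variables (`pr`, `tasmax` and `tasmin`) before feeding it to `Raven`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify that the file written to disk has the expected structure.\n",
"xr.open_dataset('reference_dataset.nc')"
]
},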
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Our data is ready. Let's do some hydrological modelling to see how the future streamflow might look like compared to the reference streamflow!\n",
"\n",
"In this step, we will take the reference period climate data and run the GR4JCN hydrological model with it. We will then plot a graph to see the streamflow representative of the historical period."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Need to import the GR4JCN hydrological model instance\n",
"from ravenpy.models import GR4JCN\n",
"\n",
"# Start a model instance, in this case a GR4JCN model emulator.\n",
"model = GR4JCN()\n",
"\n",
"# Path to the climate data needed to run the Raven model on the reference period.\n",
"# All files in your current workspace (the \"writable-workspace\") can be accessed by adding the \"/notebook_dir/writable-workspace/\" prefix in the path.\n",
"forcing='/notebook_dir/writable-workspace/reference_dataset.nc'\n",
"\n",
"# Model configuration parameters that will be used for the simulation for both the reference and future periods.\n",
"common_run_parameters = dict(\n",
" area=4250.6,\n",
" elevation=843.0,\n",
" latitude=54.4848,\n",
" longitude=-123.3659,\n",
" params=(0.529, -3.396, 407.29, 1.072, 16.9, 0.947),\n",
" tasmax={\"offset\": -273.15}, # Transforms that are 2-parameters of a linear equation ax + b, so temperature uses a=1.0 and b = -273.15 to bring K to degC.\n",
" tasmin={\"offset\": -273.15}, # Transforms that are 2-parameters of a linear equation ax + b, so temperature uses a=1.0 and b = -273.15 to bring K to degC.\n",
" pr={\"scale\": 86400.0}, # Transforms that are 2-parameters of a linear equation ax + b, so temperature uses a=86400 and b = 0 to bring mm/s to mm/d.\n",
" rain_snow_fraction=\"RAINSNOW_DINGMAN\", \n",
")\n",
"\n",
"# Run the hydrological model using our forcing data from the reference period, and with the common parameters that will also be used for the future period simulation.\n",
"model(\n",
" forcing,\n",
" start_date=dt.datetime(1981,1,1),\n",
" end_date=dt.datetime(1990,12,31),\n",
" **common_run_parameters,\n",
" )\n",
"\n",
"# Now plot the results\n",
"model.hydrograph.q_sim.plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now do the same but for the future period!\n",
"We will copy the block of code from above, changing only the file path (from reference dataset to future dataset) as well as the start and end dates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Now redo the same thing but for the future period.\n",
"\n",
"# Start a new model instance\n",
"model = GR4JCN()\n",
"params=(0.529, -3.396, 407.29, 1.072, 16.9, 0.947)\n",
"\n",
"# Accessing the future data we just created -- notice it is now the \"future_dataset.nc\" instead of the \"reference_dataset.nc\"\n",
"forcing='/notebook_dir/writable-workspace/future_dataset.nc'\n",
"\n",
"# Model configuration parameters\n",
"model(forcing,\n",
" start_date=dt.datetime(2081,1,1), # Now running the future period!\n",
" end_date=dt.datetime(2090,12,31),\n",
" **common_run_parameters, \n",
")\n",
"\n",
"# And plot the results! We can compare both figures and see the changes between both.\n",
"model.hydrograph.q_sim.plot()"
]
},
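{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Comparing and exporting the simulated flows\n",
"\n",
"The cell below is a minimal sketch showing one way to compare the two simulations and export the results. The variable names `q_ref` and `q_fut`, and the file name `future_streamflow.nc`, are our own choices rather than part of the workflow above; the comparison assumes the reference hydrograph was stored in `q_ref` after the reference-period run, as suggested earlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Mean annual cycle of the simulated future streamflow.\n",
"# squeeze() drops the single-basin dimension, if present.\n",
"q_fut = model.hydrograph.q_sim\n",
"q_fut.groupby('time.dayofyear').mean(dim='time').squeeze().plot(label='future (2081-2090)')\n",
"\n",
"# If the reference hydrograph was stored earlier (q_ref = model.hydrograph.q_sim\n",
"# right after the reference run), overlay its mean annual cycle:\n",
"# q_ref.groupby('time.dayofyear').mean(dim='time').squeeze().plot(label='reference (1981-1990)')\n",
"plt.legend()\n",
"\n",
"# The simulated flows can also be exported for use in other tools:\n",
"q_fut.to_netcdf('future_streamflow.nc')"
]
},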
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## You have just generated streamflows for the reference and future periods!\n",
"We can analyze these hydrographs with many tools in PAVICS-Hydro, or export them to use elsewhere, or use them as inputs to another process!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}