Skip to content

Calibration and cross validation

miturbide edited this page Jun 20, 2018 · 15 revisions

This is a worked example of the application of several bias correction methods. In order to show the way in which each method operates, in the next examples the correction is applied considering the same period for the train and test data (1983-2002), however, usually RCM or GCM are used for test.

The example will be done on the NCEP/NCAR Reanalysis 1 dataset and the VALUE station dataset.

Instead of station data, the observations might also be gridded data.

# If we have not installed the "devtools" library we should do it:
# install.packages("devtools")
# library(devtools)
# Installing the downscaleR R-package
# devtools::install_github("SantanderMetGroup/downscaleR")
library(downscaleR)

Data

Observations

In this example we will bias correct winter precipitation and mean temperature for the Igueldo station, using as observations data from the VALUE_ECA&D dataset, considering the period 1983-2002 (type help(VALUE_Iberia_pr) and help(VALUE_Iberia_tas)).

# precipitation
data(VALUE_Iberia_pr)
y <- VALUE_Iberia_pr
# temperature
data(VALUE_Iberia_tas)
y.t <- VALUE_Iberia_tas

Predictors

NCEP reanalysis model data will be used as predictors and test data (type help(NCEP_Iberia_pr) and help(NCEP_Iberia_tas)). In order to transform NCEP precipitation units from Kg/m2/s to mm, we use function gridArithmetics.

# precipitation
data(NCEP_Iberia_pr)
x <- gridArithmetics(NCEP_Iberia_pr, 86400, operator = "*")

# temperature
data(NCEP_Iberia_tas)
x.t <- NCEP_Iberia_tas

Next the climatology of NCEP precipitation (x) and the VALUE stations (y) are plotted with spatialPlot (from package visualizeR). In order to visualize the VALUE stations in the same map, we might use the functionalities of package sp (here function SpatialPoints).

library(visualizeR)
library(sp)
spatialPlot(climatology(x), backdrop.theme = "coastline", 
            sp.layout = list(list(SpatialPoints(getCoordinates(y)), 
                                  pch = 17, col = "black", cex = 1.5)))

Application of different bias correction methods

Methods

The available properties to define the methods are:

-method: "delta", "scaling", "eqm", "pqm" and "gpqm. See the examples below for further details in each method.

-cross.val: For performing optional cross-validation (leave-one-year-out or k-fold).

-window: Numeric value specifying the time window width used to calibrate. The window is centered on the target days. Default to NULL, which considers the whole period available.

-scaling.type: Character indicating the type of the scaling method. Options are "additive" or "multiplicative". This argument is ignored if "scaling" is not selected as the bias correction method.

-fitdist.args: Further arguments passed to function fitdistr (densfun, start, ...). Only used when applying the "pqm" method (parametric quantile mapping).

-wet.threshold: The minimum value that is considered as a non-zero precipitation. Ignored for varcode values different from "pr".

-n.quantiles: Quantiles to be considered when method = "eqm".

-extrapolation: Character indicating the extrapolation method to be applied to correct test values that are out of the range of train values. Extrapolation is applied only to the "eqm" method. Options are "no" (Default: do not extrapolate) and "constant".

-theta (only for method 'gpqm'): upper threshold (and lower for the left tail of the distributions, if needed) above which values are fitted to a Generalized Pareto Distribution (GPD). Values below this threshold are fitted to a gamma (normal) distribution. By default, 'theta' is the 95th percentile (5th percentile for the left tail).

-join.members: To indicate whether members should be corrected independently or joined.

Type help(biasCorrection) for a detailed description.

Delta

This method consists on adding to the observations the mean change signal (delta method). It is applicable to any kind of variable but it is preferable to avoid it for bounded variables (e.g. precipitation, wind speed, etc.) because values out of the variable range could be obtained (e.g. negative wind speeds...).

cal <- biasCorrection(y = y, x = x, newdata = x, precipitation = TRUE, method = "delta")
cal.t <- biasCorrection (y = y.t, x = x.t, newdata = x.t, method = "delta")

Once we have calibrated the simulation for the test period, we can validate the result against the observational reference. This can be easily done with function quickDiagnostics that returns two graphs. The first shows time-series of the original simulation (red), the calibrated simulation (blue) and the observation (black). The second graph is the quantile-quantile plot, that better illustrates the effect of the applied method. Here we select the location that corresponds to the Igueldo station (location = c(-2.0392, 43.3075)):

# precipitation
quickDiagnostics(y, x, cal, type = "daily", location = c(-2.0392, 43.3075))

# temperature
quickDiagnostics(y.t, x.t, cal.t, type = "daily", location = c(-2.0392, 43.3075))

Package visualizeR also provides a function for plotting time series. In this case we will plot the time series for the Igueldo station.

# time series plotting for Igueldo station
Igueldo.cal <- subsetGrid(cal, station.id = "000234")
Ig.coords <- getCoordinates(Igueldo.cal)
Igueldo.raw <- subsetGrid(x, lonLim = Ig.coords$x, latLim = Ig.coords$y) 
temporalPlot(Igueldo.cal, Igueldo.raw, cols = c("blue", "red"))

As we can see in the graph, we are working with winter data. If we preffer to visualize these results as continuous series, we can set x.axis = "index":

# time series plotting for Igueldo station
temporalPlot(Igueldo.cal, Igueldo.raw, cols = c("blue", "red"), x.axis = "index")

Multiple grid objects can be passed to function temporalPlot.

Scaling

This method is very similar to the delta method but, in this case, the correction consist on scaling the simulation with the difference (scaling.type = "additive") or quotient (scaling.type = "multiplicative") between the mean of the observations and the simulation in the train period. The additive version is preferably applicable to unbounded variables (e.g. temperature) and the multiplicative to variables with a lower bound (e.g. precipitation, because it also preserves the frequency).

See Wetterhall et al.2012 for a comparison with more sophisticated methods considering the scaling method as a benchmark.

#For precipitation
cal <- biasCorrection(y = y, x = x, newdata = x, precipitation = TRUE, method = "scaling", 
                       scaling.type = "multiplicative")
#For temperature
cal.t <- biasCorrection(y = y.t, x = x.t, newdata = x.t, method = "scaling",
                         scaling.type = "additive")
# precipitation
quickDiagnostics(y, x, cal, type = "daily", location = c(-2.0392, 43.3075))

# temperature
quickDiagnostics(y.t, x.t, cal.t, type = "daily", location = c(-2.0392, 43.3075))

eQM

The empirical quantile mapping is a very extended bias correction method which consists on calibrating the simulated Cumulative Distribution Function (CDF) by adding to the observed quantiles both the mean delta change and the individual delta changes in the corresponding quantile.

cal <- biasCorrection(y = y, x = x, newdata = x, precipitation = TRUE, 
                       method = "eqm")
cal.t <- biasCorrection(y = y.t, x = x.t, newdata = x.t, 
                         method = "eqm")
# precipitation
quickDiagnostics(y, x, cal, type = "daily", location = c(-2.0392, 43.3075))

# temperature
quickDiagnostics(y.t, x.t, cal.t, type = "daily", location = c(-2.0392, 43.3075))

PQM

Parametric quantile mapping is based on the initial assumption that both observed and simulated intensity distributions are well approximated by a given distribution, therefore it uses the theorical instead of the empirical distribution. For instance, the use of the "gamma" distribution is described in Piani et al. 2010, and here an example of application:

cal <- biasCorrection(y = y, x = x, newdata = x, precipitation = TRUE, 
                      method = "pqm", fitdistr.args = list(densfun = "gamma"))

For temperature we use the "normal" distribution:

cal.t <- biasCorrection(y = y.t, x = x.t, newdata = x.t, precipitation = TRUE, 
                      method = "pqm", fitdistr.args = list(densfun = "normal"))
# precipitation
quickDiagnostics(y, x, cal, type = "daily", location = c(-2.0392, 43.3075))

# precipitation
quickDiagnostics(y.t, x.t, cal.t, type = "daily", location = c(-2.0392, 43.3075))

gpQM

This method was proposed by Gutjahr and Heinemann 2013. It is also a parametric quantile mapping based on a gamma and Generalized Pareto Distribution (GPD). By default, the threshold above which the GPD is fitted is the 95th percentile of the observed and the predicted wet-day distribution, respectively. The user can specify a different threshold by modifying the parameter theta, by giving the observational (predicted) threshold in the first (second) row and Nnodes columns.

cal <- biasCorrection(y = y, x = x, newdata = x, precipitation = TRUE, 
                       method = "gpqm", theta = .70)
# precipitation
quickDiagnostics(y, x, cal, type = "daily", location = c(-2.0392, 43.3075))

Cross validation

k-fold:

# precipitation
cal <- biasCorrection(y = y, x = x,
                            precipitation = TRUE,
                            method = "eqm",
                            window = c(30, 15),
                            wet.threshold = 0.1,
                            cross.val = "kfold",
                            folds = list(1983:1989, 1990:1996, 1997:2002))
# precipitation
quickDiagnostics(y, x, cal, type = "daily", location = c(-2.0392, 43.3075))

leave-one-year-out:

# precipitation
cal <- biasCorrection(y = y, x = x,
                            precipitation = TRUE,
                            method = "eqm",
                            window = c(30, 15),
                            wet.threshold = 0.1,
                            cross.val = "loo")

<-- Home page of the Wiki

Session info

print(sessionInfo(package = c("transformeR", "downscaleR")))

## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## character(0)
## 
## other attached packages:
## [1] transformeR_1.3.3 downscaleR_3.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.15        compiler_3.4.3      highr_0.6          
##  [4] methods_3.4.3       bitops_1.0-6        iterators_1.0.8    
##  [7] utils_3.4.3         tools_3.4.3         grDevices_3.4.3    
## [10] deepnet_0.2         digest_0.6.13       dotCall64_0.9-5.2  
## [13] evd_2.3-2           gtable_0.2.0        evaluate_0.10.1    
## [16] lattice_0.20-35     Matrix_1.2-7.1      foreach_1.4.3      
## [19] yaml_2.1.16         parallel_3.4.3      spam_2.1-2         
## [22] akima_0.6-2         gridExtra_2.2.1     stringr_1.2.0      
## [25] knitr_1.18          raster_2.6-7        gridGraphics_0.2   
## [28] graphics_3.4.3      datasets_3.4.3      stats_3.4.3        
## [31] fields_9.6          maps_3.2.0          rprojroot_1.3-2    
## [34] grid_3.4.3          glmnet_2.0-13       base_3.4.3         
## [37] rmarkdown_1.8       sp_1.2-7            magrittr_1.5       
## [40] backports_1.1.2     codetools_0.2-15    htmltools_0.3.6    
## [43] MASS_7.3-44         abind_1.4-5         stringi_1.1.5      
## [46] RCurl_1.95-4.10     RcppEigen_0.3.3.3.1
Clone this wiki locally