RDDtools: an R package for Regression Discontinuity Design

RDDtools is a new R package under development, designed to offer a set of tools to run all the steps required for a Regression Discontinuity Design (RDD) Analysis, from primary data visualisation to discontinuity estimation, sensitivity and placebo testing.

Installing RDDtools

This GitHub repository hosts the source code. One of the easiest ways to install the package from GitHub is with the R package devtools:

library(devtools)
install_github("MatthieuStigler/RDDtools", subdir = "RDDtools")

Note, however, that the latest version of RDDtools only works with R 3.0, and that you might need to install Rtools if you are on Windows.

Documentation

The (preliminary) documentation is available directly in the help files, as well as in the vignette. The vignette can be accessed from R with vignette("RDDtools"), or by opening the PDF stored in this GitHub repository.

RDDtools: main features

  • Simple visualisation of the data using binned-plot: plot()

  • Bandwidth selection: for example the Imbens and Kalyanaraman 2012 bandwidth, RDDbw_IK()

  • Estimation:

    • RDD parametric estimation: RDDreg_lm(). This includes specifying the polynomial order and including covariates with various specifications, as advocated in Imbens and Lemieux 2008.
    • RDD local non-parametric estimation: RDDreg_np(). Can also include covariates, and allows different types of inference (fully non-parametric, or parametric approximation).
    • RDD generalised estimation: allows the use of custom estimating functions to get the RDD coefficient, for example a probit RDD or a quantile regression.
  • Post-Estimation tools:

    • Various tools to obtain predictions at given covariate values ( RDDpred() ), or to convert to other classes: to lm ( as.lm() ) or to the package np ( as.npreg() ).
    • Function to do inference with clustered data: clusterInf() either using a cluster covariance matrix ( vcovCluster() ) or by a degrees of freedom correction (as in Cameron et al. 2008).
  • Regression sensitivity analysis:

    • Plot the sensitivity of the coefficient with respect to the bandwidth: plotSensi()
    • Placebo plot using different cutpoints: plotPlacebo()
  • Design sensitivity analysis:

    • McCrary test of manipulation of the forcing variable: wrapper dens_test() to the function DCdensity() from package rdd.
    • Test of equal means of covariates: covarTest_mean()
    • Test of equal distribution of covariates: covarTest_dis()
  • Datasets: includes the dataset from Lee 2008 ( Lee2008 )
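
To illustrate the generalised estimation idea listed above, a probit RDD can be sketched in base R with glm(). This is a minimal sketch on simulated data, not the RDDtools interface: the variable names, the bandwidth of 0.5 and the data-generating process are all assumptions made for the example.

```r
# Simulated binary-outcome data (hypothetical, for illustration only)
set.seed(42)
n <- 4000
x <- runif(n, -1, 1)                 # forcing variable, cutpoint at 0
D <- as.numeric(x >= 0)              # treatment indicator
latent <- -0.2 + 0.5 * x + 0.6 * D + rnorm(n)
y_bin <- as.numeric(latent > 0)      # observed binary outcome

# Probit with separate slopes on each side, restricted to a window around the cutpoint
bw <- 0.5                            # hand-picked bandwidth (assumption)
fit_probit <- glm(y_bin ~ D * x, family = binomial(link = "probit"),
                  subset = abs(x) <= bw)

# The coefficient on D is the jump at the cutpoint on the latent (probit index) scale
coef(summary(fit_probit))["D", ]
```

The estimate is on the probit index scale, so it is not directly comparable to the jump in outcome levels reported by a linear RDD.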

Using RDDtools: a quick example

RDDtools works in an object-oriented way: the user defines the characteristics of the data once, creating an RDDdata object, to which the different analysis tools can then be applied.

Data preparation and visualisation

Load the package, and load the built-in dataset from Lee 2008:

library(RDDtools)
data(Lee2008)

Declare the data to be a RDDdata object:

Lee2008_rdd <- RDDdata(y = Lee2008$y, x = Lee2008$x, cutpoint = 0)

You can now directly summarise and visualise this data:

summary(Lee2008_rdd)
## ### RDDdata object ###
## 
## Cutpoint: 0 
## Sample size: 
## 	-Full : 6558 
## 	-Left : 2740 
## 	-Right: 3818
## Covariates: no
plot(Lee2008_rdd)

plot of chunk dataPlot
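
For readers who want to see what a binned plot involves, here is a minimal base-R sketch on simulated data. It is not the code behind plot() for RDDdata objects: the number of bins, the simulated variables and the plotting choices are assumptions made for illustration.

```r
# Simulated data (hypothetical; not the Lee 2008 data)
set.seed(1)
n <- 2000
x <- runif(n, -1, 1)                                   # forcing variable
y <- 0.5 + 0.3 * x + 0.1 * (x >= 0) + rnorm(n, sd = 0.1)

# 20 equal-width bins on each side; 0 is a bin edge, so no bin straddles the cutpoint
n_side <- 20
breaks <- (-n_side:n_side) / n_side
bins <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)

bin_mid <- (head(breaks, -1) + tail(breaks, -1)) / 2   # bin midpoints
bin_mean <- tapply(y, bins, mean)                      # average outcome per bin

# Draw the figure only in an interactive session
if (interactive()) {
  plot(bin_mid, bin_mean, xlab = "x", ylab = "mean of y per bin")
  abline(v = 0, lty = 2)
}
```

Keeping the cutpoint on a bin edge matters: a bin that mixes observations from both sides would blur the very jump the plot is meant to show.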

Estimation

Parametric

Estimate parametrically, by fitting a 4th order polynomial:

reg_para <- RDDreg_lm(RDDobject = Lee2008_rdd, order = 4)
reg_para
## ### RDD regression: parametric ###
## 	Polynomial order:  4 
## 	Slopes:  separate 
## 	Number of obs: 6558 (left: 2740, right: 3818)
## 
## 	Coefficient:
##   Estimate Std. Error t value Pr(>|t|)    
## D   0.0766     0.0132    5.79  7.6e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(reg_para)

plot of chunk reg_para
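
The idea behind a polynomial RDD with separate slopes can be sketched with lm() alone. The example below uses simulated data with a true jump of 0.08; it is an assumption-laden illustration of the technique, not a reproduction of RDDreg_lm() or of the Lee 2008 results.

```r
# Simulated data (hypothetical) with a true discontinuity of 0.08 at x = 0
set.seed(2)
n <- 3000
x <- runif(n, -1, 1)
D <- as.numeric(x >= 0)                      # 1 to the right of the cutpoint
y <- 0.45 + 0.4 * x - 0.2 * x^2 + 0.08 * D + rnorm(n, sd = 0.1)

# Raw polynomial terms x, x^2, x^3, x^4 (the cutpoint is 0, so x is already centred)
poly_order <- 4
X <- poly(x, degree = poly_order, raw = TRUE)

# D * X expands to D + X + D:X, i.e. a separate polynomial on each side
fit <- lm(y ~ D * X)

# The coefficient on D is the estimated jump at the cutpoint
coef(summary(fit))["D", ]
```

If the cutpoint were not 0, the forcing variable would have to be centred on it first, so that the coefficient on D still measures the gap between the two polynomials at the cutpoint.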

Non-parametric

Alternatively, run a simple local linear regression, using the Imbens and Kalyanaraman 2012 bandwidth:

bw_ik <- RDDbw_IK(Lee2008_rdd)
reg_nonpara <- RDDreg_np(RDDobject = Lee2008_rdd, bw = bw_ik)
print(reg_nonpara)
## ### RDD regression: nonparametric local linear###
## 	Bandwidth:  0.2939 
## 	Number of obs: 3200 (left: 1594, right: 1606)
## 
## 	Coefficient:
##   Estimate Std. Error z value Pr(>|z|)    
## D  0.07992    0.00946    8.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(x = reg_nonpara)

plot of chunk RegPlot
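
A local linear RDD estimate can be sketched as a kernel-weighted least-squares fit around the cutpoint. The sketch below runs on simulated data and makes several assumptions: a triangular kernel, a hand-picked bandwidth of 0.3, and ordinary weighted-lm standard errors. It does not claim to match the kernel or the inference used by RDDreg_np().

```r
# Simulated data (hypothetical) with a true discontinuity of 0.08 at x = 0
set.seed(3)
n <- 3000
x <- runif(n, -1, 1)
D <- as.numeric(x >= 0)
y <- 0.45 + 0.4 * x - 0.2 * x^2 + 0.08 * D + rnorm(n, sd = 0.1)

bw <- 0.3                            # hand-picked bandwidth (assumption)
w <- pmax(0, 1 - abs(x) / bw)        # triangular kernel: weight falls linearly to 0 at |x| = bw

# Weighted linear fit with separate slopes, using only observations inside the window
fit_ll <- lm(y ~ D * x, weights = w, subset = w > 0)

# Estimated jump at the cutpoint
coef(fit_ll)["D"]
```

Observations close to the cutpoint get the largest weight, which is what makes the fit "local" even though it is estimated with a single lm() call.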

Regression Sensitivity tests:

One can easily check the sensitivity of the estimate to different bandwidths:

plotSensi(reg_nonpara, from = 0.05, to = 1, by = 0.1)

plot of chunk SensiPlot
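
Conceptually, a bandwidth sensitivity check re-estimates the discontinuity over a grid of bandwidths. The following base-R sketch does this on simulated data; the helper rd_local_linear() is hypothetical (it is not part of RDDtools) and assumes a triangular kernel.

```r
# Hypothetical helper: local linear RDD estimate with a triangular kernel
rd_local_linear <- function(x, y, cutpoint = 0, bw) {
  xc <- x - cutpoint                       # centre the forcing variable
  D <- as.numeric(xc >= 0)
  w <- pmax(0, 1 - abs(xc) / bw)           # kernel weights
  fit <- lm(y ~ D * xc, weights = w, subset = w > 0)
  unname(coef(fit)["D"])
}

# Simulated data (hypothetical) with a true discontinuity of 0.08
set.seed(4)
n <- 3000
x <- runif(n, -1, 1)
y <- 0.45 + 0.4 * x - 0.2 * x^2 + 0.08 * (x >= 0) + rnorm(n, sd = 0.1)

# Same grid as in the plotSensi() call above: 0.05, 0.15, ..., 0.95
bws <- seq(0.05, 1, by = 0.1)
est <- sapply(bws, function(h) rd_local_linear(x, y, bw = h))
data.frame(bw = bws, estimate = est)
```

Estimates that swing widely as the bandwidth changes suggest that the result depends on the smoothing choice rather than on the data near the cutpoint.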

Or run the Placebo test, estimating the RDD effect based on fake cutpoints:

plotPlacebo(reg_nonpara)

plot of chunk placeboPlot
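
The logic of a placebo test can be sketched in base R by re-estimating the jump at cutpoints where no effect should exist. The sketch uses simulated data and a hypothetical helper, and it assumes that each placebo regression only uses observations from one side of the true cutpoint; whether plotPlacebo() proceeds exactly this way is not stated here.

```r
# Hypothetical helper: local linear RDD estimate with a triangular kernel
rd_local_linear <- function(x, y, cutpoint = 0, bw) {
  xc <- x - cutpoint
  D <- as.numeric(xc >= 0)
  w <- pmax(0, 1 - abs(xc) / bw)
  fit <- lm(y ~ D * xc, weights = w, subset = w > 0)
  unname(coef(fit)["D"])
}

# Simulated data (hypothetical): the only true jump (0.08) is at x = 0
set.seed(5)
n <- 3000
x <- runif(n, -1, 1)
y <- 0.45 + 0.4 * x - 0.2 * x^2 + 0.08 * (x >= 0) + rnorm(n, sd = 0.1)

# Fake cutpoints; each regression uses data from one side of the true cutpoint only
placebo_cuts <- c(-0.6, -0.4, 0.4, 0.6)
placebo_est <- sapply(placebo_cuts, function(c0) {
  same_side <- if (c0 < 0) x < 0 else x >= 0
  rd_local_linear(x[same_side], y[same_side], cutpoint = c0, bw = 0.3)
})

data.frame(cutpoint = placebo_cuts, estimate = placebo_est)
```

Placebo estimates close to zero support the interpretation that the jump at the true cutpoint is not an artefact of the estimation method.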

Design Sensitivity tests:

Design sensitivity tests check whether the discontinuity found can actually be attributed to other causes. Two types of tests are available:

  • Discontinuity comes from manipulation: test for possible manipulation of the forcing variable around the cutoff with the McCrary 2008 test: dens_test()
  • Discontinuity comes from other variables: test whether a discontinuity also arises in the covariates. Currently, only simple tests of equality of covariates around the threshold are available.

Discontinuity comes from manipulation: McCrary test

Simply use the function dens_test() on either the raw data or the regression output:

dens_test(reg_nonpara)

plot of chunk DensPlot

## 
## 	McCrary Test for no discontinuity of density around cutpoint
## 
## data:  reg_nonpara
## z-val = 1.295, p-value = 0.1952
## alternative hypothesis: Density is discontinuous around cutpoint
## sample estimates:
## Discontinuity 
##        0.1035
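
The intuition behind a manipulation test can be shown with a much cruder check than McCrary's: compare the number of observations just left and just right of the cutpoint with a binomial test. This is explicitly not the McCrary estimator (which fits local linear regressions to a binned density), and the data, window width and helper name below are assumptions made for the example.

```r
set.seed(6)
x <- runif(5000, -1, 1)                      # no manipulation: density is smooth at 0

# Hypothetical manipulation: units just below the cutoff move just above it
x_manip <- ifelse(x > -0.05 & x < 0, -x, x)

# Hypothetical helper: are observations split evenly within h of the cutpoint?
side_count_test <- function(x, h = 0.1) {
  n_left <- sum(x >= -h & x < 0)
  n_right <- sum(x >= 0 & x < h)
  binom.test(c(n_right, n_left), p = 0.5)    # H0: equal share on both sides
}

p_clean <- side_count_test(x)$p.value
p_manip <- side_count_test(x_manip)$p.value
c(clean = p_clean, manipulated = p_manip)
```

An excess of observations on one side of the cutpoint, as in the manipulated sample, is the pattern that density tests such as dens_test() are designed to detect.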

Discontinuity comes from covariates: covariates balance tests

Two tests are available:

  • equal means of covariates: covarTest_mean()
  • equal distribution of covariates: covarTest_dis()

Since the Lee (2008) dataset contains no covariates, we need to simulate some here. We simulate three variables, the second having a different mean on the left and on the right of the cutpoint.

set.seed(123)
n_Lee <- nrow(Lee2008)
Z <- data.frame(z1 = rnorm(n_Lee, sd = 2),
    z2 = rnorm(n_Lee, mean = ifelse(Lee2008$x < 0, 5, 8)),  # mean shifts at the cutpoint
    z3 = sample(letters, size = n_Lee, replace = TRUE))
Lee2008_rdd_Z <- RDDdata(y = Lee2008$y, x = Lee2008$x, covar = Z, cutpoint = 0)

Run the tests:

## test for equality of means around cutoff:
covarTest_mean(Lee2008_rdd_Z, bw = 0.3)
##    mean of x mean of y Difference statistic p.value
## z1 0.004268  0.02186   0.01759    -0.2539   0.7996 
## z2 5.006     7.985     2.979      -84.85    0      
## z3 13.19     13.44     0.2465     -0.941    0.3468
## Can also use function covarTest_dis() for Kolmogorov-Smirnov test:
covarTest_dis(Lee2008_rdd_Z, bw = 0.3)
##    statistic p.value
## z1 0.03482   0.2727 
## z2 0.8648    0      
## z3 0.03009   0.4474

Both tests correctly reject equality for the second covariate (z2), and correctly do not reject it for the first and third (z1 and z3).
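
What such balance tests do can be sketched in base R with t.test() and ks.test(), applied to the covariate values on each side of the cutpoint within a bandwidth. The sketch uses simulated data and a hypothetical helper, balance(); it illustrates the general approach and does not claim to reproduce the internals of covarTest_mean() or covarTest_dis().

```r
# Simulated data (hypothetical): z1 is balanced, z2 has a mean shift at the cutpoint
set.seed(7)
n <- 2000
x <- runif(n, -1, 1)
z1 <- rnorm(n, sd = 2)
z2 <- rnorm(n, mean = ifelse(x < 0, 5, 8))

# Observations within the bandwidth, split by side of the cutpoint
bw <- 0.3
left <- x >= -bw & x < 0
right <- x >= 0 & x <= bw

# Hypothetical helper: compare a covariate across the two sides
balance <- function(z) {
  tt <- t.test(z[left], z[right])          # equality of means
  ks <- ks.test(z[left], z[right])         # equality of distributions (Kolmogorov-Smirnov)
  c(mean_left = mean(z[left]), mean_right = mean(z[right]),
    t_pvalue = tt$p.value, ks_pvalue = ks$p.value)
}

res <- rbind(z1 = balance(z1), z2 = balance(z2))
res
```

When many covariates are tested at once, some rejections are expected by chance, so the p-values are best read jointly rather than one at a time.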