README.Rmd

---
output: github_document
always_allow_html: true
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

<!-- badges: start -->
<!-- [![packageversion](https://img.shields.io/badge/Package%20version-`r version`-orange.svg?style=flat-square)](commits/master) -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html)
<!-- [![CRAN status](https://www.r-pkg.org/badges/version/fwildclusterboot)](https://CRAN.R-project.org/package=fwildclusterboot) -->
<!-- ![runiverse-package](https://s3alfisc.r-universe.dev/badges/fwildclusterboot) -->
<!-- [![R-CMD-check](https://github.com/s3alfisc/fwildclusterboot/workflows/R-CMD-check/badge.svg)](https://github.com/s3alfisc/fwildclusterboot/actions) -->
<!-- [![Codecov test coverage](https://codecov.io/gh/s3alfisc/fwildclusterboot/branch/master/graph/badge.svg)](https://app.codecov.io/gh/s3alfisc/fwildclusterboot?branch=master) -->
<!-- `r badger::badge_cran_download("fwildclusterboot", "grand-total", "blue")` -->
<!-- `r badger::badge_cran_download("fwildclusterboot", "last-month", "green")` -->
<!-- [![minimal R version](https://img.shields.io/badge/R%3E%3D-4.0.0-6666ff.svg)](https://cran.r-project.org/) -->

[![R-CMD-check](https://github.com/s3alfisc/wildboottestjlr/workflows/R-CMD-check/badge.svg)](https://github.com/s3alfisc/wildboottestjlr/actions)
<!-- badges: end -->

# wildboottestjlr

`wildboottestjlr` ports the functionality of the [WildBootTests.jl](https://github.com/droodman/WildBootTests.jl) package to R via the [JuliaConnectoR](https://github.com/stefan-m-lenz/JuliaConnectoR) package. 
At the moment, it supports the following features of `WildBootTests.jl`:


+ The wild bootstrap for OLS (Wu 1986).
+ The Wild Restricted Efficient bootstrap (WRE) for IV/2SLS/LIML (Davidson and MacKinnon 2010).
+ The subcluster bootstrap (MacKinnon and Webb 2018).
+ Confidence intervals formed by inverting the test and iteratively searching for bounds.
+ Multiway clustering.
+ Arbitrary and multiple linear hypotheses in the parameters.
+ One-way fixed effects.

The following model objects are currently supported: 

+ OLS: `lm` (from stats), `fixest` (from fixest), `felm` from (lfe)
+ IV: `ivreg` (from ivreg). 

In the future, IV methods for `fixest` and `lfe` will be added.


## Installation / Getting Started

`wildboottestjlr` can be installed by running

```{r, eval = FALSE}
library(devtools)
install_github("s3alfisc/wildboottestjlr")
```

You can install Julia by following the steps described here: https://julialang.org/downloads/. 
`WildBootTests.jl` can then be installed via Julia's package management system. 

To install `WildBootTests.jl` and Julia from within R, you can use `wildboottestjlr::wildboottestjlr_setup()`. Via `wildboottestjlr_setup()`, you can install Julia and `WildBootTests.jl` and connect R and Julia. You simply have to follow the instructions printed in the console! 

```{r, eval = FALSE}
library(wildboottestjlr)
wildboottestr_setup()
```

Similarly, you can set the number of Julia threads by running 

```{r, eval = FALSE}
julia_set_ntreads()
```

and following the instructions.

## Example 

`wildboottestjlr's` central function is `boottest()`. Beyond few minor differences, it largely mirrors the `boottest()` function from the [fwildclusterboot](https://github.com/s3alfisc/fwildclusterboot) package.

### The Wild Bootstrap for OLS

```{r, message = FALSE, warning = FALSE}
library(wildboottestjlr)
# set a 'global' seed in the Julia session
set_julia_seed(rng = 12313452435)

data(voters)
library(fixest)
library(lfe)

# estimation via lm(), fixest::feols() or lfe::felm()
lm_fit <- lm(proposition_vote ~ treatment  + log_income, data = voters)
feols_fit <- feols(proposition_vote ~ treatment  + log_income, data = voters)
felm_fit <- felm(proposition_vote ~ treatment  + log_income, data = voters)

boot_lm <- boottest(lm_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427)
boot_feols <- boottest(feols_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427)
boot_felm <- boottest(felm_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427)

# summarize results via summary() method
#summary(boot_lm)

# also possible: use msummary() from modelsummary package
library(modelsummary)
msummary(list(boot_lm, boot_feols, boot_felm), 
        estimate = "{estimate} ({p.value})", 
        statistic = "[{conf.low}, {conf.high}]"
        )  

# plot(boot_lm)
```

### The Wild Bootstrap for IV (WRE)

If `boottest()` is applied based on an object of type `ivreg`, the WRE bootstrap [Davidson & MacKinnon (2010)](https://www.tandfonline.com/doi/abs/10.1198/jbes.2009.07221) is run. 

```{r, message = FALSE, warning = FALSE}
library(ivreg)
data("SchoolingReturns", package = "ivreg")
data <- SchoolingReturns

ivreg_fit <- ivreg(log(wage) ~ education + age + ethnicity + smsa + south + parents14 |
                  nearcollege + age  + ethnicity + smsa + south + parents14,
                data = data)

boot_ivreg <- boottest(object = ivreg_fit, B = 999, param = "education", clustid = "fameducation", type = "webb")

summary(boot_ivreg)
```

## Benchmarks

After compilation, `wildboottestjlr` is orders of magnitude faster than `fwildclusterboot`, in particular when the number of clusters N_G and the number of bootstrap iterations B get large. 


```{r, fig.width=10, fig.height=3, echo = FALSE, warning = FALSE, message = FALSE}
library(ggplot2)
df <- data.table::fread("C:/Users/alexa/Dropbox/R package development/wildboottestjlr/benchmarks_df")
df$B <- factor(df$B, levels = c("10K", "100K"))
df$N_G <- factor(df$N_G, levels = c("N_G = 20", "N_G = 50", "N_G = 100", "N_G = 500", "N_G = 1000"))

ggplot(data = df, aes(x = B, y = time, color = type)) + 
  facet_wrap(~N_G, nrow = 1) + 
  geom_point() + 
scale_y_continuous(trans='log10') + 
  theme_bw() + 
  xlab("Bootstrap iterations") + 
  ylab("time in seconds, log scale")
```

The benchmarks plot the median value of 3 runs of a linear regression with N = 10.000 and k = 21.