README.Rmd

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
set.seed(1014)
```

# modelr <img src="man/figures/logo.png" align="right" />

<!-- badges: start -->
[![Lifecycle: superseded](https://img.shields.io/badge/lifecycle-superseded-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#superseded)
[![R-CMD-check](https://github.com/tidyverse/modelr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/modelr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/modelr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/modelr?branch=main)
<!-- badges: end -->

## Overview

The modelr package provides functions that help you create elegant pipelines when modelling. 
It was designed primarily to support teaching the basics of modelling for the 1st edition of [R for Data Science](https://r4ds.had.co.nz/model-basics.html). 

We no longer recommend it and instead suggest <https://www.tidymodels.org/> for a more comprehensive framework for modelling within the tidyverse.

## Installation

```{r, eval = FALSE}
# The easiest way to get modelr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just modelr:
install.packages("modelr")
```

## Getting started

```{r}
library(modelr)
```

### Partitioning and sampling

The `resample` class stores a "reference" to the original dataset and a vector of row indices. A resample can be turned into a dataframe by calling `as.data.frame()`. The indices can be extracted using `as.integer()`:

```{r}
# a subsample of the first ten rows in the data frame
rs <- resample(mtcars, 1:10)
as.data.frame(rs)
as.integer(rs)
```

The class can be utilized in generating an exclusive partitioning of a data frame:

```{r}
# generate a 30% testing partition and a 70% training partition
ex <- resample_partition(mtcars, c(test = 0.3, train = 0.7))
lapply(ex, dim)
```

modelr offers several resampling methods that result in a list of `resample` objects (organized in a data frame):

```{r}
# bootstrap
boot <- bootstrap(mtcars, 100)
# k-fold cross-validation
cv1 <- crossv_kfold(mtcars, 5)
# Monte Carlo cross-validation
cv2 <- crossv_mc(mtcars, 100)

dim(boot$strap[[1]])
dim(cv1$train[[1]])
dim(cv1$test[[1]])
dim(cv2$train[[1]])
dim(cv2$test[[1]])
```

### Model quality metrics

modelr includes several often-used model quality metrics:

```{r}
mod <- lm(mpg ~ wt, data = mtcars)
rmse(mod, mtcars)
rsquare(mod, mtcars)
mae(mod, mtcars)
qae(mod, mtcars)
```

### Interacting with models

A set of functions let you seamlessly add predictions and residuals as additional columns to an existing data frame:

```{r}
set.seed(1014)
df <- tibble::tibble(
  x = sort(runif(100)),
  y = 5 * x + 0.5 * x ^ 2 + 3 + rnorm(length(x))
)

mod <- lm(y ~ x, data = df)
df %>% add_predictions(mod)
df %>% add_residuals(mod)
```

For visualization purposes it is often useful to use an evenly spaced grid of points from the data:

```{r}
data_grid(mtcars, wt = seq_range(wt, 10), cyl, vs)

# For continuous variables, seq_range is useful
mtcars_mod <- lm(mpg ~ wt + cyl + vs, data = mtcars)
data_grid(mtcars, wt = seq_range(wt, 10), cyl, vs) %>% add_predictions(mtcars_mod)
```