-
Notifications
You must be signed in to change notification settings - Fork 65
/
Copy pathREADME.Rmd
113 lines (84 loc) · 3.21 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
---
output: github_document
---
```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
set.seed(1014)
```
# modelr <img src="man/figures/logo.png" align="right" />
<!-- badges: start -->
[![Lifecycle: superseded](https://img.shields.io/badge/lifecycle-superseded-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#superseded)
[![R-CMD-check](https://github.com/tidyverse/modelr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/modelr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/modelr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/modelr?branch=main)
<!-- badges: end -->
## Overview
The modelr package provides functions that help you create elegant pipelines when modelling.
It was designed primarily to support teaching the basics of modelling for the 1st edition of [R for Data Science](https://r4ds.had.co.nz/model-basics.html).
We no longer recommend it and instead suggest <https://www.tidymodels.org/> for a more comprehensive framework for modelling within the tidyverse.
## Installation
```{r, eval = FALSE}
# The easiest way to get modelr is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just modelr:
install.packages("modelr")
```
## Getting started
```{r}
library(modelr)
```
### Partitioning and sampling
The `resample` class stores a "reference" to the original dataset and a vector of row indices. A resample can be turned into a dataframe by calling `as.data.frame()`. The indices can be extracted using `as.integer()`:
```{r}
# a subsample of the first ten rows in the data frame
rs <- resample(mtcars, 1:10)
as.data.frame(rs)
as.integer(rs)
```
The class can be utilized in generating an exclusive partitioning of a data frame:
```{r}
# generate a 30% testing partition and a 70% training partition
ex <- resample_partition(mtcars, c(test = 0.3, train = 0.7))
lapply(ex, dim)
```
modelr offers several resampling methods that result in a list of `resample` objects (organized in a data frame):
```{r}
# bootstrap
boot <- bootstrap(mtcars, 100)
# k-fold cross-validation
cv1 <- crossv_kfold(mtcars, 5)
# Monte Carlo cross-validation
cv2 <- crossv_mc(mtcars, 100)
dim(boot$strap[[1]])
dim(cv1$train[[1]])
dim(cv1$test[[1]])
dim(cv2$train[[1]])
dim(cv2$test[[1]])
```
### Model quality metrics
modelr includes several often-used model quality metrics:
```{r}
mod <- lm(mpg ~ wt, data = mtcars)
rmse(mod, mtcars)
rsquare(mod, mtcars)
mae(mod, mtcars)
qae(mod, mtcars)
```
### Interacting with models
A set of functions let you seamlessly add predictions and residuals as additional columns to an existing data frame:
```{r}
set.seed(1014)
df <- tibble::tibble(
x = sort(runif(100)),
y = 5 * x + 0.5 * x ^ 2 + 3 + rnorm(length(x))
)
mod <- lm(y ~ x, data = df)
df %>% add_predictions(mod)
df %>% add_residuals(mod)
```
For visualization purposes it is often useful to use an evenly spaced grid of points from the data:
```{r}
data_grid(mtcars, wt = seq_range(wt, 10), cyl, vs)
# For continuous variables, seq_range is useful
mtcars_mod <- lm(mpg ~ wt + cyl + vs, data = mtcars)
data_grid(mtcars, wt = seq_range(wt, 10), cyl, vs) %>% add_predictions(mtcars_mod)
```