forked from pbreheny/grpreg
-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.Rmd
91 lines (64 loc) · 4.23 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
```{r setup, include=FALSE}
library(grpreg)
set.seed(4)
knitr::opts_knit$set(aliases=c(h = 'fig.height', w = 'fig.width'))
knitr::opts_chunk$set(comment="#", collapse=TRUE, cache=FALSE, tidy=FALSE)
knitr::knit_hooks$set(small.mar = function(before, options, envir) {
if (before) par(mar = c(4, 4, .1, .1))
})
```
# Regularization Paths for Regression Models with Grouped Covariates
`grpreg` is an R package for fitting the regularization path of linear regression, GLM, and Cox regression models with grouped penalties. This includes group selection methods such as group lasso, group MCP, and group SCAD as well as bi-level selection methods such as the group exponential lasso, the composite MCP, and the group bridge. Utilities for carrying out cross-validation as well as post-fitting visualization, summarization, and prediction are also provided.
This site focuses on illustrating the usage and syntax of `grpreg`. For more on the algorithms used by `grpreg`, see the original articles:
* [Breheny, P. and Huang, J. (2009) Penalized methods for bi-level variable selection. *Statistics and its interface*, **2**: 369-380.](http://myweb.uiowa.edu/pbreheny/pdf/Breheny2009.pdf)
* [Breheny, P. and Huang, J. (2015) Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. *Statistics and Computing*, **25**: 173-187.](http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s11222-013-9424-2)
For more information on specific penalties, including references describing the methods implemented by `grpreg` in greater detail, see <http://pbreheny.github.io/grpreg/articles/web/penalties.html>.
## Installation
`grpreg` is on CRAN, so it can be installed via:
```{r eval=FALSE}
install.packages("grpreg")
```
## Brief introduction
```{r read_data, echo=FALSE}
data(Birthwt)
X <- Birthwt$X
y <- Birthwt$bwt
group <- Birthwt$group
```
In regression modeling, variables are often grouped. For example, suppose we took the design matrix from the `MASS::birthwt` data set and expanded its original variables by including multiple indicator functions for categorical variables such as race and basis expansions using polynomials or splines for the continuous variables:
```{r data_struc}
head(X)
group
```
We can fit a penalized regression model to this data with:
```{r fit}
fit <- grpreg(X, y, group)
```
By default, `grpreg` fits a linear regression model with a group lasso penalty. For more detail on other types of models available, see [here](articles/web/models.html). For more detail on other types of penalties available, see [here](articles/web/penalties.html).
Fitting a penalized regression model produces a path of coefficients, which we can plot with
```{r plot, h=4, w=6, small.mar=TRUE}
plot(fit)
```
Notice that when a group enters the model (e.g., the green group), all of its coefficients become nonzero; this is what happens with group lasso models. To see what the coefficients are, we could use the `coef` function:
```{r coef}
coef(fit, lambda=0.05)
```
Note that the number of physician's visits (`ftv`) is not included in the model at $\lambda=0.05$.
Typically, one would carry out cross-validation for the purposes of carrying out inference on the predictive accuracy of the model at various values of $\lambda$.
```{r cvplot, h=5, w=6}
cvfit <- cv.grpreg(X, y, group, penalty="grLasso")
plot(cvfit)
```
The coefficients corresponding to the value of $\lambda$ that minimizes the cross-validation error can be obtained via `coef`:
```{r cv_coef}
coef(cvfit)
```
Predicted values can be obtained via `predict`, which has a number of options:
```{r predict}
predict(cvfit, X=head(X)) # Predictions for new observations
predict(fit, type="ngroups", lambda=0.1) # Number of nonzero groups
predict(fit, type="groups", lambda=0.1) # Identity of nonzero groups
predict(fit, type="nvars", lambda=0.1) # Number of nonzero coefficients
predict(fit, type="vars", lambda=0.1) # Identity of nonzero coefficients
```
Note that the original fit (to the full data set) is returned as `cvfit$fit`; it is not necessary to call both `grpreg` and `cv.grpreg` to analyze a data set. Several other penalties are available, as are methods for logistic regression and Cox proportional hazards regression.