forked from koalaverse/homlr
-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.Rmd
executable file
·128 lines (85 loc) · 6.38 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
---
title: "Hands-on Machine Learning with R"
date: "`r Sys.Date()`"
site: bookdown::bookdown_site
output: bookdown::gitbook
documentclass: book
bibliography: [book.bib, packages.bib]
biblio-style: apalike
link-citations: yes
github-repo: bradleyboehmke/hands-on-machine-learning-with-r
description: "A Machine Learning Algorithmic Deep Dive Using R."
---
# Preface {-}
<img src="images/mlr.png" style="float:right; margin: 0px 0px 0px 0px; width: 40%; height: 40%;" />
Welcome to Hands-on Machine Learning with R. This book provides hands-on modules for many of the most common machine learning methods to include:
- Generalized low rank models
- Clustering algorithms
- Autoencoders
- Regularized models
- Random forests
- Gradient boosting machines
- Deep neural networks
- Stacking / super learner
- and more!
You will learn how to build and tune these various models with R packages that have been tested and approved due to their ability to scale well. However, my motivation in almost every case is to describe the techniques in a way that helps develop intuition for its strengths and weaknesses. For the most part, I minimize mathematical complexity when possible but also provide resources to get deeper into the details if desired.
## Who should read this {-}
I intend this work to be a practitioner's guide to the machine learning process and a place where one can come to learn about the approach and to gain intuition about the many commonly used, modern, and powerful methods accepted in the machine learning community. If you are familiar with the analytic methodologies, this book may still serve as a reference for how to work with the various R packages for implementation. While an abundance of videos, blog posts, and tutorials exist online, I've long been frustrated by the lack of consistency, completeness, and bias towards singular packages for implementation. This is what inspired this book.
This book is not meant to be an introduction to R or to programming in general; as I assume the reader has familiarity with the R language to include defining functions, managing R objects, controlling the flow of a program, and other basic tasks. If not, I would refer you to [R for Data Science](http://r4ds.had.co.nz/index.html) [@wickham2016r] to learn the fundamentals of data science with R such as importing, cleaning, transforming, visualizing, and exploring your data. For those looking to advance their R programming skills and knowledge of the languge, I would refer you to [Advanced R](http://adv-r.had.co.nz/) [@wickham2014advanced]. Nor is this book designed to be a deep dive into the theory and math underpinning machine learning algorithms. Several books already exist that do great justice in this arena (i.e. [Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/) [@esl], [Computer Age Statistical Inference](https://web.stanford.edu/~hastie/CASI/) [@apm], [Deep Learning](http://www.deeplearningbook.org/) [@goodfellow2016deep]).
Instead, this book is meant to help R users learn to use the machine learning stack within R, which includes using various R packages such as `glmnet`, `h20`, `ranger`, `xgboost`, `lime`, and others to effectively model and gain insight from your data. The book favors a hands-on approach, growing an intuitive understanding of machine learning through concrete examples and just a little bit of theory. While you can read this book without opening R, I highly recommend you experiment with the code examples provided throughout.
## Why R {-}
R has emerged over the last couple decades as a first-class tool for scientific computing tasks, and has been a consistent leader in implementing statistical methodologies for analyzing data. The usefulness of R for data science stems from the large, active, and growing ecosystem of third-party packages: `tidyverse` for common data analysis activities; `h2o`, `ranger`, `xgboost`, and others for fast and scalable machine learning; `lime`, `pdp`, `DALEX`, and others for machine learning interpretability; and many more tools will be mentioned throughout the pages that follow.
## Structure of the book {-}
Each chapter of this book focuses on a particular part of the machine learning process along with various packages to perform that process.
TBD...
## Conventions used in this book {-}
The following typographical conventions are used in this book:
* ___strong italic___: indicates new terms,
* __bold__: indicates package & file names,
* `inline code`: monospaced highlighted text indicates functions or other commands that could be typed literally by the user,
* code chunk: indicates commands or other text that could be typed literally by the user
```{r, first-code-chunk, collapse=TRUE}
1 + 2
```
In addition to the general text used throughout, you will notice the following code chunks with images, which signify:
```{block, type = "rmdtip"}
Signifies a tip or suggestion
```
```{block, type = "rmdnote"}
Signifies a general note
```
```{block, type = "rmdwarning"}
Signifies a warning or caution
```
## Additional resources {-}
There are many great resources available to learn about machine learning. At the end of each chapter I provide a *Learn More* section that lists resources that I have found extremely useful for digging deeper into the methodology and applying with code.
## Feedback {-}
Reader comments are greatly appreciated. To report errors or bugs please post an issue at https://github.com/bradleyboehmke/hands-on-machine-learning-with-r/issues.
## Acknowledgments {-}
TBD
## Software information {-}
An online version of this book is available at https://bradleyboehmke.github.io/hands-on-machine-learning-with-r/. The source of the book is available at https://github.com/bradleyboehmke/hands-on-machine-learning-with-r. The book is powered by https://bookdown.org which makes it easy to turn R markdown files into HTML, PDF, and EPUB.
This book was built with the following packages and R version. All code was executed on 2013 MacBook Pro with a 2.4 GHz Intel Core i5 processor, 8 GB of memory, 1600MHz speed, and double data rate synchronous dynamic random access memory (DDR3).
```{r, collapse=TRUE, comment = "#>"}
# packages used
pkgs <- c(
"AmesHousing",
"caret",
"data.table",
"dplyr",
"ggplot2",
"gbm",
"glmnet",
"h2o",
"pdp",
"pROC",
"purrr",
"ranger",
"ROCR",
"rsample",
"vip",
"xgboost"
)
# package & session info
devtools::session_info(pkgs)
```