-
Notifications
You must be signed in to change notification settings - Fork 8
/
R-Viz-Skeleton.R
205 lines (113 loc) · 4.83 KB
/
R-Viz-Skeleton.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
# Load library -----
library(tidyverse)
# or load readr, dplyr, ggplot2
# Download data and skeleton script from GitHub into project -----
# Go to GitHub [repo](https://github.com/mariekekjones/BIMS-bootcamp) to get these materials
# NHANES data ----
# Read in downloaded data using `read_csv()` from {readr} package
nh <- read_csv("nhanes.csv")
# Show the first few lines of the data and the dimensions
head(nh)
head(nh, 10)
dim(nh)
# Optionally bring up data in a viewer window.
View(nh)
# dplyr review --------
# Access dplyr cheat sheet from `Help` menu
# dplyr verbs
# 1. `filter()`
# 2.. `group_by()`
# 3. `summarize()`
# dplyr takes a tibble dataframe as its first argument and then a logical condition to meet as the second argument.
# - `==`: Equal to
# - `!=`: Not equal to
# - `>`, `>=`: Greater than, greater than or equal to
# - `<`, `<=`: Less than, less than or equal to
# If you want to satisfy *all* of multiple conditions, you can use the "and" operator, `&`.
# The "or" operator `|` (the pipe character, usually shift-backslash) will return a subset that meet *any* of the conditions.
# Let's use filter to return rows where the person was elderly (defined as >= 80 years old)
# Using the pipe -----
# `%>%` or `Control + Shift + M`
# `head()` without pipe
# `head()` with pipe
# Now let's use the pipe operator with filter to subset for elderly people >= 80 years old
# without pipe
# with pipe
# Nesting v. %>% ----------
# Let's say we want to see the mean height, grouped by Race, only for adults.
# without pipe
# with pipe
# with pipe arranged in order
# ** EXERCISE 1 ** ------------
# ** YOUR TURN **
# A. How many observations are there of children (< 18 years old)?
# B. How many cases of obese children are there (BMI >= 30)?
# C. Use `filter()`, `group_by()` and `summarize()` to find the mean BMI by Smoking Status for only Adults who have Diabetes. Do diabetic smokers or non-smokers have higher BMI?
# ggplot2 ---------
# allows you to build a plot layer-by-layer by specifying:
# - a **geom**, which specifies how the data are represented on the plot (points, lines, bars, etc.),
# - **aesthetics** that map variables in the data to axes on the plot or to plotting size, shape, color, etc.,
# - **facets**, which we've already seen above, that allow the data to be divided into chunks on the basis of other categorical or continuous variables and the same plot drawn for each chunk.
# Scatterplots ---------
# Age (X) against Height (Y) (continuous X, continuous Y)
# color the points by Gender
# color the points blue and shape them as +
# what is the difference between coloring by a variable and coloring by static values?
# plot points colored by Gender and smoothed line
# plot smoothed trend line and points both colored by gender
# color all layers by Gender
# color all layers by Gender, add transparency to points, make line bolder
# ** EXERCISE 2 ** ----------
# ** YOUR TURN **
# A. Use a scatterplot to investigate the relationship between Age and Testosterone.
# B. Color the plot in A by Gender.
# C. Create the plot in A for just men.
# D. Filter for men > 65 and < 80 years old and then examine the relationship between Age and Testosterone.
# E. Does the relationship you saw in D differ if the man is physically active (PhysActive == "Yes")? Use colored loess lines to see the effect of physical activity.
# Visualizations for discrete X ---------
# Plot BMI by Smoking Status
# blank canvas, note categories on X
# try with geom point
# no variability in X values --> overplotting
# so add some random variability to x-axis using geom_jitter
# remove NA category and add transparency
# plot boxplot
# plot jitter and boxplot
# improve the jitter/boxplot
# color boxplot by gender
# fill boxplot by gender
# **EXERCISE 3** ----------
# ** YOUR TURN **
# A. Create boxplots showing height for Adults of different Races
# B. Add jittered data under boxplots in A.
# C. Fill boxplots in A by Gender
# Plotting univariate continuous data -------
#canvas for height
# save canvas as p then add histogram
# change bin size
# smoothed density curve
# histogram colored by Race
# histogram filled by Race
# get help on histogram function
# change position
# add transparency
# try with density curves colored by Race
# change fill color and add transparency
# Faceting -----------
# overlapping histograms filled by Race
# facet histograms by race
# facet density plots by Race
# Choosing colors and themes ----------
# boxplot of BMI by Smoking status without missing Smoking Status
# filled by Diabetes
# see all color options
#change colors manually
#change theme
# ** EXERCISE 4 ** ----------
# ** YOUR TURN **
# Practice creating the plots pictured
# A.
# No = not physically active, Yes = physically active
# B.
# Custom colors are "salmon" and "seagreen"
# C.