```{r}
#options(repr.plot.width=4, repr.plot.height=4)
```
## Opening the MCMC black box: a Bayesian linear model
- In this example, we will simulate some data according to a linear model. The data will follow the regression line
y = m*x + b + e
where *m* is the slope, *b* is the intercept, and *e* is the error term. In particular, we expect the points to be normally distributed around this line, so we will model the error term as a normal distribution with an unknown standard deviation.
Our simulation will take the form:
y = 3*x + 15 + N(0, 7)
The slope is 3, the *y* intercept is 15, and the error is described by a normal distribution with a mean of 0 and a standard deviation of 7.
We will first generate the data and plot it:
```{r , dev='pdf'}
set.seed(121432)
x <- runif(100, 0, 100)
y <- 3*x + 15 + rnorm(length(x), sd = 7)
plot(x, y)
```
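As a point of reference (not part of the original workflow), we can fit the same model by least squares with lm(); the MCMC estimates later in the prac should land close to these values:
```{r}
# Least-squares estimates of the intercept and slope, for later
# comparison with the posterior samples from the MCMC
coef(lm(y ~ x))
```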
Next we need to define some functions for the MCMC. If you are well acquainted with R, you can follow the code; otherwise, feel free to ask any questions.
**The Likelihood**
- Notice that the likelihood takes a set of given values for the slope, intercept, and error to predict y. It then calculates the difference between the predicted and observed values of y and evaluates their density under a normal distribution with the given standard deviation:
```{r , dev='pdf'}
likelihood <- function(par){
  # par = c(slope, intercept, error sd); x and y are the data defined above
  y_predicted <- par[1] * x + par[2]
  # Log-density of each observation around its predicted value; summing
  # gives the joint log-likelihood
  single_likelihoods <- dnorm(y, mean = y_predicted, sd = par[3], log = T)
  return(sum(single_likelihoods))
}
```
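As a quick sanity check (an optional aside), the log-likelihood should be higher near the true parameter values than far from them:
```{r}
# Log-likelihood at the true values (slope 3, intercept 15, sd 7)
# versus at a distant point; the first should be much higher
likelihood(c(3, 15, 7))
likelihood(c(1, 1, 1))
```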
**The prior**
- For the prior, we simply calculate the density of the three parameters in our model under distributions of our choice. Our prior will initially consist of normal distributions with means of 3, 15, and 10 for the slope, intercept, and error term, respectively, all with a standard deviation of 10. At a later stage we will modify the prior. Note that we do not use the data to calculate the prior.
```{r}
prior <- function(par){
  # Independent priors on each parameter, combined on the log scale
  prior_slope <- dnorm(par[1], mean = 3, sd = 10, log = T)
  prior_intercept <- dnorm(par[2], mean = 15, sd = 10, log = T)
  prior_error <- dnorm(par[3], mean = 10, sd = 10, log = T)
  return(prior_slope + prior_intercept + prior_error)
}
```
**The proposal function**
- This is a function that will modify our parameter values at each MCMC step. It dictates how well the chain moves through parameter space. Concretely, we will add a random number from a standard normal distribution to each parameter, with the condition that the error term cannot be negative because it corresponds to a standard deviation, which is always positive.
```{r}
proposal_function <- function(par){
  slope_proposal <- par[1] + rnorm(1, mean = 0, sd = 1)
  intercept_proposal <- par[2] + rnorm(1, mean = 0, sd = 1)
  # abs() reflects the error term at zero, keeping the proposed sd positive
  error_proposal <- abs(par[3] + rnorm(1, mean = 0, sd = 1))
  return(c(slope_proposal, intercept_proposal, error_proposal))
}
```
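To see the proposal mechanism in action, we can call it directly; note how the abs() reflection keeps the error term positive even when the current value is close to zero:
```{r}
# Each call returns a new candidate near the current values;
# the third element (the error sd) can never go negative
proposal_function(c(3, 15, 7))
proposal_function(c(3, 15, 0.1))
```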
Next we will define a function to run the MCMC. We have not defined the posterior explicitly; instead, we will compute it internally by multiplying the prior and the likelihood (adding them, since we work with log densities). The MCMC requires two inputs: a set of parameter starting values and the number of iterations. We will modify these throughout the prac to understand how the algorithm works.
```{r}
start_values <- c(1, 1, 1)
iterations <- 100000
run_mcmc <- function(start_values, iterations, sample_from_prior = F){
  chain <- matrix(NA, iterations, 6)
  prior_start <- prior(start_values)
  likelihood_start <- likelihood(start_values)
  if(sample_from_prior) likelihood_start <- 0 # Ignore the data: fix the log-likelihood at 0
  posterior_start <- prior_start + likelihood_start
  chain[1, ] <- c(prior_start, likelihood_start, posterior_start, start_values)
  colnames(chain) <- c('prior', 'likelihood', 'posterior', 'slope', 'intercept', 'error')
  for(i in 2:iterations){
    proposal <- proposal_function(chain[i-1, 4:6])
    prior_temp <- prior(proposal)
    likelihood_temp <- likelihood(proposal)
    if(sample_from_prior) likelihood_temp <- 0
    posterior_temp <- prior_temp + likelihood_temp
    mh_ratio <- exp(posterior_temp - chain[i-1, 'posterior'])
    if(mh_ratio > 1){ # Accept if the proposal has a higher posterior than the previous step
      chain[i, ] <- c(prior_temp, likelihood_temp, posterior_temp, proposal)
    }else if(mh_ratio > runif(1)){ # Otherwise accept with probability equal to the MH ratio
      chain[i, ] <- c(prior_temp, likelihood_temp, posterior_temp, proposal)
    }else{ # Reject the proposal and repeat the previous step
      chain[i, ] <- chain[i-1, ]
    }
  }
  return(chain)
}
```
Assessing sufficient sampling of an MCMC is very difficult, and it is often helpful to visualise the run. Below we will define a function to plot the output of the MCMC. It requires the user to specify an MCMC chain, a burnin, and whether it should use colours for plotting.
```{r}
plot_mcmc <- function(chain, burnin, use_heat_colors = T, col_line = rgb(0, 0, 0, 0.5)){
  # Maximum-likelihood (least-squares) estimates for the red reference lines
  ml_fit <- summary(lm(y ~ x))
  ml_estimates <- ml_fit$coefficients[, 'Estimate']
  samples <- (1:nrow(chain))[-c(1:burnin)]
  par(mfrow = c(3, 3))
  # First row: traces of the post-burnin samples for each parameter
  plot(chain[-c(1:burnin), 'slope'], type = 'l', xlab = 'step', ylab = 'Slope')
  lines(samples, rep(ml_estimates[2], length(samples)), col = 'red', lwd = 2)
  plot(chain[-c(1:burnin), 'intercept'], type = 'l', xlab = 'step', ylab = 'Intercept')
  lines(samples, rep(ml_estimates[1], length(samples)), col = 'red', lwd = 2)
  plot(chain[-c(1:burnin), 'error'], type = 'l', xlab = 'step', ylab = 'Error')
  # Second row: marginal posterior histograms
  hist(chain[-c(1:burnin), 'slope'], xlab = 'Slope', main = '')
  lines(x = rep(ml_estimates[2], 2), y = c(0, 10000), col = 'red', lwd = 3)
  hist(chain[-c(1:burnin), 'intercept'], xlab = 'Intercept', main = '')
  lines(x = rep(ml_estimates[1], 2), y = c(0, 10000), col = 'red', lwd = 3)
  hist(chain[-c(1:burnin), 'error'], xlab = 'Error', main = '')
  # Third row: joint trace of intercept vs slope, optionally coloured by posterior
  if(use_heat_colors){
    cols <- rev(heat.colors(nrow(chain), alpha = 1.0))
    plot(chain[order(chain[, 'posterior'], decreasing = F),
               c('slope', 'intercept')], pch = 20, type = 'b', col = cols)
  }else{
    plot(chain[order(chain[, 'posterior'], decreasing = F),
               c('slope', 'intercept')], pch = 20, type = 'l', col = col_line)
  }
  # Data overlaid with regression lines drawn from the posterior samples
  plot(x, y)
  for(i in 1:length(samples)){
    abline(b = chain[samples[i], 'slope'],
           a = chain[samples[i], 'intercept'], col = rgb(1, 0, 0, 0.2), lwd = 0.5)
  }
}
```
**Sufficient sampling**
- In our first example, we will run the chain with 100 iterations. The starting values for the parameters will be 1, 1, 1, for the slope, intercept, and error. Because this is a stochastic algorithm, given enough iterations the starting values should not matter. Feel free to try different values to see this. We will then plot the results from the chain, with a burnin of 10 samples.
```{r}
#options(repr.plot.width=7,repr.plot.height=7)
```
```{r , dev='pdf'}
chain <- run_mcmc(start_values = c(1, 1, 1), iterations = 100, sample_from_prior = F)
plot_mcmc(chain, 10, use_heat_colors = F)
```
The first row of plots shows the traces for each parameter; the second row shows histograms of the post-burnin samples, which correspond to the posterior distribution for each parameter. The red lines correspond to the maximum-likelihood estimates, for comparison.
The third row shows the trace of the intercept vs. the slope. If you used use_heat_colors = T, the brighter colours correspond to regions with higher posterior probability. The last plot is our original data, overlaid with regression lines sampled from the posterior; together these represent the posterior distribution of the regression.
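We can also summarise the post-burnin samples numerically; a minimal sketch using base R, with the same burnin of 10:
```{r}
# Posterior means and 95% credible intervals from the post-burnin samples
post <- chain[-c(1:10), c('slope', 'intercept', 'error')]
apply(post, 2, mean)
apply(post, 2, quantile, probs = c(0.025, 0.975))
```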
**Question 5.1**
Judging by the traces, do you think that we have a reliable sample from the posterior?
- We will now run the chain with 50,000 iterations. We will also increase the burnin accordingly.
```{r , dev='pdf'}
chain <- run_mcmc(start_values = c(1, 1, 1), iterations = 50000, sample_from_prior = F)
plot_mcmc(chain, 5000, use_heat_colors = F)
```
**Question 5.2** Has increasing the number of iterations improved our confidence in our estimates? How do we know this?
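One quantitative check, assuming the coda package is installed, is the effective sample size, which accounts for autocorrelation between successive MCMC samples:
```{r}
# Effective sample size of the post-burnin samples; values far below
# the number of post-burnin iterations indicate strong autocorrelation
library(coda)
effectiveSize(as.mcmc(chain[-c(1:5000), c('slope', 'intercept', 'error')]))
```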
**The proposal function**
- We will now tweak the proposal function to see its effect on the MCMC. For example, we will allow it to make very large moves by increasing the standard deviation from 1 to 10. To do this, we need to redefine the proposal function:
```{r , dev='pdf'}
proposal_function <- function(par){
  slope_proposal <- par[1] + rnorm(1, mean = 0, sd = 10)
  intercept_proposal <- par[2] + rnorm(1, mean = 0, sd = 10)
  error_proposal <- abs(par[3] + rnorm(1, mean = 0, sd = 10))
  return(c(slope_proposal, intercept_proposal, error_proposal))
}
chain <- run_mcmc(start_values = c(1, 1, 1), iterations = 50000, sample_from_prior = F)
plot_mcmc(chain, 5000, use_heat_colors = F)
```
**Question 5.3** Why has increasing the size of the moves in the proposal function made the MCMC less efficient?
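One way to quantify this is the acceptance rate. Because a rejected proposal copies the previous row of the chain, we can estimate it by counting how often consecutive rows differ (a rough sketch; an accepted proposal identical to the previous step is theoretically possible but vanishingly rare):
```{r}
# Fraction of steps where at least one parameter changed, i.e. the
# proposal was accepted; very large proposal moves push this rate down
acceptance_rate <- mean(rowSums(chain[-1, 4:6] != chain[-nrow(chain), 4:6]) > 0)
acceptance_rate
```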
It should be clear that our previous proposal function was better. We will redefine it for the rest of the prac as we did before:
```{r}
proposal_function <- function(par){
  slope_proposal <- par[1] + rnorm(1, mean = 0, sd = 1)
  intercept_proposal <- par[2] + rnorm(1, mean = 0, sd = 1)
  error_proposal <- abs(par[3] + rnorm(1, mean = 0, sd = 1))
  return(c(slope_proposal, intercept_proposal, error_proposal))
}
```
**Prior sensitivity: a misleading prior**
- Another factor worth inspecting is the influence of the priors. We will set some clearly misleading and very informative priors for the slope and the intercept (we could also do this for the error term, but it is more tractable to change a few parameters at a time) and re-run our MCMC.
- slope: N(10, 0.01)
- intercept: N(1, 0.01)
```{r}
prior <- function(par){
  prior_slope <- dnorm(par[1], mean = 10, sd = 0.01, log = T)
  prior_intercept <- dnorm(par[2], mean = 1, sd = 0.01, log = T)
  prior_error <- dnorm(par[3], mean = 10, sd = 10, log = T)
  return(prior_slope + prior_intercept + prior_error)
}
```
```{r , dev='pdf'}
chain <- run_mcmc(start_values = c(1, 1, 1), iterations = 50000, sample_from_prior = F)
plot_mcmc(chain, 5000, use_heat_colors = F, col_line = rgb(0, 0, 0, 0.1))
```
**Prior sensitivity: uninformative priors**
The MCMC has struggled to obtain sufficient samples from the posterior. The error term appears to increase indefinitely. Although we could possibly address this by using more iterations, note that the regression line deviates substantially from what we obtained using more reasonable priors. Our priors here are overly influential and might mislead our inferences.
- It is often the case that the researcher has no grounds for choosing one prior over another. In such cases, it might be appropriate to choose an 'uninformative' prior. Redefine the priors using uniform distributions with minimum and maximum bounds of -100 and 100, respectively.
```{r}
prior <- function(par){
  prior_slope <- dunif(par[1], min = -100, max = 100, log = T)
  prior_intercept <- dunif(par[2], min = -100, max = 100, log = T)
  prior_error <- dunif(par[3], min = -100, max = 100, log = T)
  return(prior_slope + prior_intercept + prior_error)
}
```
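A side effect of these hard bounds is worth noting: any proposal outside [-100, 100] receives a log prior density of -Inf, so the MH ratio becomes exp(-Inf) = 0 and the proposal is always rejected. The chain can therefore never leave the bounds. A quick check:
```{r}
# Outside the bounds the log prior density is -Inf, which makes the
# Metropolis-Hastings ratio exactly zero
dunif(150, min = -100, max = 100, log = TRUE)
exp(dunif(150, min = -100, max = 100, log = TRUE))
```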
```{r , dev='pdf'}
chain <- run_mcmc(start_values = c(1, 1, 1), iterations = 100000, sample_from_prior = F)
plot_mcmc(chain, 5000, use_heat_colors = F, col_line = rgb(0, 0, 0, 0.1))
```
**Prior sensitivity: sampling from the uniform prior**
- It appears that our data are sufficiently informative to recover the correct estimates, even with uninformative priors. In practice, however, it is often difficult to determine whether the prior is overly influential and to what extent the data are driving the estimates. An easy way to check this is to sample from the prior, which consists of running the MCMC with no data by fixing the likelihood to an arbitrary value. To do this, we will set the option sample_from_prior to TRUE, which simply assigns a log-likelihood of 0 regardless of the parameter values. We will keep the uniform prior we just defined. This analysis usually runs much faster because it involves fewer calculations:
```{r}
chain <- run_mcmc(start_values = c(1, 1, 1), iterations = 150000, sample_from_prior = T)
```
```{r , dev='pdf'}
plot_mcmc(chain, 5000, use_heat_colors = F, col_line = rgb(0, 0, 0, 0.1))
```
**Prior sensitivity: sampling from the normal prior**
- For comparison, we will also sample from the prior that we used initially.
```{r}
prior <- function(par){
  prior_slope <- dnorm(par[1], mean = 3, sd = 10, log = T)
  prior_intercept <- dnorm(par[2], mean = 15, sd = 10, log = T)
  prior_error <- dnorm(par[3], mean = 10, sd = 10, log = T)
  return(prior_slope + prior_intercept + prior_error)
}
```
```{r , dev='pdf'}
chain <- run_mcmc(start_values = c(1, 1, 1), iterations = 150000, sample_from_prior = T)
plot_mcmc(chain, 5000, use_heat_colors = F, col_line = rgb(0, 0, 0, 0.1))
```
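As a final optional exercise (a sketch, not part of the original prac), we can overlay the prior and posterior samples for the slope to see how strongly the data update the prior:
```{r , dev='pdf'}
# Run two chains with the normal prior: one sampling from the prior only,
# one using the data, then compare the marginal densities of the slope
chain_prior <- run_mcmc(start_values = c(1, 1, 1), iterations = 150000, sample_from_prior = T)
chain_posterior <- run_mcmc(start_values = c(1, 1, 1), iterations = 150000, sample_from_prior = F)
plot(density(chain_posterior[-c(1:5000), 'slope']), col = 'red', lwd = 2,
     main = '', xlab = 'Slope', xlim = c(-30, 40))
lines(density(chain_prior[-c(1:5000), 'slope']), lwd = 2)
legend('topright', legend = c('posterior', 'prior'), col = c('red', 'black'), lwd = 2)
```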