-
Notifications
You must be signed in to change notification settings - Fork 5
/
Seminar_10_Communicating.Rmd
354 lines (246 loc) · 15.3 KB
/
Seminar_10_Communicating.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
## Seminar 10: Communicating Decision Support {#communicating}
Welcome to seminar 10 of **Decision Analysis and Forecasting for Agricultural Development**. Feel free to bring up any questions or concerns in the Slack or to [Dr. Cory Whitney](mailto:[email protected]?subject=[Lecture_6]%20Decision%20Analysis%20Lecture) or the course tutor.
### Communicating results of Decision Analysis models
The results of a Monte Carlo model are not always easy to interpret and to communicate about. One important step in communicating these often very large data sets is with good visualization. In previous lectures and seminars we have covered the basics of plotting the distributions of model results for comparisons between decision options. In this lecture we will build on what we learned in the [Decision Models](#decision_models) lecture and the [Model programming](#model_programming) seminar to learn more about how to generate useful plots for communicating model results.
In the [Model programming part 2](#model_programming_cont) seminar we generated a model called `example_mc_simulation`. As a quick refresher change the plot of the results of that model from `hist_simple_overlay` to `boxplot_density` by using the `method` argument in the `plot_distributions` function.
In this seminar we will build on what we learned in the [Decision Models](#decision_models) lecture and the [Model programming part 1](#model_programming) seminar to learn more about how to generate useful forecasts for communicating model results. We will implement another Monte Carlo simulation.
In the [Model programming part 2](#model_programming_cont) seminar we generated a model called `example_mc_simulation`. As a quick refresher change the plot of the results of that model from `boxplot_density` to `smooth_simple_overlay` by using the `method` argument in the `plot_distributions` function.
```{r forecasts_plot_distribution, exercise=TRUE}
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "boxplot_density",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```
```{r forecasts_plot_distribution-solution}
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "smooth_simple_overlay",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```
### Making forecasts
First let's look at a case study on agroforestry interventions in Northwest Vietnam [@do_decision_2020].
<iframe width="560" height="315" src="https://www.youtube.com/embed/FZfACcoX4O0" frameborder="0" allowfullscreen></iframe>
```{r do-forecasts-question-1, echo=FALSE}
question("How do probabilistic models differ from frequentist or deterministic models?",
answer("They contain more variables."),
answer("They do not have as broad a range of applicability across different domains of science."),
answer("Unlike deterministic models, probabilistic models can accept uncertainty as inputs.", correct = TRUE),
answer("Unlike deterministic models, probabilistic models can produce uncertain distributions as outputs.", correct = TRUE),
incorrect = "Watch [the talk](https://youtu.be/FZfACcoX4O0) and try again.",
allow_retry = TRUE
)
```
```{r do-forecasts-question-2, echo=FALSE}
question("What are the steps in the proposed process for rational decision-making?",
answer("Gather empirical evidence and data and apply regression tools as part of an exploratory process to look for relationships. Then decide based on any relevant results."),
answer("Perform interviews and then decide based on the suggestion of experts."),
answer("Pool information, quantify uncertainty, produce assessments, then either measure where information value is high or take decision.", correct = TRUE),
incorrect = "Watch [the talk](https://youtu.be/FZfACcoX4O0) and try again.",
allow_retry = TRUE
)
```
```{r do-forecasts-question-3, echo=FALSE}
question("Why was Projection to Latent Structure (PLS) analysis used in the study?",
answer("To generate a p-value for the reliability of the model results."),
answer("To perform sensitivity analysis.", correct = TRUE),
answer("To assess the sensitivity of the output with the many collinear model inputs using the Varible Importance in the Projection (VIP) score.", correct = TRUE),
incorrect = "Watch [the talk](https://youtu.be/FZfACcoX4O0) and try again.",
allow_retry = TRUE
)
```
### Rebuild the example_mc_simulation
Here we build the example_mc_simulation again for use in the following exercises.
```{r example_mc_simulation_seminar10, eval=FALSE}
# input_estimates <- read.csv("data/input_estimates.csv")
model_function <- function(){
income <- Yield * Market_price
overall_costs <- Labor_cost + Management_cost
final_result <- income - overall_costs
return(list(final_result = final_result))
}
example_mc_simulation <- decisionSupport::mcSimulation(estimate = decisionSupport::as.estimate(input_estimates),
model_function = model_function,
numberOfModelRuns = 10000,
functionSyntax = "plainNames")
```
```{r communicating-plot_distribution, exercise=TRUE}
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "hist_simple_overlay",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```
```{r communicating-plot_distribution-solution}
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "boxplot_density",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```
### Plot cashflow
The `plot_cashflow` function from the `decisionSupport` package [@R-decisionSupport] creates a cashflow plot of the returned list of related outputs from the `mcSimulation` function using ggplot2 [@R-ggplot2]. The function automatically defines quantiles (5 to 95% and 25 to 75%) as well as a value for the median.
```{r communicating-plot_cashflow, exercise=TRUE}
# Plotting the cashflow:
# Create the estimate object (for multiple options):
variable = c("revenue_option1", "costs_option1", "n_years",
"revenue_option2", "costs_option2")
distribution = c("norm", "norm", "const", "norm", "norm")
lower = c(10000, 5000, 20, 8000, 500)
upper = c(100000, 50000, 20, 80000, 30000)
costBenefitEstimate <- as.estimate(variable, distribution, lower, upper)
# Define the model function without name for the return value:
profit1 <- function(x) {
cashflow_option1 <- vv(revenue_option1 - costs_option1, n = n_years, var_CV = 100)
cashflow_option2 <- vv(revenue_option2 - costs_option2, n = n_years, var_CV = 100)
return(list(Revenues_option1 = revenue_option1,
Revenues_option2 = revenue_option2,
Costs_option1 = costs_option1,
Costs_option2 = costs_option2,
Cashflow_option_one = cashflow_option1,
Cashflow_option_two = cashflow_option2))
}
# Perform the Monte Carlo simulation:
predictionProfit1 <- mcSimulation(estimate = costBenefitEstimate,
model_function = profit1,
numberOfModelRuns = 10000,
functionSyntax = "plainNames")
# Plot the cashflow distribution over time
plot_cashflow(mcSimulation_object = predictionProfit1, cashflow_var_name = "Cashflow_option_one",
x_axis_name = "Years with intervention",
y_axis_name = "Annual cashflow in USD",
color_25_75 = "green4", color_5_95 = "green1",
color_median = "red")
```
Plot the cashflow with panels to compare the cashflow distribution over time for multiple decision options:
```{r communicating-plot_cashflow-panels, exercise=TRUE}
# Plotting the cashflow:
# Create the estimate object (for multiple options):
variable = c("revenue_option1", "costs_option1", "n_years",
"revenue_option2", "costs_option2")
distribution = c("norm", "norm", "const", "norm", "norm")
lower = c(10000, 5000, 10, 8000, 500)
upper = c(100000, 50000, 10, 80000, 30000)
costBenefitEstimate <- as.estimate(variable, distribution, lower, upper)
# Define the model function without name for the return value:
profit1 <- function(x) {
cashflow_option1 <- vv(revenue_option1 - costs_option1, n = n_years, var_CV = 100)
cashflow_option2 <- vv(revenue_option2 - costs_option2, n = n_years, var_CV = 100)
return(list(Revenues_option1 = revenue_option1,
Revenues_option2 = revenue_option2,
Costs_option1 = costs_option1,
Costs_option2 = costs_option2,
Cashflow_option_one = cashflow_option1,
Cashflow_option_two = cashflow_option2))
}
# Perform the Monte Carlo simulation:
predictionProfit1 <- mcSimulation(estimate = costBenefitEstimate,
model_function = profit1,
numberOfModelRuns = 10000,
functionSyntax = "plainNames")
plot_cashflow(mcSimulation_object = predictionProfit1,
cashflow_var_name = c("Cashflow_option_one", "Cashflow_option_two"),
x_axis_name = "Years with intervention",
y_axis_name = "Annual cashflow in USD",
color_25_75 = "green4", color_5_95 = "green1",
color_median = "red",
facet_labels = c("Option 1", "Option 2"))
```
### Plot many results together
We can use the `compound_figure` function to create a simple compound figure of model results and analyses of a binary decision (do or do not do). The figure includes the distribution of the expected outcome, the expected cashflow, as well as the variable importance and the value of information.
```{r communicating-compound_figure, exercise=TRUE}
# Create the estimate object:
cost_benefit_table <- data.frame(label = c("Revenue", "Costs"),
variable = c("revenue", "costs"),
distribution = c("norm", "norm"),
lower = c(100, 500),
median = c(NA, NA),
upper = c(10000, 5000))
# (a) Define the model function without name for the return value:
profit1 <- function() {
Decision <- revenue - costs
cashflow <- rnorm(rep(revenue, 20))
return(list(Revenues = revenue,
Costs = costs,
cashflow = cashflow,
Decision = Decision))
}
compound_figure(model = profit1,
input_table = cost_benefit_table,
decision_var_name = "Decision",
cashflow_var_name = "cashflow",
model_runs = 1e2,
distribution_method = 'smooth_simple_overlay')
```
### Other visualization options
Here we demonstrate a few more various graphical options to visualize uncertainty intervals of outcomes of Monte Carlo simulations. We create a data set of yield distributions of three different farming practices and use the function `mcmc_intervals()` from the `bayesplot` library to plot the data set [@R-bayesplot].
```{r communicating-bayes_plot-mcmc_intervals, exercise=TRUE}
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5),
"practice 2" = rnorm(n = 1000, mean = 7, sd = 1),
"practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))
color_scheme_set("red")
mcmc_intervals(test,prob = 0.5,prob_outer = 0.9,point_est = "median")
```
Do the same with the with `mcmc_areas()` function from the `bayesplot` library [@R-bayesplot].
```{r communicating-bayes_plot-mcmc_areas, exercise=TRUE}
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5),
"practice 2" = rnorm(n = 1000, mean = 7, sd = 1),
"practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))
color_scheme_set("blue")
mcmc_areas(test,prob = 0.9,point_est = "median")
```
### Comparative density curves
We can also use `geom_density()`in `ggplot2` to compare the spread of different distributions [@R-ggplot2]:
```{r communicating-geom_density, exercise=TRUE}
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5),
"practice 2" = rnorm(n = 1000, mean = 7, sd = 1),
"practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))
stacked_test <- stack(test)
ggplot(stacked_test,
aes(x=values,group=ind,fill=ind )) +
geom_density(colour=NA,alpha=.5) +
ylab("Probability density") +
xlab("Yield")
```
### Comparative histogram
Use `ggplot2` `geom_histogram()`function to show the histogram of the data in comparison:
```{r communicating-geom_histogram, exercise=TRUE}
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5),
"practice 2" = rnorm(n = 1000, mean = 7, sd = 1),
"practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))
stacked_test <- stack(test)
ggplot(stacked_test,aes(x=values))+
geom_histogram(data=subset(stacked_test,ind =='practice.1'),
aes(fill = ind), alpha = 0.5, bins = 150) +
geom_histogram(data=subset(stacked_test,ind == 'practice.2'),
aes(fill = ind), alpha = 0.5, bins = 150) +
geom_histogram(data=subset(stacked_test,ind == 'practice.3'),
aes(fill = ind), alpha = 0.5, bins = 150)
```
### Reading
<!-- ?NEXT YEAR ADD THIS HERE AND REMOVE FROM other CHAPTER?-->
<!-- This week we will read and discuss 'Decision Analysis of Agroforestry Options Reveals Adoption Risks for Resource-Poor Farmers' by @do_decision_2020. -->
<!-- Do, Hoa, Eike Luedeling, and Cory Whitney. “Decision Analysis of Agroforestry Options Reveals Adoption Risks for Resource-Poor Farmers.” Agronomy for Sustainable Development 40, no. 3 (June 2020): 20. https://doi.org/10.1007/s13593-020-00624-5. -->
This week we will read and discuss chapter four of Hubbard's 'How To Measure Anything'.
Hubbard, Douglas W. How To Measure Anything: Finding the Value of Intangibles in Business. 2nd ed. Vol. Second Edition. Hoboken, New Jersey: John Wiley & Sons, 2014.
### Bonus, More plotting options
### Violin & box plot overlays
Here we use R's built in `OrchardSprays` data to run the example from the `tidyverse` [Violin plot](https://ggplot2.tidyverse.org/reference/geom_violin.html) examples [@R-tidyverse].
```{r communicating-violin, exercise=TRUE}
ggplot(OrchardSprays, aes(y = decrease, x = treatment, fill = treatment))+
geom_violin() +
geom_boxplot(width = 0.1) +
theme(legend.position = "none")
```
### Ridge line plot
A variation on the example from [edav](https://edav.info/ridgeline.html) using the `ggridges` library [@R-ggridges].
```{r communicating-ggridges, warning=FALSE, message=FALSE, exercise=TRUE}
ggplot(OrchardSprays,
aes(x=decrease,y = treatment,fill = treatment)) +
geom_density_ridges_gradient(scale=2) +
theme_ridges() +
theme(legend.position = "none")
```
More examples on the `rdrr.io` [CRAN](https://rdrr.io/cran/ggridges/man/geom_ridgeline_gradient.html) website.
To see more options for plotting high dimensional data visit the [High Domensional Data](http://htmlpreview.github.io/?https://github.com/hortibonn/Plotting-High-Dimensional-Data/blob/master/HighDimensionalData.html) vignette.