Seminar_10_Communicating.Rmd

## Seminar 10: Communicating Decision Support {#communicating}

Welcome to seminar 10 of **Decision Analysis and Forecasting for Agricultural Development**. Feel free to bring up any questions or concerns in the Slack or to [Dr. Cory Whitney](mailto:cory.whitney@uni-bonn.de?subject=[Lecture_6]%20Decision%20Analysis%20Lecture) or the course tutor.

### Communicating results of Decision Analysis models

The results of a Monte Carlo model are not always easy to interpret and to communicate about. One important step in communicating these often very large data sets is with good visualization.  In previous lectures and seminars we have covered the basics of plotting the distributions of model results for comparisons between decision options. In this lecture we will build on what we learned in the [Decision Models](#decision_models) lecture and the [Model programming](#model_programming) seminar to learn more about how to generate useful plots for communicating model results. 

In the [Model programming part 2](#model_programming_cont) seminar we generated a model called `example_mc_simulation`. As a quick refresher change the plot of the results of that model from `hist_simple_overlay` to `boxplot_density` by using the `method` argument in the `plot_distributions` function.

In this seminar we will build on what we learned in the [Decision Models](#decision_models) lecture and the [Model programming part 1](#model_programming) seminar to learn more about how to generate useful forecasts for communicating model results. We will implement another Monte Carlo simulation.

In the [Model programming part 2](#model_programming_cont) seminar we generated a model called `example_mc_simulation`. As a quick refresher change the plot of the results of that model from `boxplot_density` to `smooth_simple_overlay` by using the `method` argument in the `plot_distributions` function.

```{r forecasts_plot_distribution, exercise=TRUE}

plot_distributions(mcSimulation_object = example_mc_simulation,
                   vars = "final_result",
                   method = "boxplot_density",
                   old_names = "final_result",
                   new_names = "Outcome distribution for profits")

```

```{r forecasts_plot_distribution-solution}

plot_distributions(mcSimulation_object = example_mc_simulation,
                   vars = "final_result",
                   method = "smooth_simple_overlay",
                   old_names = "final_result",
                   new_names = "Outcome distribution for profits")

```

### Making forecasts

First let's look at a case study on agroforestry interventions in Northwest Vietnam [@do_decision_2020].

<iframe width="560" height="315" src="https://www.youtube.com/embed/FZfACcoX4O0" frameborder="0" allowfullscreen></iframe>

```{r do-forecasts-question-1, echo=FALSE}
question("How do probabilistic models differ from frequentist or deterministic models?",
  answer("They contain more variables."),
  answer("They do not have as broad a range of applicability across different domains of science."),
  answer("Unlike deterministic models, probabilistic models can accept uncertainty as inputs.", correct = TRUE),
  answer("Unlike deterministic models, probabilistic models can produce uncertain distributions as outputs.", correct = TRUE),
  incorrect = "Watch [the talk](https://youtu.be/FZfACcoX4O0) and try again.",
allow_retry = TRUE
)
```

```{r do-forecasts-question-2, echo=FALSE}
question("What are the steps in the proposed process for rational decision-making?",
  answer("Gather empirical evidence and data and apply regression tools as part of an exploratory process to look for relationships. Then decide based on any relevant results."),
  answer("Perform interviews and then decide based on the suggestion of experts."),
  answer("Pool information, quantify uncertainty, produce assessments, then either measure where information value is high or take decision.", correct = TRUE),
  incorrect = "Watch [the talk](https://youtu.be/FZfACcoX4O0) and try again.",
allow_retry = TRUE
)
```

```{r do-forecasts-question-3, echo=FALSE}
question("Why was Projection to Latent Structure (PLS) analysis used in the study?",
  answer("To generate a p-value for the reliability of the model results."),
  answer("To perform sensitivity analysis.", correct = TRUE),
  answer("To assess the sensitivity of the output with the many collinear model inputs using the Varible Importance in the Projection (VIP) score.", correct = TRUE),
  incorrect = "Watch [the talk](https://youtu.be/FZfACcoX4O0) and try again.",
allow_retry = TRUE
)
```

### Rebuild the example_mc_simulation

Here we build the example_mc_simulation again for use in the following exercises. 

```{r example_mc_simulation_seminar10, eval=FALSE}
# input_estimates <- read.csv("data/input_estimates.csv")

model_function <- function(){
  income <- Yield * Market_price
  overall_costs <- Labor_cost + Management_cost
  final_result <- income - overall_costs
  return(list(final_result = final_result))
}

example_mc_simulation <- decisionSupport::mcSimulation(estimate = decisionSupport::as.estimate(input_estimates),
                                      model_function = model_function,
                                      numberOfModelRuns = 10000,
                                      functionSyntax = "plainNames")
```

```{r communicating-plot_distribution, exercise=TRUE}

plot_distributions(mcSimulation_object = example_mc_simulation,
                   vars = "final_result",
                   method = "hist_simple_overlay",
                   old_names = "final_result",
                   new_names = "Outcome distribution for profits")

```

```{r communicating-plot_distribution-solution}

plot_distributions(mcSimulation_object = example_mc_simulation,
                   vars = "final_result",
                   method = "boxplot_density",
                   old_names = "final_result",
                   new_names = "Outcome distribution for profits")

```

### Plot cashflow

The `plot_cashflow` function from the `decisionSupport` package [@R-decisionSupport] creates a cashflow plot of the returned list of related outputs from the `mcSimulation` function using ggplot2 [@R-ggplot2]. The function automatically defines quantiles (5 to 95% and 25 to 75%) as well as a value for the median.

```{r communicating-plot_cashflow, exercise=TRUE}
# Plotting the cashflow:

# Create the estimate object (for multiple options):

variable = c("revenue_option1", "costs_option1", "n_years", 
             "revenue_option2", "costs_option2")
distribution = c("norm", "norm", "const", "norm", "norm")
lower = c(10000,  5000, 20, 8000,  500)
upper = c(100000, 50000, 20, 80000,  30000)

costBenefitEstimate <- as.estimate(variable, distribution, lower, upper)

# Define the model function without name for the return value:

profit1 <- function(x) {
  
cashflow_option1 <- vv(revenue_option1 - costs_option1, n = n_years, var_CV = 100)
cashflow_option2 <- vv(revenue_option2 - costs_option2, n = n_years, var_CV = 100)

return(list(Revenues_option1 = revenue_option1,
            Revenues_option2 = revenue_option2,
            Costs_option1 = costs_option1,
            Costs_option2 = costs_option2,
            Cashflow_option_one = cashflow_option1,
            Cashflow_option_two = cashflow_option2))
}

# Perform the Monte Carlo simulation:

predictionProfit1 <- mcSimulation(estimate = costBenefitEstimate,
                                  model_function = profit1,
                                  numberOfModelRuns = 10000,
                                  functionSyntax = "plainNames")


# Plot the cashflow distribution over time

plot_cashflow(mcSimulation_object = predictionProfit1, cashflow_var_name = "Cashflow_option_one",
              x_axis_name = "Years with intervention",
              y_axis_name = "Annual cashflow in USD",
              color_25_75 = "green4", color_5_95 = "green1",
              color_median = "red")
```

Plot the cashflow with panels to compare the cashflow distribution over time for multiple decision options:

```{r communicating-plot_cashflow-panels, exercise=TRUE}

# Plotting the cashflow:

# Create the estimate object (for multiple options):

variable = c("revenue_option1", "costs_option1", "n_years", 
             "revenue_option2", "costs_option2")
distribution = c("norm", "norm", "const", "norm", "norm")
lower = c(10000,  5000, 10, 8000,  500)
upper = c(100000, 50000, 10, 80000,  30000)

costBenefitEstimate <- as.estimate(variable, distribution, lower, upper)

# Define the model function without name for the return value:

profit1 <- function(x) {
  
cashflow_option1 <- vv(revenue_option1 - costs_option1, n = n_years, var_CV = 100)
cashflow_option2 <- vv(revenue_option2 - costs_option2, n = n_years, var_CV = 100)

return(list(Revenues_option1 = revenue_option1,
            Revenues_option2 = revenue_option2,
            Costs_option1 = costs_option1,
            Costs_option2 = costs_option2,
            Cashflow_option_one = cashflow_option1,
            Cashflow_option_two = cashflow_option2))
}

# Perform the Monte Carlo simulation:

predictionProfit1 <- mcSimulation(estimate = costBenefitEstimate,
                                  model_function = profit1,
                                  numberOfModelRuns = 10000,
                                  functionSyntax = "plainNames")

plot_cashflow(mcSimulation_object = predictionProfit1, 
              cashflow_var_name = c("Cashflow_option_one", "Cashflow_option_two"),
              x_axis_name = "Years with intervention",
              y_axis_name = "Annual cashflow in USD",
              color_25_75 = "green4", color_5_95 = "green1",
              color_median = "red", 
              facet_labels = c("Option 1", "Option 2"))
  
```

### Plot many results together

We can use the `compound_figure` function to create a simple compound figure of model results and analyses of a binary decision (do or do not do). The figure includes the distribution of the expected outcome, the expected cashflow, as well as the variable importance and the value of information. 

```{r communicating-compound_figure, exercise=TRUE}

# Create the estimate object:

cost_benefit_table <- data.frame(label = c("Revenue", "Costs"),
                                  variable = c("revenue", "costs"),
                                  distribution = c("norm", "norm"),
                                  lower = c(100,  500),
                                  median = c(NA, NA),
                                  upper = c(10000, 5000))

# (a) Define the model function without name for the return value:

profit1 <- function() {
  Decision <- revenue - costs
  cashflow <- rnorm(rep(revenue, 20))
  return(list(Revenues = revenue,
              Costs = costs, 
              cashflow = cashflow, 
              Decision = Decision))
}

compound_figure(model = profit1, 
input_table = cost_benefit_table, 
decision_var_name = "Decision",
cashflow_var_name = "cashflow",
model_runs = 1e2, 
distribution_method = 'smooth_simple_overlay')


```

### Other visualization options

Here we demonstrate a few more various graphical options to visualize uncertainty intervals of outcomes of Monte Carlo simulations. We create a data set of yield distributions of three different farming practices and use the function `mcmc_intervals()` from the `bayesplot` library to plot the data set [@R-bayesplot].

```{r communicating-bayes_plot-mcmc_intervals, exercise=TRUE}
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5), 
                   "practice 2" = rnorm(n = 1000, mean = 7, sd = 1), 
                   "practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))


color_scheme_set("red")
mcmc_intervals(test,prob = 0.5,prob_outer = 0.9,point_est = "median")
```

Do the same with the with `mcmc_areas()` function from the `bayesplot` library [@R-bayesplot].

```{r communicating-bayes_plot-mcmc_areas, exercise=TRUE}
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5), 
                   "practice 2" = rnorm(n = 1000, mean = 7, sd = 1), 
                   "practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))

color_scheme_set("blue")

mcmc_areas(test,prob = 0.9,point_est = "median")
```

### Comparative density curves

We can also use `geom_density()`in `ggplot2` to compare the spread of different distributions [@R-ggplot2]:

```{r communicating-geom_density, exercise=TRUE}
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5), 
                   "practice 2" = rnorm(n = 1000, mean = 7, sd = 1), 
                   "practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))

stacked_test <- stack(test)

ggplot(stacked_test, 
       aes(x=values,group=ind,fill=ind )) +
        geom_density(colour=NA,alpha=.5) +
        ylab("Probability density") +
        xlab("Yield")
```

### Comparative histogram

Use `ggplot2` `geom_histogram()`function to show the histogram of the data in comparison:

```{r communicating-geom_histogram, exercise=TRUE}
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5), 
                   "practice 2" = rnorm(n = 1000, mean = 7, sd = 1), 
                   "practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))

stacked_test <- stack(test)

ggplot(stacked_test,aes(x=values))+ 
  geom_histogram(data=subset(stacked_test,ind =='practice.1'),
                 aes(fill = ind), alpha = 0.5, bins = 150) + 
  geom_histogram(data=subset(stacked_test,ind == 'practice.2'),
                 aes(fill = ind), alpha = 0.5, bins = 150) +
  geom_histogram(data=subset(stacked_test,ind == 'practice.3'),
                 aes(fill = ind), alpha = 0.5, bins = 150) 
```

 
### Reading 

<!-- ?NEXT YEAR ADD THIS HERE AND REMOVE FROM other CHAPTER?-->
<!-- This week we will read and discuss 'Decision Analysis of Agroforestry Options Reveals Adoption Risks for Resource-Poor Farmers' by @do_decision_2020. -->

<!-- Do, Hoa, Eike Luedeling, and Cory Whitney. “Decision Analysis of Agroforestry Options Reveals Adoption Risks for Resource-Poor Farmers.” Agronomy for Sustainable Development 40, no. 3 (June 2020): 20. https://doi.org/10.1007/s13593-020-00624-5. -->

This week we will read and discuss chapter four of Hubbard's 'How To Measure Anything'.

Hubbard, Douglas W. How To Measure Anything: Finding the Value of Intangibles in Business. 2nd ed. Vol. Second Edition. Hoboken, New Jersey: John Wiley & Sons, 2014.

### Bonus, More plotting options

### Violin & box plot overlays

Here we use R's built in `OrchardSprays` data to run the example from the `tidyverse` [Violin plot](https://ggplot2.tidyverse.org/reference/geom_violin.html) examples [@R-tidyverse].

```{r communicating-violin, exercise=TRUE}

ggplot(OrchardSprays, aes(y = decrease, x = treatment, fill = treatment))+
  geom_violin() +
  geom_boxplot(width = 0.1) +
  theme(legend.position = "none")

```

### Ridge line plot

A variation on the example from [edav](https://edav.info/ridgeline.html) using the `ggridges` library [@R-ggridges].

```{r communicating-ggridges, warning=FALSE, message=FALSE, exercise=TRUE}

ggplot(OrchardSprays, 
       aes(x=decrease,y = treatment,fill = treatment)) +
  geom_density_ridges_gradient(scale=2) + 
  theme_ridges() +
  theme(legend.position = "none")

```

More examples on the `rdrr.io` [CRAN](https://rdrr.io/cran/ggridges/man/geom_ridgeline_gradient.html) website.

To see more options for plotting high dimensional data visit the [High Domensional Data](http://htmlpreview.github.io/?https://github.com/hortibonn/Plotting-High-Dimensional-Data/blob/master/HighDimensionalData.html) vignette.