README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# tukeygrps

<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![CRAN status](https://www.r-pkg.org/badges/version/tukeygrps)](https://cran.r-project.org/package=tukeygrps)
<!-- badges: end -->

Tukeygrps provides simple wrapper functions for the annotation of (gg)plots according to statistical differences between groups determined by a parametric Tukey-HSD test from {stats} or a non-parametric Kruskal-Wallis test with Dunn's test for multiple comparisons from {dunn.test}.

## Installation

You can install tukeygrps from github using remotes:

``` {r, eval=FALSE}
install.packages("remotes")
library("remotes")
install_github("leonardblaschek/tukeygrps")
```

## Examples

*Parametric* multiple comparisons like the Tukey HSD (honest significant differences) test shown in section **1** are only recommended in cases where the data fulfill all of the following conditions:

* **normally** distributed
* **homoscedastic**
* **independent** within and between groups
* **equal** in sample size

If you have strong evidence that they do not fulfill these conditions, consider a *non-parametric* method of comparison, like the Kruskal-Wallis test followed by Dunn's multiple comparisons shown in section **2**.

### 1. Parametric multiple comparisons 

Here we use letter_groups() with stat_method = "tukey" to add letters to a geom_point plot. Alpha is set to 0.001, the letters are printed at y = 0, and there are no additional grouping variables.

```{r example no groups, message=FALSE}
library(tukeygrps)
library(tidyverse)

data(mpg)
head(mpg)

tukey_letters <- letter_groups(mpg, hwy, class, "tukey", print_position = 0, stat_alpha = 0.001)

head(tukey_letters)

ggplot() +
  geom_jitter(
    data = mpg,
    aes(
      x = class,
      y = hwy
    ),
    width = 0.1
  ) +
  geom_text(
    data = tukey_letters,
    aes(
      x = class,
      y = hwy,
      label = Letters
    )
  ) +
  coord_flip()
```

Here we split the statistical analysis by two grouping variables ("cut" and "color"), set the alpha to 0.05 and print the letters 0.5 standard deviations below the respective minimum value.

```{r example, warning=FALSE, message=FALSE}
library(tukeygrps)
library(tidyverse)

data(diamonds)
diamonds <- diamonds %>%
  filter(cut %in% c("Ideal", "Premium", "Very Good") & color %in% c("D", "E", "F"))
head(diamonds)

tukey_letters <- letter_groups(
  diamonds,
  price,
  clarity,
  "tukey",
  cut,
  color,
  print_position = "below",
  print_adjust = 0.5,
  stat_alpha = 0.05,
)

head(tukey_letters)

ggplot() +
  geom_jitter(
    data = diamonds,
    aes(
      x = clarity,
      y = price
    ),
    size = 1,
    width = 0.1,
    alpha = 0.25
  ) +
  geom_boxplot(
    data = diamonds,
    aes(
      x = clarity,
      y = price
    ),
    outlier.alpha = 0,
    fill = rgb(1, 1, 1, 0.5)
  ) +
  geom_text(
    data = tukey_letters,
    aes(
      x = clarity,
      y = price,
      label = Letters
    ),
    size = 3
  ) +
  facet_grid(cut ~ color) +
  coord_flip()
```

### 2. Non-parametric multiple comparisons 

In case the above requirements for parametric tests are not met, we can fall back to the non-parametric Kruskal–Wallis test followed by Dunn's test and *p*-value adjustment for multiple comparisons. Here we place the letter codes 0.5 standard deviations above the maximum values.

```{r example non-parametric, warning=FALSE, message=FALSE, results='hide'}
library(tukeygrps)
library(tidyverse)

data(diamonds)
diamonds <- diamonds %>%
  filter(cut %in% c("Ideal", "Premium", "Very Good") & color %in% c("D", "E", "F"))

kruskal_letters <- letter_groups(
  diamonds,
  price,
  clarity,
  "kruskal",
  cut,
  color,
  print_position = "above",
  print_adjust = 0.5,
  p_adj_method = "holm"
)

head(kruskal_letters)

ggplot() +
  geom_jitter(
    data = diamonds,
    aes(
      x = clarity,
      y = price
    ),
    size = 1,
    width = 0.1,
    alpha = 0.25
  ) +
  geom_boxplot(
    data = diamonds,
    aes(
      x = clarity,
      y = price
    ),
    outlier.alpha = 0,
    fill = rgb(1, 1, 1, 0.5)
  ) +
  geom_text(
    data = kruskal_letters,
    aes(
      x = clarity,
      y = price,
      label = Letters
    ),
    size = 3
  ) +
  facet_grid(cut ~ color) +
  coord_flip()
```