Nested grouping #41

Open
grantmcdermott opened this issue Jun 28, 2023 · 1 comment

grantmcdermott commented Jun 28, 2023

It would be nice if we could support nested grouping. (Or, put differently, allow colours to vary/repeat across units.) This would mostly be useful for line plots where we want to avoid joining the end of one line with the start of another. The idea is similar to how ggplot2 allows you to specify aes(col = var1, group = var2) separately.

Here is an illustration using the following dataset. The setting is a difference-in-differences research design with staggered treatment. So we have treatment cohorts (first_treat) superimposed on individual units (id).

  1. First, points. (Fine.)
plot2(y ~ time | first_treat, dat)

  2. Second, lines. (Not fine, because we have lines rejoining across units in the same cohort.)
plot2(y ~ time | first_treat, dat, type = "l")

Of course, we could group (colour) by the individual IDs. This stops the rejoining, but means that we lose the colouring by treatment group (which is the interesting thing from a causal inference perspective).

plot2(y ~ time | id, dat, type = "l", legend = FALSE)

I don't have a solution right now, but it probably requires a new argument like bycol. On the formula side, we could potentially represent this via a / nesting interaction. So the call would become plot2(y ~ time | first_treat / id, dat, type = "l"), i.e. units are nested within first treatment cohorts.
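As a rough illustration of the formula idea above, here is a minimal base R sketch of how a `/` term after the `|` could be split into a colour variable and a grouping variable. The helper name `parse_nested_by()` and its return structure are hypothetical; nothing like this exists in the package, and a real implementation would also need to handle facets and more complex terms.

```r
# Hypothetical helper (not part of plot2/tinyplot): split the part of a
# formula after `|` into a colour variable and a grouping variable.
parse_nested_by = function(fml) {
  rhs = fml[[3L]]  # everything to the right of `~`
  # No `|` term at all: nothing to colour or group by
  if (!is.call(rhs) || !identical(rhs[[1L]], as.name("|"))) {
    return(list(col = NULL, group = NULL))
  }
  by = rhs[[3L]]   # everything to the right of `|`
  if (is.call(by) && identical(by[[1L]], as.name("/"))) {
    # outer / inner: colour by the outer term, draw one line per inner term
    list(col = deparse(by[[2L]]), group = deparse(by[[3L]]))
  } else {
    # No nesting: the same variable drives both colour and grouping
    list(col = deparse(by), group = deparse(by))
  }
}

nested = parse_nested_by(y ~ time | first_treat / id)
plain  = parse_nested_by(y ~ time | first_treat)
```

This works because `/` binds more tightly than `|` in R's parser, so `first_treat / id` arrives as a single call on the right-hand side of `|`.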

grantmcdermott commented Aug 30, 2024

Ran into this again recently and am now thinking a simpler solution is just to support passing a variable to col. It should be pretty simple to grab the corresponding colour breaks and pass them to our group-split data, by using something like tapply(factor(col_var), by_var, FUN = `[[`, 1) internally.

Manual proof of concept:

library(tinyplot)

set.seed(123456L)

# 60 time periods, 30 individuals, and 5 waves of treatment
tmax = 60
imax = 30
nlvls = 5

dat = 
  expand.grid(time = 1:tmax, id = 1:imax) |>
  within({
    
    cohort      = NA
    effect      = NA
    first_treat = NA
    
    for (chrt in 1:imax) {
      cohort = ifelse(id==chrt, sample.int(nlvls, 1), cohort)
    }
    
    for (lvls in 1:nlvls) {
      effect      = ifelse(cohort==lvls, sample(2:10, 1), effect)
      first_treat = ifelse(cohort==lvls, sample(1:(tmax+20), 1), first_treat)
    }
    
    first_treat = ifelse(first_treat>tmax, Inf, first_treat)
    treat       = time>=first_treat
    rel_time    = time - first_treat
    y           = id + time + ifelse(treat, effect*rel_time, 0) + rnorm(imax*tmax)
    
    rm(chrt, lvls, cohort, effect)
  })

cols = with(dat, tapply(factor(first_treat), id, FUN = `[[`, 1))  # grab group colours
cols
#>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
#>  1  3  3  5  4  1  3  2  2  4  2  1  3  4  5  3  4  2  1  2  3  4  3  2  3  5 
#> 27 28 29 30 
#>  5  3  1  3

plt(y ~ time | id, dat, type = "l", col = palette()[cols], legend = FALSE)
#> Warning in tinyplot.default(x = x, y = y, by = by, facet = facet, facet.args = facet.args, : 
#> Continuous legends not supported for this plot type. Reverting to discrete legend.

Created on 2024-08-30 with reprex v2.1.1

TBD on how to handle legends, as well as NSE vs formula arguments.
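
On the legend question, one possible target is a legend with one entry per colour level rather than one per line. The sketch below uses base graphics only, with a made-up toy dataset (`toy`, `cohort` and the other names are assumptions for illustration). It is not how tinyplot builds its legends, just a picture of the intended end result.

```r
# Toy data (hypothetical): 4 units in 3 cohorts, 5 periods each
toy = data.frame(
  id     = rep(1:4, each = 5),
  time   = rep(1:5, times = 4),
  cohort = rep(c("a", "b", "a", "c"), each = 5)
)
toy$y = toy$id + toy$time

cohort_f = factor(toy$cohort)
ids      = sort(unique(toy$id))

# One colour index per unit, taken from the first row of each unit
# (same tapply trick as in the proof of concept)
unit_cols = as.integer(tapply(cohort_f, toy$id, FUN = `[[`, 1))

# Empty canvas, then one line per unit so lines never rejoin across units
plot(toy$time, toy$y, type = "n", xlab = "time", ylab = "y")
for (k in seq_along(ids)) {
  d = toy[toy$id == ids[k], ]
  lines(d$time, d$y, col = palette()[unit_cols[k]])
}

# One legend entry per cohort, not per unit
legend_cols = palette()[seq_len(nlevels(cohort_f))]
legend("topleft", legend = levels(cohort_f), col = legend_cols,
       lty = 1, title = "cohort")
```

Units 1 and 3 share a cohort here, so they share a colour and a single legend entry while still being drawn as separate lines.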
