case_when() coerces the multiplicity of a returned value to match that from conditions not satisfied #7088

Closed
ChrisHIV opened this issue Sep 30, 2024 · 1 comment

@ChrisHIV

case_when() appears to first decide how many values it will return by considering all conditions, and then coerces the value from the satisfied condition to that multiplicity, rather than returning as many values as the satisfied condition specifies. Is this a bug or intentional? It leads to unexpected behaviour when the conditions themselves concern the multiplicity of values.

Example

library(dplyr)
tibble(x = c("A", "A", "B", "B"),
       y = c("I", "I", "J", "K")) %>%
  summarise(.by = x,
            summary = case_when(
              length(unique(y)) == 1L ~ paste("This x has one y:", unique(y)),
              TRUE ~ "This x has several ys"
            ))

output:

  x     summary              
  <chr> <chr>                
1 A     This x has one y: I  
2 B     This x has several ys
3 B     This x has several ys

together with a warning message about returning more (or less) than 1 row per summarise() group.

output I expected

  x     summary              
  <chr> <chr>                
1 A     This x has one y: I  
2 B     This x has several ys
@DavisVaughan
Member

I think you really just need a basic if statement. You aren't doing anything vectorized, so you don't need case_when().

library(dplyr)

do_it <- function(y) {
  if (length(unique(y)) == 1L) {
    paste("This x has one y:", unique(y))
  } else {
    "This x has several ys"
  }
}

tibble(x = c("A", "A", "B", "B"),
       y = c("I", "I", "J", "K")) %>%
  summarise(.by = x,
            summary = do_it(y))
#> # A tibble: 2 × 2
#>   x     summary              
#>   <chr> <chr>                
#> 1 A     This x has one y: I  
#> 2 B     This x has several ys
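
To see why if behaves differently from case_when() here, it can help to evaluate the right-hand side for group B on its own. case_when() evaluates every right-hand side regardless of which condition matches, whereas if only evaluates the branch it takes. The following is a small base R sketch; the variable names y, rhs and picked are made up for illustration and are not part of the original report.

```r
# The y values that belong to the group x == "B" in the example above
y <- c("J", "K")

# The right-hand side of the first case_when() formula, evaluated by itself.
# It has one element per unique y, so for this group it has size 2,
# even though its condition is FALSE.
rhs <- paste("This x has one y:", unique(y))
length(rhs)
#> [1] 2

# A plain if/else only evaluates the branch that is taken, so the
# size-2 value above is never computed and the result has size 1.
picked <- if (length(unique(y)) == 1L) {
  paste("This x has one y:", unique(y))
} else {
  "This x has several ys"
}
length(picked)
#> [1] 1
```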

A simpler example of what you are trying to demonstrate is:

dplyr::case_when(
  FALSE ~ c(1, 2),
  TRUE ~ 3
)
#> [1] 3 3

I actually think this should be an error. See #7082 (comment), where I talk about this in more detail. Each RHS of case_when() should have either size 1 or size n, where n is the common size of the conditions on the LHS. The underlying engine already throws an error here:

dplyr:::vec_case_when(
  conditions = list(FALSE, TRUE),
  values = list(c(1, 2), 3)
)
#> Error in `dplyr:::vec_case_when()`:
#> ! `values[[1]]` must have size 1, not size 2.
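
For contrast, here is a sketch of a case_when() call that stays within that size rule: every condition has the same size as x, and each right-hand side has either size 1 or that same size. The vector x and the labels are invented for illustration, and .default assumes dplyr 1.1.0 or later.

```r
library(dplyr)

x <- c(1, 5, 10)

labels <- case_when(
  x < 3 ~ "low",              # size 1 RHS, recycled to the size of x
  x < 8 ~ paste0("mid:", x),  # size 3 RHS, same size as the conditions
  .default = "high"           # used where no condition is TRUE
)
labels
#> [1] "low"   "mid:5" "high"
```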

Anyway, for your use case, which has two conditions that basically amount to:

  • Something to do for TRUE
  • Something to do for FALSE

I think you are much better served by an if statement.
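
If a helper function feels like too much for a one-off, the same if/else can be written inline inside summarise(). This is only a sketch of one possible spelling: it uses n_distinct() in place of length(unique(y)), stores the result in a variable named result for illustration, and assumes dplyr 1.1.0 or later for .by.

```r
library(dplyr)

result <- tibble(x = c("A", "A", "B", "B"),
                 y = c("I", "I", "J", "K")) %>%
  summarise(.by = x,
            # if/else is evaluated once per group and returns a single string,
            # so each group contributes exactly one row
            summary = if (n_distinct(y) == 1L) {
              paste("This x has one y:", unique(y))
            } else {
              "This x has several ys"
            })

result$summary
#> [1] "This x has one y: I"   "This x has several ys"
```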
