Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fixed an issue of a solitary NA as var_labels passed to analyze #965

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kpagacz
Copy link

@kpagacz kpagacz commented Nov 22, 2024

Here's an MRE of the issue:

library(rtables)
# devtools::load_all()
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", var_labels = c(AGE = NA))

which results in:

Error in validObject(.Object) :
invalid class “AnalyzeVarSplit” object: invalid object for slot "split_l
abel" in class "AnalyzeVarSplit": got class "logical", should be or extend
class "character"

R treats a single NA as a logical vector, which fails the validation on the Split object, which requires the labels slot to be a character vector. This issue does not happen when the var_labels array contains other character values. The untyped NAs are cast to characters automatically in this case.

This also does not happen in this scenario:

library(rtables)
# devtools::load_all()
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", var_labels = c(AGE = NA_character_))

What prompted this is the investigation of what happens when launching the below teal application:

require(teal.modules.general)
require(teal.modules.clinical)
data <- teal_data()
data <- within(data, {
  ADSL <- teal.modules.general::rADSL
  # Remove the label attribute to simulate a dataset without labels
  attributes(ADSL$SEX)$label <- NULL
  ADSL$EOSDY[1] <- NA_integer_
})
join_keys(data) <- default_cdisc_join_keys["ADSL"]

ADSL <- data[["ADSL"]]

app <- init(
  data = data,
  modules = modules(
    teal.modules.clinical::tm_t_summary(
      label = "Demographic Table",
      dataname = "ADSL",
      arm_var = choices_selected(c("ARM", "ARMCD"), "ARM"),
      add_total = TRUE,
      summarize_vars = choices_selected(
        c("SEX", "RACE", "BMRKR2", "EOSDY", "DCSREAS", "AGE"),
        c("SEX")
      ),
      useNA = "ifany"
    )
  )
)
if (interactive()) {
  shinyApp(app$ui, app$server)
}

This application errors out with the above rtables error, because the application tries to evaluate the below:

library(nvimcom)
library(ggplot2)
library(ggmosaic)
library(shiny)
library(teal.code)
library(teal.data)
library(teal.slice)
library(teal)
library(teal.transform)
library(teal.modules.general)
library(formatters)
library(magrittr)
library(rtables)
library(tern)
library(teal.modules.clinical)

ADSL <- teal.modules.general::rADSL
attributes(ADSL$SEX)$label <- NULL
ADSL$EOSDY[1] <- NA

stopifnot(rlang::hash(ADSL) == "fdef4b3486c80ad054476ea2b4131889")

ANL_1 <- ADSL %>% dplyr::select(STUDYID, USUBJID, ARM, SEX)
ANL <- ANL_1
ANL <- ANL %>% teal.data::col_relabel(ARM = "Description of Planned Arm")
ANL_ADSL_1 <- ADSL %>% dplyr::select(STUDYID, USUBJID, ARM)
ANL_ADSL <- ANL_ADSL_1
ANL_ADSL <- ANL_ADSL %>% teal.data::col_relabel(ARM = "Description of Planned Arm")
anl <- ANL %>%
  df_explicit_na(
    omit_columns = setdiff(names(ANL), c(structure(c(SEX = "SEX"), dataname = "ADSL", always_selected = character(0)))),
    na_level = "<Missing>"
  )
anl <- anl %>% dplyr::mutate(ARM = droplevels(ARM))
arm_levels <- levels(anl[["ARM"]])
ANL_ADSL <- ANL_ADSL %>% dplyr::filter(ARM %in% arm_levels)
ANL_ADSL <- ANL_ADSL %>% dplyr::mutate(ARM = droplevels(ARM))
ANL_ADSL <- df_explicit_na(ANL_ADSL, na_level = "<Missing>")
lyt <- rtables::basic_table(
  main_footer = "n represents the number of unique subject IDs such that the variable has non-NA values."
) %>%
  rtables::split_cols_by("ARM", split_fun = drop_split_levels) %>%
  rtables::add_overall_col("All Patients") %>%
  rtables::add_colcounts() %>%
  analyze_vars(
    vars = structure(c(SEX = "SEX"), dataname = "ADSL", always_selected = character(0)),
    var_labels = c(SEX = NA),
    show_labels = "visible",
    na.rm = FALSE,
    na_str = "<Missing>",
    denom = "N_col",
    .stats = c("n", "mean_sd", "mean_ci", "geom_mean", "median", "median_ci", "quantiles", "range", "count_fraction")
  )
result <- rtables::build_table(lyt = lyt, df = anl, alt_counts_df = ANL_ADSL)
result

This fails due to solitary NA in the analyze_vars call, passed directly to rtables::analyze.

Unfortunately, trying to fix the issue on the teal.modules.clinical side is unsuccessful due to how substitute treats such solitary NAs (basically casting them from the explicit NA_character_ to normal NA). The fix could be made on the level of tern, but it seems silly not to go to the bottom, which is rtables::analyze, and deeper - AnalyzeMultiVars, etc...

Let me know if this makes sense for you or if you would like me to take a different route.

Copy link
Contributor

github-actions bot commented Nov 22, 2024

✅ All contributors have signed the CLA
Posted by the CLA Assistant Lite bot.

@kpagacz
Copy link
Author

kpagacz commented Nov 22, 2024

I have read the CLA Document and I hereby sign the CLA

Copy link
Contributor

@Melkiades Melkiades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kpagacz for this! I think it makes sense to do it with one line at source instead of spreading it around, also considering that NA_character_ is fine. Could you just add a comment and a regression test for this? Also add NEWS, thanks!! ;)

@kpagacz
Copy link
Author

kpagacz commented Nov 22, 2024

Added the comment, test and the NEWS entry. Ready for re-review.

Copy link
Contributor

@Melkiades Melkiades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @kpagacz! Great work

@shajoezhu
Copy link
Collaborator

Hi @kpagacz , thanks a lot for the PR! great change! thanks a lot. no concerns from my side.

I have a question to @pawelru , I was wondering if r-deps allow us to trigger downstream package checks before we merge?
previously, when we were using staged.dependencies, we could trigger downstream package change by the following step, e.g. scda.test

  1. create a branch, and modify the staged.dpendcies.yml package source, e.g. rtables, change to @kpagacz 's fork
  2. raise a pr, and it will test.

i was wondering should we do the same by modifying the github action check.yml to trigger the downstream cicd tests?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants