Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle missing values when calculating confidence intervals #521

Merged
merged 2 commits into from
Jan 31, 2024
Merged

Conversation

simonpcouch
Copy link
Collaborator

Closes #520. :)

With this PR:

library(infer)

data <- data.frame(
   prop = seq(0, 1, length.out = 10),
   group = rep(c("a", "b"), each = 5L)
)

set.seed(1)
boot_dist <-
   data %>%
   specify(prop ~ group) %>%
   hypothesize(null = "independence") %>%
   generate(reps = 1000, type = "bootstrap") %>%
   calculate(stat = "diff in medians", order = c("b", "a"))

get_confidence_interval(boot_dist, .95)
#> Warning: 4 estimates were missing and were removed when 
#> calculating the confidence interval.
#> # A tibble: 1 × 2
#>   lower_ci upper_ci
#>      <dbl>    <dbl>
#> 1    0.278    0.889

Created on 2023-12-22 with reprex v2.0.2

@simonpcouch
Copy link
Collaborator Author

Requesting your review so I don't forget to later, @mine-cetinkaya-rundel, but this review is not at all urgent. I'll be AFK the next couple weeks, so truly no rush.🙂☃️🎄

@mine-cetinkaya-rundel
Copy link
Collaborator

This looks good to me @simonpcouch! One question is whether we want this to be a warning or a message. If warning is consistent with other messaging in the package, then that's ok. Otherwise, I've been thinking about warnings as call to action and messages as call to attention. There is nothing the user could do to address this, it's just good for them to know, so it feels more like a message to me. But again, consistency with the rest of the package is probably more important.

@simonpcouch
Copy link
Collaborator Author

I think that's a good point. Without regard for consistency, I think I'd agree with you and go ahead and change that warning to a message. There is some precedent for warning when informing the user about missing values, though. :/

cli_warn("Removed {sum(!is_complete)} rows containing missing values.")

infer/R/calculate.R

Lines 35 to 39 in e5095f0

#' In some cases, when bootstrapping with small samples, some generated
#' bootstrap samples will have only one level of the explanatory variable
#' present. For some test statistics, the calculated statistic in these
#' cases will be NaN. The package will omit non-finite values from
#' visualizations (with a warning) and raise an error in p-value calculations.

@simonpcouch simonpcouch merged commit 3866325 into main Jan 31, 2024
8 checks passed
@simonpcouch simonpcouch deleted the ci-520 branch January 31, 2024 14:33
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants