Is there an issue with using a histogram to plot continuous data? #67

Closed
athowes opened this issue May 28, 2024 · 3 comments
Labels
low For a future release

Comments

@athowes
Collaborator

athowes commented May 28, 2024

See discussion on #59.

I am not convinced that there is a problem / bias that comes from plotting continuous data using a histogram. See:

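library(ggplot2)

set.seed(123)

# Draw n lognormal samples and overlay the true density on their histogram
plot_hist <- function(n) {
  meanlog <- 1.8
  sdlog <- 0.5

  data.frame(value = rlnorm(n, meanlog = meanlog, sdlog = sdlog)) |>
    ggplot(aes(x = value)) +
    # after_stat(density) replaces the deprecated ..density.. notation
    geom_histogram(aes(y = after_stat(density)), bins = 50, color = "black", fill = "grey90") +
    # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
    stat_function(fun = dlnorm, args = list(meanlog = meanlog, sdlog = sdlog), color = "firebrick", linewidth = 1) +
    labs(title = paste0("Sample size: ", n), y = "", x = "") +
    theme_minimal()
}

# One panel per sample size, from 1 to 100,000
plots <- lapply(10^(0:5), plot_hist)
patchwork::wrap_plots(plots)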

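[Image: output of the code above, one histogram panel per sample size with the lognormal density overlaid]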

Because of this, I think we should plot the continuous data in Figure 2.1 of the get started vignette. In my opinion, this is beneficial because it highlights that epidist deals with two issues: right truncation and censoring.

@seabbs or @parksw3 if you could point me to the relevant section in Park et al. 2023 to show that there is a problem with plotting continuous data as above, or suggest altered simulation settings to show the problem, it'd be helpful. Otherwise I suggest we change the vignette to use the continuous data.

@seabbs
Contributor

seabbs commented May 28, 2024

I think the more useful conversation is whether we should plot real-time data, retrospective data, or discretised continuous data, the last being what you are proposing. I'm unclear what the argument is for the latter vs retrospective data.

Just from reading the current vignette, it's not clear at all.

I think that if this is simulated from a fixed event time, i.e. 0, it won't bias the mean, but if the delay is constructed from an initial censoring event and the delay, it will.

The places to look here are Figures 3 and 6, I believe.
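
One way to probe this kind of claim is a small simulation that compares the mean of the continuous delays with the mean of daily-discretised delays. The sketch below reuses `meanlog` and `sdlog` from the opening post. The two discretisation schemes (primary event fixed at 0, and primary event uniform within a day) and the sample size are assumptions made for illustration. They are not necessarily the constructions used in the vignette or in Park et al. 2023.

```r
set.seed(123)

n <- 1e5          # assumed sample size, chosen only to keep Monte Carlo noise small
meanlog <- 1.8    # same lognormal settings as the opening post
sdlog <- 0.5

# Continuous delays between primary and secondary events
delay <- rlnorm(n, meanlog = meanlog, sdlog = sdlog)

# Assumed scheme A: primary event at exactly time 0, secondary event
# recorded as the day on which it falls
primary_fixed <- rep(0, n)
obs_fixed <- floor(primary_fixed + delay) - floor(primary_fixed)

# Assumed scheme B: primary event uniform within its day, both events
# recorded as the day on which they fall
primary_unif <- runif(n)
obs_censored <- floor(primary_unif + delay) - floor(primary_unif)

# Compare the three means side by side
means <- c(
  continuous = mean(delay),
  fixed_primary = mean(obs_fixed),
  censored_primary = mean(obs_censored)
)
print(round(means, 2))
```

Changing how `primary_unif` is drawn, or how the observed delay is constructed, is the natural place to swap in whichever censoring process is actually under discussion.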

@parksw3
Collaborator

parksw3 commented May 29, 2024

Just to clarify, when I said "We could also consider plotting something like the mean (vertical line) to show the bias", I meant the bias between continuous and discrete time data. I was not referring to "histogram bias". I guess what I'm saying lines up with "I do perhaps see that you'd want the histogram bins to line up with "daily" if that's what you mean also" in #57?

Otherwise, I'm happy with using a histogram for continuous data as long as we make it clear that it's continuous data.
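
The two suggestions above, bins that line up with days and a vertical line at the mean, could be sketched as follows. This is a minimal illustration, not the vignette's code: the sample size, labels, and object names (`delays`, `p`) are assumptions, and the lognormal settings are borrowed from the opening post.

```r
library(ggplot2)

set.seed(123)

meanlog <- 1.8
sdlog <- 0.5

# Assumed sample size of 1000, for illustration only
delays <- data.frame(value = rlnorm(1000, meanlog = meanlog, sdlog = sdlog))

p <- ggplot(delays, aes(x = value)) +
  # binwidth = 1 with boundary = 0 puts the bin edges on whole days
  geom_histogram(
    aes(y = after_stat(density)),
    binwidth = 1, boundary = 0, color = "black", fill = "grey90"
  ) +
  # Dashed vertical line at the sample mean of the continuous delays
  geom_vline(xintercept = mean(delays$value), linetype = "dashed") +
  labs(title = "Continuous delays, daily bins", x = "Delay (days)", y = "Density") +
  theme_minimal()

p
```

A second `geom_vline()` at the mean of a discretised version of the same delays would put the two means on the same panel for comparison.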

@athowes
Collaborator Author

athowes commented Oct 1, 2024

I'm closing this as no longer relevant.

@athowes athowes closed this as completed Oct 1, 2024