Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

913: Functions for PIT histograms #949

Merged
merged 15 commits into from
Oct 22, 2024
Merged

913: Functions for PIT histograms #949

merged 15 commits into from
Oct 22, 2024

Conversation

sbfnk
Copy link
Contributor

@sbfnk sbfnk commented Oct 16, 2024

Description

This PR closes #913.

It adds plot_histogram_sample() pit_histogram_sample() and get_pit_histogram() functions and implements the nonrandomised PIT for count data suggested by Czado et al.

It also removes get_pit() and pit_sample() - I don't think calculating PIT values themselves is very interesting or something anyone would ever want to do but if people disagree we could keep them in.

It also removes plot_pit(). Now that densities of the histogram can be obtained directly I would think that a user will be able to make their own plot.

Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title as follows: issue-number: PR title
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • I have built the package locally and run rebuilt docs using roxygen2.
  • My code follows the established coding standards and I have run lintr::lint_package() to check for style issues introduced by my changes.
  • I have added a news item linked to this PR.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.

@nikosbosse
Copy link
Contributor

Nice, thank you!
I suggested some cosmetic changes in #950.

Broader thoughts/questions:

  • I see the point you raised in get_ function names #948 - prefixing the functions with get_ indeed feels a bit weird.
    • one alternative I could think of is changing names to pit_histogram.forecast_sample() and pit_histogram_sample()?
  • I'm wondering whether it makes sense to keep pit_histogram_sample() exported now that it returns a histogram rather than the raw pit values. But I think it does?
  • plot_histogram_sample() does not exist anymore, does it? The PR description mention still mentions it. I personally like having a plot function. At least I think it would be nice to provide the code as an example

@sbfnk
Copy link
Contributor Author

sbfnk commented Oct 21, 2024

  • one alternative I could think of is changing names to pit_histogram.forecast_sample() and pit_histogram_sample()?

This would be my suggestion (i.e. just remove the get_ bits) but probably best left for that issue to be addressed alongside any other function(s) that should get the same treatment.

  • I'm wondering whether it makes sense to keep pit_histogram_sample() exported now that it returns a histogram rather than the raw pit values. But I think it does?

Whatever we do I think it deserves the same treatment as related functions (e.g. bias_sample()) - assuming they would have some utility for those who don't want to go down the data.frame/S3 path perhaps keep them exported?

  • plot_histogram_sample() does not exist anymore, does it? The PR description mention still mentions it. I personally like having a plot function. At least I think it would be nice to provide the code as an example

It never existed. I meant pit_histogram_sample() and have now edited the PR description. My impression was with the new function structure plotting is now so trivial that it doesn't need its own function (and people can choose their favourite base R / ggplot / etc routine). There is an example in the manuscript:

example_sample_continuous |>
  as_forecast_sample() |>
  get_pit_histogram(by = c("model", "target_type")) |>
  ggplot(aes(x = mid, y = density)) +
  geom_col() +
  facet_grid(target_type ~ model) +
  labs(x = "Quantile", "Density")

Do you think this should be replicated somewhere else?

@nikosbosse
Copy link
Contributor

nikosbosse commented Oct 21, 2024

Do you think this should be replicated somewhere else?

I'm leaning slightly towards a function (I would personally have to think more than 30 seconds about how to plot this + in order to plot this you need to know what a pit histogram needs to look like).

As an alternative I would suggest adding the code snippet to the function examples + adding a comment such as "For an example of how to plot the output, see the examples section" in the function docs.
I don't feel that strongly because we can always go from option 2 to option 1 in the future.

@sbfnk
Copy link
Contributor Author

sbfnk commented Oct 21, 2024

I'm leaning slightly towards a function (I would personally have to think more than 30 seconds about how to plot this + in order to plot this you need to know what a pit histogram needs to look like).

Thinking about this I think this is a bit more complicated in the new set up because the plot function doesn't necessarily know what to plot / facet (i.e. what was passed to by).

Alternatively we could change get_pit_histogram() to return objects of class histogram (i.e. same as returned by hist)? That would be nice as mids, bins etc. would all be taken care of, and there would be a simple plot() function. However, the interaction with by would be more complicated.

Copy link
Contributor

@nikosbosse nikosbosse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

R/get-pit-histogram.R Show resolved Hide resolved
@nikosbosse
Copy link
Contributor

Shall we merge it?

@sbfnk sbfnk merged commit b370557 into main Oct 22, 2024
9 checks passed
@sbfnk sbfnk deleted the pit-histogram branch October 22, 2024 07:20
@seabbs
Copy link
Contributor

seabbs commented Oct 22, 2024

know what to plot / facet (i.e. what was passed to by).

In epinowcast the facetting is left as an exercise for the user which I like.

@@ -84,6 +84,7 @@ of our [original](https://doi.org/10.48550/arXiv.2205.07090) `scoringutils` pape
- Removed `interval_coverage_sample()` as users are now expected to convert to a quantile format first before scoring.
- Function `set_forecast_unit()` was deleted. Instead there is now a `forecast_unit` argument in `as_forecast_<type>()` as well as in `get_duplicate_forecasts()`.
- Removed `interval_coverage_dev_quantile()`. Users can still access the difference between nominal and actual interval coverage using `get_coverage()`.
- `pit()`, `pit_sample()` and `plot_pit()` have been removed and replaced by functionality to create PIT histograms (`pit_histogram_sampel()` and `get_pit_histogram()`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: sampel -> sample

#' @examples
#' library("ggplot2")
#'
#' example <- as_forecast_sample(example_sample_continuous)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it already of class sample and so this line isn't needed?

pit_sample <- function(observed,
predicted,
n_replicates = 100) {
pit_histogram_sample <- function(observed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noting if/when we get rid of get_ then pit_histogram and pit_histogram_sample are quite hard to distinguish.

plot_pit() +
facet_grid(target_type ~ model)
get_pit_histogram(by = c("model", "target_type")) |>
ggplot(aes(x = mid, y = density)) +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally prefer we do ship a simpler wrapper around this code (and the labs) to reduce duplication though obviously it isn't very complicated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to reinstate plot_pit(), maybe even as S3 function that calls get_pit_histogram() and then plots? Or something else?

facet_wrap(~ model, nrow = 1) +
theme_scoringutils()
theme_scoringutils() +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one uses the scoringutils theme and the other one doesn't

@seabbs seabbs mentioned this pull request Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Randomised -> non-randomised PIT for count values
3 participants