Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move computation of p-values out of get_pairwise_comparisons()? #750

Open
nikosbosse opened this issue Mar 26, 2024 · 5 comments
Open

Move computation of p-values out of get_pairwise_comparisons()? #750

nikosbosse opened this issue Mar 26, 2024 · 5 comments
Labels
question Further information is requested

Comments

@nikosbosse
Copy link
Contributor

Currently, three functions exist that do something related to pairwise comparisons:

  • get_pairwise_comparisons() outputs a data.table with three different things:
    • pairwise mean score ratios between models
    • relative skill scores (geometric mean of the mean score ratios)
    • pairwise p-values
  • add_relative_skill() adds relative skill scores to an existing scores object (calling get_pairwise_comparisons()`)
  • plot_pairwise_comparisons() visualises either mean score ratios or the p-values.

Should the calculation of p-values and mean score ratios/relative skill scores be done by the same function?

Pro:

  • both mean score ratios and computation of p-values require a similar mechanic in which two models are compared against each other

  • the function is currently called get_pairwise_comparisons() so it makes sense to have those two things together

Contra:

  • it might be cleaner to separate the code a bit. Both get_pairwise_comparisons() and plot_pairwise_comparisons() currently have code for both things.
  • the workflows feel a bit different. The mean score ratios feel a bit more like "visualisation of performance" and the p-values feel more like "rigorous statistical testing".

In terms of currently suggested workflows we have the following:

For getting relative skill scores, you call as_forecast(data) |> score() |> add_relative_skill().

For visualising mean score ratios, you call

pairwise <- example_quantile |>
  as_forecast() |>
  score() |>
  get_pairwise_comparisons() 

plot_pairwise_comparisons(pairwise)

For visualising p-values, you call

plot_pairwise_comparisons(pairwise, type = "pval")

We previously even had a nice plot that showed both p-values and mean score ratios in a single plot (using the upper and lower triangle), but that broke and we ditched a while ago.


Options:

  1. leave everything as is for now.
  2. remove computation of p-values for now. Maybe rename get_pairwise_comparisons() to get_score_ratios(). Re-introduce functionality later.
  3. do a rewrite with two different workflows before the next CRAN release of version 2.0.0
  4. other
@nikosbosse
Copy link
Contributor Author

@sbfnk @seabbs
@nickreich @elray1 maybe you have thoughts or preferences as well?

@sbfnk
Copy link
Contributor

sbfnk commented Mar 27, 2024

  1. Could get_pairwise_comparisons() have a metric (or the like) option which could be mean_score_ratio (default) or p_value? The plot function could then plot whichever is there.

@nikosbosse
Copy link
Contributor Author

  1. Could get_pairwise_comparisons() have a metric (or the like) option which could be mean_score_ratio (default) or p_value? The plot function could then plot whichever is there.

It would get two metric arguments then :). My intuition is to prefer the status quo over that proposal for the following reasons:

  • we'd have to introduce and name an additional argument (or rather, the argument that decides what to do would be moved from plot_pairwise_comparisons() to get_pairwise_comparisons(). To me, the status quo feels a bit simpler.
  • we would still have the same code complexity issue with two functions covering two slightly different use cases.

@seabbs
Copy link
Contributor

seabbs commented Mar 28, 2024

I think my preference is 1 (i.e do nothing)

@nikosbosse
Copy link
Contributor Author

ok. Moving this to a later release then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
Development

No branches or pull requests

3 participants