Skip to content

Commit

Permalink
Update readme
Browse files Browse the repository at this point in the history
  • Loading branch information
nikosbosse authored and seabbs committed Mar 22, 2024
1 parent d6844d5 commit 97c2d6b
Show file tree
Hide file tree
Showing 3 changed files with 191 additions and 190 deletions.
104 changes: 51 additions & 53 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -53,96 +53,94 @@ if (devel) {
cat("\n\n")
```

The `scoringutils` package provides a collection of metrics and proper scoring rules and aims to make it simple to score probabilistic forecasts against observed values.
The `scoringutils` package facilitates the process of evaluating forecasts in R, using a convenient and flexible `data.table`-based framework. It provides broad functionality to check the input data and diagnose issues, to visualise forecasts and missing data, to transform data before scoring, to handle missing forecasts, to aggregate scores, and to visualise the results of the evaluation. The package is easily extendable, meaning that users can supply their own scoring rules or extend existing classes to handle new types of forecasts.

A good starting point for those wishing to use `scoringutils` are the vignettes on [Getting started](https://epiforecasts.io/scoringutils/articles/scoringutils.html), [Details on the metrics implemented](https://epiforecasts.io/scoringutils/articles/metric-details.html) and [Scoring forecasts directly](https://epiforecasts.io/scoringutils/articles/scoring-forecasts-directly.html).
The package underwent a major re-write. The most comprehensive documentation for the [updated version](https://drive.google.com/file/d/1URaMsXmHJ1twpLpMl1sl2HW4lPuUycoj/view?usp=drive_link) is the revised version of our [original](https://doi.org/10.48550/arXiv.2205.07090) `scoringutils` paper.

For a detailed description of the package, its rationale and design, usage examples and how it relates to other packages in the R ecosystem, please see the corresponding paper:
Another good starting point are the vignettes on [Getting started](https://epiforecasts.io/scoringutils/articles/scoringutils.html), [Details on the metrics implemented](https://epiforecasts.io/scoringutils/articles/metric-details.html) and [Scoring forecasts directly](https://epiforecasts.io/scoringutils/articles/scoring-forecasts-directly.html).

> Nikos I. Bosse, Hugo Gruson, Anne Cori, Edwin van Leeuwen, Sebastian Funk and Sam Abbott (2022). _`Evaluating Forecasts with scoringutils in R`_. arXiv:2205.07090 <https://doi.org/10.48550/arXiv.2205.07090>
<!-- > Nikos I. Bosse, Hugo Gruson, Anne Cori, Edwin van Leeuwen, Sebastian Funk and Sam Abbott (2022). _`Evaluating Forecasts with scoringutils in R`_. arXiv:2205.07090 <https://doi.org/10.48550/arXiv.2205.07090> -->

For further details on the specific issue of transforming forecasts for scoring see:

> Nikos I. Bosse, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher\* and Sebastian Funk\* (\*: equal contribution) (2023). _`Scoring epidemiological forecasts on transformed scales`_, PLoS Comput Biol 19(8): e1011393 <https://doi.org/10.1371/journal.pcbi.1011393>
## Package overview

The `scoringutils` package offers convenient automated forecast evaluation through the function `score()`. The function operates on data.frames (it uses `data.table` internally for speed and efficiency) and can easily be integrated in a workflow based on `dplyr` or `data.table`. It also provides experienced users with a set of reliable lower-level scoring metrics operating on vectors/matrices they can build upon in other applications. In addition it implements a wide range of flexible plots designed to cover many use cases.

Where available `scoringutils` depends on functionality from `scoringRules` which provides a comprehensive collection of proper scoring rules for predictive probability distributions represented as sample or parametric distributions. For some forecast types, such as quantile forecasts, `scoringutils` also implements additional metrics for evaluating forecasts. On top of providing an interface to the proper scoring rules implemented in `scoringRules` and natively, `scoringutils` also offers utilities for summarising and visualising forecasts and scores, and to obtain relative scores between models which may be useful for non-overlapping forecasts and forecasts across scales.

Predictions can be handled in various formats: `scoringutils` can handle probabilistic forecasts in either a sample based or a quantile based format. For more detail on the expected input formats please see below. True values can be integer, continuous or binary, and appropriate scores for each of these value types are selected automatically.

## Installation

Install the CRAN version of this package using:
Install the CRAN version of this package using

```{r, eval = FALSE}
install.packages("scoringutils")
```

Install the stable development version of the package with:
Install the unstable development version from GitHub using

```{r, eval = FALSE}
install.packages("scoringutils", repos = "https://epiforecasts.r-universe.dev")
remotes::install_github("epiforecasts/scoringutils", dependencies = TRUE)
```

Install the unstable development from GitHub using the following,
## Quick start

```{r, eval = FALSE}
remotes::install_github("epiforecasts/scoringutils", dependencies = TRUE)
### Forecast types

`scoringutils` currently supports scoring the following forecast types:
- `binary`: a probability for a binary (yes/no) outcome variable.
- `point`: a forecast for a continuous or discrete outcome variable that is represented by a single number.
- `quantile`: a probabilistic forecast for a continuous or discrete outcome variable, with the forecast distribution represented by a set of predictive quantiles.
- `sample`: a probabilistic forecast for a continuous or discrete outcome variable, with the forecast represented by a finite set of samples drawn from the predictive distribution.

### Input formats and input validation

The expected input format is generally a `data.frame` (or similar) with required columns `observed`, `predicted`, and `model` that holds the forecasts and observed values. Exact requirements depend on the forecast type. For more information, have a look at the [paper](https://drive.google.com/file/d/1URaMsXmHJ1twpLpMl1sl2HW4lPuUycoj/view?usp=drive_link), call `?as_forecast()`, or have a look at the example data provided in the package (`example_binary`, `example_point`, `example_quantile`, `example_continuous`, `example_integer`).

Before scoring, input data needs to be validated and transformed into a forecast object using the function `as_forecast()`.

```{r}
forecast_quantile <- example_quantile |>
as_forecast(
forecast_unit = c(
"location", "forecast_date", "target_end_date", "target_type", "model", "horizon"
),
forecast_type = "quantile"
)
head(forecast_quantile, 2)
```

## Quick start
### The forecast unit
For quantile-based and sample-based forecasts, a single prediction is represented by a set of several quantiles (or samples) from the predictive distribution, i.e. several rows in the input data. `scoringutils` therefore needs to group rows together that form a single forecast. `scoringutils` uses all other existing columns in the input data to achieve this - the values in all other columns should uniquely identify a single forecast. Additional columns unrelated to the forecast unit can mess this up. The `forecast_unit` argument in `as_forecast()` makes sure that only those columns are retained which are relevant for defining the unit of a single forecast.

In this quick start guide we explore some of the functionality of the `scoringutils` package using quantile forecasts from the [ECDC forecasting hub](https://covid19forecasthub.eu/) as an example. For more detailed documentation please see the package vignettes, and individual function documentation.

### Scoring forecasts
Forecasts can be scored by calling `score()` on a validated forecast object.

Forecasts can be easily and quickly scored using the `score()` function. `score()` automatically tries to determine the `forecast_unit`, i.e. the set of columns that uniquely defines a single forecast, by taking all column names of the data into account. However, it is recommended to set the forecast unit manually by specifying the "forecast_unit" argument in `as_forecast()` as this may help to avoid errors. This will drop all columns that are neither part of the forecast unit nor part of the columns internally used by `scoringutils`. The function `as_forecast()` processes and validates the inputs.
`score()` returns unsummarised scores, which in most cases is not what the user wants. Here we make use of an additional function from `scoringutils` to add scores relative to a baseline model (here chosen to be the EuroCOVIDhub-ensemble model). See the getting started vignette for more details. Finally we summarise these scores by model and target type.

```{r score-example}
example_quantile %>%
as_forecast(forecast_unit = c(
"location", "target_end_date", "target_type", "horizon", "model"
)) %>%
score() %>%
add_pairwise_comparison(
by = c("model", "target_type"),
baseline = "EuroCOVIDhub-ensemble"
) %>%
summarise_scores(
by = c("model", "target_type")
) %>%
summarise_scores(
fun = signif,
digits = 2
) %>%
kable()
```{r}
scores <- forecast_quantile |>
score()
```

`scoringutils` contains additional functionality to transform forecasts, to summarise scores at different levels, to visualise them, and to explore the forecasts themselves. See the package vignettes and function documentation for more information.
`score()` takes an additional argument, `metrics`, with a list of scoring rules. Every forecast type has a default list of metrics. You can easily add your own scoring functions, as long as they conform with the format for that forecast type. See the [paper](https://drive.google.com/file/d/1URaMsXmHJ1twpLpMl1sl2HW4lPuUycoj/view?usp=drive_link) for more information.

You may want to score forecasts based on transformations of the original data in order to obtain a more complete evaluation (see [this paper](https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1) for more information). This can be done using the function `transform_forecasts()`. In the following example, we truncate values at 0 and use the function `log_shift()` to add 1 to all values before applying the natural logarithm.
You can summarise scores using the function `summarise_scores()`. The `by` argument is used to specify the desired level of summary. `fun` let's you specify any summary function, although it is recommended to stick to the mean as a primary summary function, as other functions can lead to improper scores.

```{r}
example_quantile %>%
.[, observed := ifelse(observed < 0, 0, observed)] %>%
as_forecast() %>%
transform_forecasts(append = TRUE, fun = log_shift, offset = 1) %>%
score %>%
summarise_scores(by = c("model", "target_type", "scale")) %>%
summarise_scores(by = c("model", "target_type", "scale"), fun = signif, digits = 3) %>%
head()
scores |>
summarise_scores(by = c("model", "target_type")) |>
summarise_scores(by = c("model", "target_type"), fun = signif, digits = 3)
```

## Package workflow

The following depicts the suggested workflow for evaluating forecasts with `scoringutils` (sections refer to the paper). Please find more information in the [paper](https://drive.google.com/file/d/1URaMsXmHJ1twpLpMl1sl2HW4lPuUycoj/view?usp=drive_link), the function documentation and the vignettes.

![](./man/figures/workflow.png)

## Citation

If using `scoringutils` in your work please consider citing it using the output of `citation("scoringutils")`:
If you are using `scoringutils` in your work please consider citing it using the output of `citation("scoringutils")` (or `print(citation("scoringutils"), bibtex = TRUE)`):

```{r, echo = FALSE}
citation("scoringutils")
citation("scoringutils")
```

## How to make a bug report or feature request
Expand Down
Loading

0 comments on commit 97c2d6b

Please sign in to comment.