Review scoringutils 2.0.0 #791

Bisaloo · 2024-04-16T11:43:13Z

This is a full package review, following the process documented in https://epiverse-trace.github.io/blueprints/code-review.html#full-package-review.

It allows us to use commit suggestions, and annotate the code directly by hijacking GitHub PR review infrastructure.

Once this is done, this PR can be updated to target main, so that the changes made in it can be integrated directly into main.

Bisaloo

Overall, I think this package has much improved in 2.0.0. In particular, despite my comment on this later, I think the "happy path" is getting clearer and clearer, which is definitely the right direction.

I've left numerous comments but would recommend focusing on changes impacting the user interface, so that breaking changes are grouped as much as possible in 2.0.0. Internal changes can wait for the next minor version.

Package Review

Version tested: aed70b8

Documentation

The package includes all the following forms of documentation:

☒ A statement of need: clearly stating problems the software is designed to solve and its target audience in README
☒ Installation instructions: for the development version of package and any non-standard dependencies in README
☒ Vignette(s): demonstrating major functionality that runs successfully locally
☒ Function Documentation: for all exported functions
☒ Examples: (that run successfully locally) for all exported functions
☒ Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

☒ Installation: Installation succeeds as documented.
☒ Functionality: Any functional claims of the software have been confirmed.
☒ Performance: Any performance claims of the software have been confirmed.
☒ Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
☒ Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 6

Review Comments

Short-term

Class structure

I see it has been discussed a bit in Discussion: should we require users to call as_forecast() before score()? #507 but I am uncomfortable with the fact that we re-validate the class in the print() method. Not just because it’s inefficient and non-standard, but mostly because it may be the symptom of a deeper issue. Once an object has been created, we should ensure (to the best of our ability) that it remains valid no matter what. And the moment it stops being valid, we should throw an error immediately, not wait for the print() method to be called.

This is an inherent weakness of S3 classes, which are loosely defined, but we can add safeguards. This could take the form of a custom [ method which ensures crucial columns are never dropped.
I wonder if there should be a forecast abstract class, from which forecast_binary, forecast_point, etc. would inherit. In the docs and exported functions (e.g., is_forecast(), validate_forecast()), things appear as if there is such a class, and the actual classes forecast_binary, forecast_point, etc. are not so visible.
On an unrelated note, should functions such as sample_to_quantile() be converted to use the class infrastructure? They would become something like as.forecast_sample.forecast_quantile()
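
To make the first two suggestions more concrete, here is a minimal sketch of a parent class with a guarded `[` method. Everything in it is an assumption for illustration: `new_forecast()`, `required_cols` and the column names `observed` and `predicted` are hypothetical and are not taken from the scoringutils codebase.

```r
# Hypothetical required columns; the real set depends on the forecast type.
required_cols <- c("observed", "predicted")

# Hypothetical constructor: every forecast type inherits from "forecast".
new_forecast <- function(x, subclass) {
  stopifnot(is.data.frame(x), all(required_cols %in% names(x)))
  structure(x, class = c(subclass, "forecast", "data.frame"))
}

# Custom `[` method: subset as usual, then fail immediately if a required
# column was dropped, instead of waiting for print() to re-validate.
`[.forecast` <- function(x, ...) {
  out <- NextMethod()
  missing_cols <- setdiff(required_cols, names(out))
  if (length(missing_cols) > 0) {
    stop(
      "Subsetting dropped required column(s): ",
      toString(missing_cols),
      call. = FALSE
    )
  }
  out
}

fc <- new_forecast(
  data.frame(model = "m1", observed = 1, predicted = 0.9),
  subclass = "forecast_point"
)
kept <- fc[c("observed", "predicted")] # still a valid forecast
# fc["model"] would raise an error, because both required columns are dropped
```

This only covers `[`. Other modification routes (e.g., `$<-`, `[[<-`, `names<-`) would need similar methods for the guarantee to hold in general.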

NAMESPACE

I believe the scoringutils NAMESPACE is much larger than it should be, which makes it difficult to approach and maintain. We probably need to have a longer discussion about this but some early thoughts:
- Should the low-level functions really be exported? There is no workflow in the vignettes presenting when it would make sense to use them directly.
- Functions such as run_safely() should definitely not be exported IMO since they don’t fit with scoringutils’ core functionality. Instead, we should encourage users to use the tools they are already familiar with for this kind of work beyond scoringutils’ scope, such as tryCatch() or purrr::safely().
- assert_forecast_generic() is marked as internal, but also exported.
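
As an illustration of the tryCatch() route, a sketch along these lines would let users handle failing metrics themselves. It assumes that the role of run_safely() is roughly "return NULL with a warning when a metric errors". `risky_metric()` and `score_or_null()` are hypothetical names, not scoringutils functions.

```r
# Hypothetical metric that fails on invalid input.
risky_metric <- function(observed, predicted) {
  if (length(observed) != length(predicted)) {
    stop("`observed` and `predicted` must have the same length")
  }
  mean(abs(observed - predicted))
}

# Base R only: turn an error into a warning and return NULL instead.
score_or_null <- function(...) {
  tryCatch(
    risky_metric(...),
    error = function(e) {
      warning("Metric failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

ok <- score_or_null(c(1, 2), c(1, 4))
failed <- suppressWarnings(score_or_null(1, c(1, 2)))
```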

git repository

Am I correct that most figures in man/figures/ are no longer used? If so, they should be removed.

Middle-term

Happy path

I was unsure if this should be a short-term or middle-term issue. My main concern with putting it in middle-term is that it will result in breaking changes, which always are a pain, both for users and developers. At the same time, they may require some user feedback, and will likely require some time to implement, so maybe it shouldn’t delay the 2.0.0 release. Some of it has been discussed in the NAMESPACE section but this section mentions other situations as well.

In the same line of thought as #507, it is probably good to have a clearer “happy path” / “recommended workflow”. Options that differ from this path should be removed in many cases to keep the user on the happy path, reduce information overload, and facilitate maintenance. In general, there should not be two nearly identical ways to perform the same task.

For example, in summarise_scores(), the by and across arguments are redundant, which is confusing for the user, and forces the developers to jump through extra input checking hoops. Since the transformation of one to the other is straightforward, I would recommend being more opinionated, and having a single, well-documented option.
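
For reference, converting between the two is essentially a set difference that users could write themselves. The sketch below assumes a hypothetical set of columns defining the unit of a single forecast. The column names are made up for illustration.

```r
# Hypothetical columns that together identify a single forecast.
forecast_unit <- c("model", "location", "target_type", "horizon")

# Summarising *across* some columns is the same as summarising *by* all others.
across <- c("location", "horizon")
by <- setdiff(forecast_unit, across)
```

With a single `by` argument, this one-liner could be shown in the documentation instead of being maintained as a second code path.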

Suppressing output

The scoringutils codebase contains a lot of try(), suppressMessages(), suppressWarnings(). This is likely a symptom that a refactoring or redesign is necessary. In particular, one potential cause can be the lack of delineation between internal and external functions, as discussed in the earlier NAMESPACE section.

try(), suppressMessages(), suppressWarnings() should usually be a last resort, and not a common pattern in the codebase. Since they affect all conditions, not just conditions created by scoringutils, they can also hide deeper issues, and make debugging harder.
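
One possible alternative, sketched here under assumptions, is to signal classed conditions internally and muffle only those. The class name `scoringutils_na_warning` and both helper functions are hypothetical.

```r
# Hypothetical internal helper that signals a classed warning.
check_input <- function(x) {
  if (anyNA(x)) {
    warning(
      warningCondition(
        "input contains NA values",
        class = "scoringutils_na_warning"
      )
    )
  }
  invisible(x)
}

# Muffle only that specific warning class; any other warning still surfaces.
quiet_check <- function(x) {
  withCallingHandlers(
    check_input(x),
    scoringutils_na_warning = function(w) invokeRestart("muffleWarning")
  )
}
```

Unlike suppressWarnings(), this leaves warnings from other packages or from base R untouched, so they remain visible when debugging.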

Long-term

Scaffolding

The Metrics R package doesn’t seem actively maintained and may present a risk to the long-term sustainability of the project. From what I can tell, we only use it for relatively straightforward operations so I wonder if it’s really worth depending on it. A potential strong argument for having such a dependency is if it provided a significant speed boost by using C/C++ or Rust, especially for operations used in loops. But Metrics uses pure R so provides limited value-add in our case.
Conversely, I would probably recommend depending on the matrixStats package. It’s a well-established package by an active member of the R community. It provides simple, clearly named, and more efficient (written in C) replacements for the various apply() calls throughout the codebase.
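
A rough sketch of both points, on toy data. It assumes that row-wise summaries over a matrix of samples are representative of the apply() calls in question, and that Metrics::ae() is among the Metrics functions used. The matrixStats call is guarded so the snippet also runs where that package is not installed.

```r
# Toy matrix of predictive samples: one row per forecast, one column per sample.
set.seed(1)
samples <- matrix(rnorm(20), nrow = 4)

# Current pattern: apply() over rows.
med_apply <- apply(samples, 1, median)

# Possible replacement (only runs if matrixStats is installed).
if (requireNamespace("matrixStats", quietly = TRUE)) {
  med_fast <- matrixStats::rowMedians(samples)
  stopifnot(isTRUE(all.equal(med_apply, med_fast)))
}

# The absolute error is a one-liner in base R, which makes the Metrics
# dependency cheap to drop for this kind of operation.
abs_error <- function(observed, predicted) abs(observed - predicted)
```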

Plots

I believe I mentioned it in the past but I am still uncomfortable with the plot_heatmap() function. The output here should be a table, not a plot. This comes with accessibility issues, as the values are no longer accessible to screen readers, and it’s harder to copy-paste the values. There are good options to have tables with coloured cells today, and it should probably be the way to go.

.lintr

DESCRIPTION

R/check-input-helpers.R

R/summarise_scores.R

Bisaloo · 2024-04-18T08:42:30Z

R/summarise_scores.R

+#' @param by Character vector with column names to summarise scores by. Default
+#'   is `model`, meaning that there will be one score per model in the output.
+#' @param across Character vector with column names to summarise scores
+#'   across (meaning that the specified columns will be dropped). This is an
+#'   alternative to specifying `by` directly. If `across` is set, `by` will be
+#'   ignored. If `across` is `NULL` (default), then `by` will be used.


Should we be more opinionated here? See the section "Happy path" in my general review comment.

I think so, but I know @nikosbosse finds this very useful as a user, so it's hard to say.

@sbfnk what do you think?

I'm all for opinionated paths

vignettes/scoringutils.Rmd

seabbs · 2024-04-18T15:35:05Z

Nice, still digesting, but

On an unrelated note, should functions such as sample_to_quantile() be converted to use the class infrastructure? They would become something like as.forecast_sample.forecast_quantile()

I have a PR for you #790

nikosbosse · 2024-04-18T15:47:11Z

This is great, thanks so much!

seabbs · 2024-04-18T16:04:13Z

I wonder if there should be a forecast abstract class, from which forecast_binary, forecast_point, etc. would inherit. In the docs and exported functions (e.g., is_forecast(), validate_forecast()), things appear as if there is such a class, and the actual classes forecast_binary, forecast_point, etc. are not so visible.

Yes I agree.

nikosbosse · 2024-05-05T13:23:52Z

Summary of action items based on the discussions (in addition to what's mentioned in the general review comments):

decide on whether to use purrr::partial vs. customise_metric()
decide on whether to keep the digits argument in get_correlations()
decide on whether to keep the across argument in summarise_scores()
create a new forecast parent class (we could maybe also do this in 2.1, not sure it would break existing code)
find a way to warn users immediately once a forecast object stops being valid. @Bisaloo suggested updating [ to achieve that.

* correct orcid statements in DESCRIPTION
* Automatic readme update [ci skip]
* update docs
* revert suggested change
* update test
* make import explicit
* update import
* delete unnecessary comment
---------
Co-authored-by: GitHub Action <[email protected]>

nikosbosse · 2024-05-19T08:45:14Z

Uhm... I tried splitting the already accepted changes into a new PR. But when I merged it, GitHub was smart enough to see that the latest commit was the same as this one, so this PR got "merged" as well and is now closed. Sorry!
I'll split up the open points into different issues.

nikosbosse · 2024-05-19T09:39:05Z

@Bisaloo, you mentioned this:

I was unsure if this should be a short-term or middle-term issue. My main concern with putting it in middle-term is that it will result in breaking changes, which always are a pain, both for users and developers. At the same time, they may require some user feedback, and will likely require some time to implement, so maybe it shouldn’t delay the 2.0.0 release. Some of it has been discussed in the NAMESPACE section but this section mentions other situations as well.

In the same line of thought as #507, it is probably good to have a clearer “happy path” / “recommended workflow”. Options that differ from this path should be removed in many cases to keep the user on the happy path, reduce information overload, and facilitate maintenance. In general, there should not be two nearly identical ways to perform the same task.

For example, in summarise_scores(), the by and across arguments are redundant, which is confusing for the user, and forces the developers to jump through extra input checking hoops. Since the transformation of one to the other is straightforward, I would recommend being more opinionated, and having a single, well-documented option.

I created an issue for across here: #822. Are there any other specific changes you recommend to make the happy path clearer?

seabbs and others added 8 commits April 18, 2024 17:04

Update R/summarise_scores.R

945e95b

Co-authored-by: Hugo Gruson <[email protected]>

Update R/check-input-helpers.R

b260409

Co-authored-by: Hugo Gruson <[email protected]>

Update R/default-scoring-rules.R

7f65176

Co-authored-by: Hugo Gruson <[email protected]>

Update R/check-input-helpers.R

5e56a0c

Co-authored-by: Hugo Gruson <[email protected]>

Update DESCRIPTION

4d69b2b

Co-authored-by: Hugo Gruson <[email protected]>

Update R/check-inputs-scoring-functions.R

fd89842

Co-authored-by: Hugo Gruson <[email protected]>

Update R/convenience-functions.R

4129e2f

Co-authored-by: Hugo Gruson <[email protected]>

Update R/print.R

980a14d

Co-authored-by: Hugo Gruson <[email protected]>

nikosbosse changed the base branch from empty to main May 5, 2024 12:00

nikosbosse mentioned this pull request May 18, 2024

Issue #817 - Create forceast super class #813

Merged

9 tasks

Merge branch 'main' into review

e8bd1fc

This was referenced May 19, 2024

Implement ways to immediately warn when a forecast object becomes invalid #816

Closed

Create abstract forecast class #817

Closed

Reduce package NAMESPACE #818

Closed

Accept updated snapshot

9d28462

nikosbosse mentioned this pull request May 19, 2024

Partially merge changes suggested by code review in #791 #819

Merged

9 tasks

nikosbosse merged commit fbfbe56 into main May 19, 2024
18 checks passed

nikosbosse deleted the review branch May 19, 2024 08:40

This was referenced May 19, 2024

Package scaffolding #825

Open

Update plot_heatmap() to be a table instead of a plot #826

Open
