Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review scoringutils 2.0.0 #791

Merged
merged 11 commits into from
May 19, 2024
Merged

Review scoringutils 2.0.0 #791

merged 11 commits into from
May 19, 2024

Conversation

Bisaloo
Copy link
Member

@Bisaloo Bisaloo commented Apr 16, 2024

This is a full package review, following the process documented in https://epiverse-trace.github.io/blueprints/code-review.html#full-package-review.

It allows us to use commit suggestions, and annotate the code directly by hijacking GitHub PR review infrastructure.

Once this is done, this PR can be updated to target main to directly integrate the changes done it to main.

Copy link
Member Author

@Bisaloo Bisaloo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this package has much improved in 2.0.0. In particular, despite my comment on this later, I think the "happy path" is getting clearer and clearer, which is definitely the right direction.

I've left numerous comments but would recommend focus on changes impacting the user interface, so that breaking changes are grouped as much as possible in 2.0.0. Internal changes can wait the next minor version.

Package Review

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.

  • Functionality: Any functional claims of the software have been confirmed.

  • Performance: Any performance claims of the software have been confirmed.

  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.

  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 6


Review Comments

Short-term

Class structure
  • I see it has been discussed a bit in Discussion: should we require users to call as_forecast() before score()? #507 but I am uncomfortable with the fact we re-validate the class in the print() method. Not just because it’s inefficient and non-standard, but mostly because it may be the symptom of a deeper issue. Once it has been created, we should ensure (to the best of our ability) that it remains valid no matter what. And the moment it stopps being valid, we should throw an error immediately, not wait for the print() method to be called.

    This is an inherent weakness of S3 classes, which are loosely defined, but we can add safeguards. This could take the form of a custom [ method which ensures crucial columns are never dropped.

  • I wonder if there should be a forecast abstract class, from which forecast_binary, forecast_point, etc. would inherit. In the docs and exported functions (e.g., is_forecast(), validate_forecast()), things appear as if there is such as class, and the actual class forecast_binary, forecast_point, etc. as not so visible.

  • On an unrelated note, should functions such as sample_to_quantile() be converted to use the class infrastructure? They would become something like as.forecast_sample.forecast_quantile()

NAMESPACE

I believe scoringutils NAMESPACE is much larger than what it should be, which makes it difficult to approach and maintain. We probably need to have a longer discussion about this but some early thoughts: - should the low-level functions really be exported? There is no workflow in the vignettes presenting when it would make sense to use them directly. - functions such as run_safely() should definitely not be exported IMO since they don’t fit with scoringutils core functionality. Instead, we should encourage users to use the tools they are already familiar for this kind of work beyond scoringutils scope, such as tryCatch() or purrr::safely(). - assert_forecast_generic() is marked as internal, but also exported

git repository

Am I correct that most figures in man/figures/ are no longer used? If so, they should be removed.

Middle-term

Happy path

I was unsure if this should be a short-term or middle-term issue. My main concern with putting it in middle-term is that it will result in breaking changes, which always are a pain, both for users and developers. At the same time, they may require some user feedback, and will likely require some time to implement, so maybe it shouldn’t delay the 2.0.0 release. Some of it has been discussed in the NAMESPACE section but this section mentions other situations as well.

In the same line of thought as #507, it is probably good to have a clearer “happy path” / “recommended workflow”. Options that differ from this path should be removed in many cases to keep the user on the happy path, reduce information overload, and facilitate maintenance. In general, there should not be two nearly identical ways to perform the same task.

For example, in summarize_score(), the by and across arguments are redundant, which is confusing for the user, and forces the developers to jump through extra input checking hoops. Since the transformation of one to the other, I would recommend being more opinionated, and having a single, well-documented option.

Suppressing output

scoringutils codebase contains a lot of try(), suppressMessages(), suppressWarnings(). This is likely the symptoms that a refactoring or redesign is necessary. In particular, one potential cause can be the lack of delineation between internal and external functions, as discussed in the earlier NAMESPACE section.

try(), suppressMessages(), suppressWarnings() should usually be a last resort, and not a common pattern in the codebase. Since they affect all conditions, not just conditions created by scoringutils, they can also hide deeper issues, and make debugging harder.

Long-term

Scaffolding
  • The Metrics R package doesn’t seem actively maintained and may present a risk to the long-term sustainability of the project. From what I can tell, we only use it for relatively straightforward operations so I wonder if it’s really worth depending on it. A potential strong argument for having such a dependency is if it provided a significant speed boost by using C/C++ or Rust, especially for operations used in loops. But Metrics uses pure R so provides limited value-add in our case.
  • Conversely, I would probably recommend depending on the MatrixStats package. It’s a well-established package by an active member of the R community. It simple, clearly-named, and more efficient (written in C) replacement for the various apply() calls throughout the codebase.
Plots

I believe I mentioned it in the past but I am still uncomfortable with the plot_heatmap() function. The output here should be a table, not a plot. This comes with accessibility issues, as the values are no longer accessible to screen readers, and it’s harder to copy-paste the values. There are good options to have tables with coloured cells today, and it should probably be the way to go.

.lintr Outdated Show resolved Hide resolved
DESCRIPTION Outdated Show resolved Hide resolved
DESCRIPTION Outdated Show resolved Hide resolved
R/check-input-helpers.R Outdated Show resolved Hide resolved
R/check-input-helpers.R Outdated Show resolved Hide resolved
R/summarise_scores.R Outdated Show resolved Hide resolved
R/summarise_scores.R Outdated Show resolved Hide resolved
Comment on lines 16 to 21
#' @param by Character vector with column names to summarise scores by. Default
#' is `model`, meaning that there will be one score per model in the output.
#' @param across Character vector with column names to summarise scores
#' across (meaning that the specified columns will be dropped). This is an
#' alternative to specifying `by` directly. If `across` is set, `by` will be
#' ignored. If `across` is `NULL` (default), then `by` will be used.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be more opinionated here? See the section "Happy path" in my general review comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so but @nikosbsse I know finnds this very useful as a user so hard to say

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbfnk what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for opinionated paths

vignettes/scoringutils.Rmd Outdated Show resolved Hide resolved
vignettes/scoringutils.Rmd Outdated Show resolved Hide resolved
@seabbs
Copy link
Contributor

seabbs commented Apr 18, 2024

Nice still digesting but

On an unrelated note, should functions such as sample_to_quantile() be converted to use the class infrastructure? They would become something like as.forecast_sample.forecast_quantile()

I have a PR for you #790

@nikosbosse
Copy link
Contributor

This is great, thanks so much!

@seabbs
Copy link
Contributor

seabbs commented Apr 18, 2024

I wonder if there should be a forecast abstract class, from which forecast_binary, forecast_point, etc. would inherit. In the docs and exported functions (e.g., is_forecast(), validate_forecast()), things appear as if there is such as class, and the actual class forecast_binary, forecast_point, etc. as not so visible.

Yes I agree.

@nikosbosse
Copy link
Contributor

Summary of action items based on the discussions (in addition to what's mentioned in the general review comments:

  • decide on whether to use purrr::partial vs. customise_metric()
  • decide on whether to keep the digits argument in get_correlations()
  • decide on whether to keep the across argument in summarise_scores()
  • create a new forecast parent class (we maybe could also do this in 2.1. - not sure it would break existing code)
  • find a way to warn users immediately once a forecast object stops becoming valid. @Bisaloo suggested updating [ to achieve that.

* correct orcid statements in DESCRIPTION

* Automatic readme update [ci skip]

* update docs

* revert suggested change

* update test

* make import explicit

* update import

* delete unnecessary comment

---------

Co-authored-by: GitHub Action <[email protected]>
@nikosbosse nikosbosse merged commit fbfbe56 into main May 19, 2024
18 checks passed
@nikosbosse nikosbosse deleted the review branch May 19, 2024 08:40
@nikosbosse
Copy link
Contributor

uhm... I tried splitting up the already accepted changes into a new PR. But when I merged this, github was smart enough to see that the latest commit was the same as this one so this PR got "merged" as well and is closed now. Sorry!
I'll split up the open points into different issues

@nikosbosse
Copy link
Contributor

@Bisaloo, you mentioned this:

I was unsure if this should be a short-term or middle-term issue. My main concern with putting it in middle-term is that it will result in breaking changes, which always are a pain, both for users and developers. At the same time, they may require some user feedback, and will likely require some time to implement, so maybe it shouldn’t delay the 2.0.0 release. Some of it has been discussed in the NAMESPACE section but this section mentions other situations as well.

In the same line of thought as #507, it is probably good to have a clearer “happy path” / “recommended workflow”. Options that differ from this path should be removed in many cases to keep the user on the happy path, reduce information overload, and facilitate maintenance. In general, there should not be two nearly identical ways to perform the same task.

For example, in summarize_score(), the by and across arguments are redundant, which is confusing for the user, and forces the developers to jump through extra input checking hoops. Since the transformation of one to the other, I would recommend being more opinionated, and having a single, well-documented option.

I created an issue for across here: #822. Are there any other specific changes you recommend to make the happy path clearer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants