Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: should we require users to call as_forecast() before score()? #507

Closed
nikosbosse opened this issue Nov 27, 2023 · 18 comments · Fixed by #674
Closed

Discussion: should we require users to call as_forecast() before score()? #507

nikosbosse opened this issue Nov 27, 2023 · 18 comments · Fixed by #674
Labels
Breaking change This issue discusses or suggests a potentially breaking change

Comments

@nikosbosse
Copy link
Contributor

nikosbosse commented Nov 27, 2023

At the moment we allow users to do something like

score(example_quantile)

which internally dispatches to score.default() which calls as_forecast() and then calls score() again, dispatching to the correct method.

Image

Alternatively, we could just force users to call as_forecast() before score(). This would be cleaner, but less convenient. What do you think?

@seabbs mentioned this in the following issue: #399

@nikosbosse nikosbosse converted this from a draft issue Nov 27, 2023
@nikosbosse
Copy link
Contributor Author

I personally like the extra convenience.

@seabbs
Copy link
Contributor

seabbs commented Nov 27, 2023

The upside of forcing conversion prior to using score is it will be clear why things are failing (and encourage checking) and that we can have default arguments for score such as metrics (or rules in the refactor) that are standard across score.functions which we currently can't (and so have to do via ...).

It would also reduce code complexity.

The downside is as @nikosbosse says its a slight reduction in convenience.

@sbfnk
Copy link
Contributor

sbfnk commented Nov 27, 2023

Could the generic not have these arguments? It may be a minefield but I thought it was possible.

@nikosbosse nikosbosse added the Breaking change This issue discusses or suggests a potentially breaking change label Nov 29, 2023
@nikosbosse
Copy link
Contributor Author

nikosbosse commented Nov 29, 2023

Copied from #399:

We can't easily have metrics an argument of score(): Either metrics doesn't get passed down to the actual method, as in this case:

score.default <- function(data, metrics, ...) {
  data <- validate(data)
  score(data, ...)
}

or metrics can't be empty and it will error if it's missing, as in this case:

score.default <- function(data, metrics, ...) {
  data <- validate(data)
  score(data, metrics, ...)
}

The latter case is problematic because score() doesn't have a single default metrics, rather that default sits within the methods.

We can still have the description be "an optional list", but I'm not sure adding metrics as an argument to score() works very well / we'd have to make it work in a way that I'm not sure adds that much

@seabbs
Copy link
Contributor

seabbs commented Nov 29, 2023

The lazy solution to this would be to have none of them have a default but instead detect when metrics is missing and then apply a default in the function. Its lazy and messy but does provide another option.

@nikosbosse nikosbosse added this to the scoringutils 2.0.0 milestone Nov 29, 2023
@sbfnk
Copy link
Contributor

sbfnk commented Nov 29, 2023

This might be dodgy and I suspect having different default arguments for different S3 methods could create trouble I'm not foreseeing but you could manipulate .Class directly and use method dispatch.

score <- function(x, metrics, ...) {
  UseMethod("score")
}

score.default <- function(x, metrics, ...) {
  if (x %in% c(0L, 1L)) {
    .Class <- "binary"
  } else if (x %% 1 == 0) {
    .Class <- "discrete"
  } else {
    .Class<- "continuous"
  }
  NextMethod()
}

score.binary <- function(x, metrics = "brier", ...) {
  message("I'm binary and my metrics are ", metrics)
}

score.discrete <- function(x, metrics = "RPS", ...) {
  message("I'm discrete and my metrics are ", metrics)
}

score.continuous <- function(x, metrics = "CRPS", ...) {
  message("I'm continuous and my metrics are ", metrics)
}

score(0L)
#> I'm binary and my metrics are brier
score(7L)
#> I'm discrete and my metrics are RPS
score(3.5)
#> I'm continuous and my metrics are CRPS

Created on 2023-11-29 with reprex v2.0.2

@seabbs
Copy link
Contributor

seabbs commented Nov 29, 2023

yup I think this is a nice solution but probably want to keep the internal conversion from as_forecast if doing this and then use the class from that to update .Class.

@nikosbosse see here: http://adv-r.had.co.nz/S3.html

@seabbs
Copy link
Contributor

seabbs commented Feb 28, 2024

Where are we on this? Do we have a decision?

@nikosbosse
Copy link
Contributor Author

nikosbosse commented Feb 29, 2024

To me, it seems like the discussion in this issue is dealing with two separate (albeit related) questions.

1) Should the generic be score(data, ...) (current) or score(data, metrics, ...) (proposal)

@sbfnk pointed out a way how to do this regardless of our decision on question 2). If we want this (and decide not to force as_forecast() before score() then it should likely be its own issue and I'm happy to open one. (if we decide to force as_forecast() before score() then we can just get rid of score.default() and don't need @sbfnk's suggestion anymore).

2) Should we require users to call as_forecast() before other functions.

The following functions currently call as_forecast() internally:

  • score()
  • get_forecast_counts()
  • get_coverage()
  • get_pit()
  • transform_forecasts()
    (see current package workflow below)

Should we require users to call as_forecast() beforehand?

Pros:

  • we could get rid of the score.default() function and the current hack with score.default() --> as_forecast() --> score() (or the alternative hack proposed by @sbfnk.
  • enforcing a clear workflow
  • The workflow becomes modular of 1. Check, fix, and check input, 2. use downstream functions with no further iterative checking required and no unapproved repeated warning messages.
  • (we don't have to explain to users anymore that those functions call as_forecast() internally)

Cons:

  • users need to do extra typing (or roll their own wrapper)
  • all functions likely still need to call validate_forecast() to validate the inputs in case someone created a forecast object and messed it up.

In both cases, I suggest that the functions listed above call validate_forecast() silently iff the object has already been validated (see #545). The validation should error if something is wrong, but I don't think we need to repeat messages/warnings(?) on an already validated object.

image

@nikosbosse
Copy link
Contributor Author

nikosbosse commented Feb 29, 2024

Maybe we should get an initial picture via a vote on question 2) in the comment above.

Pinging a few devs and users @seabbs @sbfnk @kathsherratt @toshiakiasakura @jhellewell14 @nickreich @Bisaloo @bsweger @dylanhmorris @jonathonmellor @OvertonC

  • React with 👍 if you're in favour of requiring users to call as_forecast() explicitly
  • React with 👎 if you're in favour of not enforcing as_forecast() to be called prior to using other functions (i.e supporting multiple entry points).

@seabbs
Copy link
Contributor

seabbs commented Feb 29, 2024

The validation should error if something is wrong, but I don't think we need to repeat messages/warnings(?) on an already validated object.

Agree this is a nice feature of having as_forecast as the external interface and validate_forecast as internal. You can differentiate between something is wrong and we have already told you about this but you have ignored us (and so not endlessly emit warnings).

@nikosbosse I made some light edits to your outline to slightly shift the bias ;)

@Bisaloo
Copy link
Member

Bisaloo commented Mar 1, 2024

I'm on the fence 🤔.

As an R package maintainer, I would vote for requiring an explicit call to as_forecast() since it's much less "magic", and make sure users follow the happy path we have built for them, while having full understanding & control of what's happening.

As a potential user, I have been annoyed by overly strict requirements in terms of object class and format in the past. I'm not completely sure we would be in this scenario here since calling as_forecast() seems like a small and clear enough step. But I'm still cautious of unflexible pipelines.

If I may bring another viewpoint to the table that may not be terribly helpful here since it doesn't bring an immediate solution to this specific question:
I would see value of requiring early conversion to a custom S3 object if it brings more than one benefit with it. Saying "it makes the score() workflow cleaner" is not enough to fully convince me to pull the switch at this time. But if we start adding other arguments, such as "we can have a custom plot() method" (probably in a different package though), or any other separate benefit, then the choice becomes much simpler IMO.

@seabbs
Copy link
Contributor

seabbs commented Mar 1, 2024

There is only currently a custom print method (which is helpful for checking) and the custom get_ methods that also need a score object). I think you are right that could imagine another package that plots forecast objects.

My main worry is that for new users the current workflow doesn't really confront them with what kind of forecast they have so they may get a false sense of security when chucking their forecasts into score (and also not take the time to use the get_ functions or explore non-default metric options).

@dylanhmorris
Copy link

We use scoringutils in my team at CFA. I tend to prefer slightly longer but more explicit workflows, so I've voted for having the user call as_forecast(). I agree with @seabbs above RE: making it easier for (and encouraging) the user to do basic checks.

@kathsherratt
Copy link

kathsherratt commented Mar 1, 2024

Adding another vote for explicitly calling as_forecast() 🗳️
I agree with @seabbs and @dylanhmorris on longer workflows (that call me out on the data I'm chucking in), and prefer getting error messages step-by-step. @Bisaloo I do agree with not liking inflexible workflows but then again if it has to be inflexible as a user I'd rather that were made explicit.

@seabbs seabbs changed the title Discussion: should we force users to call as_forecast() before score()? Discussion: should we require users to call as_forecast() before score()? Mar 1, 2024
@jonathonmellor
Copy link

Interesting discussion! Have voted for upfront as_forecast. While a workflow change is always an inconvenience for the existing user, if it 1) improves future functionality and 2) makes what's expected more explicit then that's a good thing. For a new user who will likely look at the quick start docs this change won't be noticed. If the function is expecting a certain class for the function to work, coercion to that class up front makes sense.

When we onboard colleagues in our team to using scoringutils, the sticking point is often the data formatting required (rightly, I'll add), anything that supports that / makes it clearer is a +1

@nikosbosse
Copy link
Contributor Author

Thank you all for your input, this was really helpful!
Then we'll enforce as_forecast() before score() in the next version of scoringutils. I'm assuming this also implies that we enforce as_forecast() before calling

  • get_forecast_counts()
  • get_coverage()
  • get_pit()
  • and transform_forecasts()

Do you agree with that?

@toshiakiasakura
Copy link
Collaborator

I agree because socringutils currently supports the renaming and validation arguments in as_forecast(). I am afraid that the wrong results will be returned due to different argument specifications. For example, one uses as_forecast(sample_id="id") before score() function, but when they use get_forecast_counts() with the original data before conversion and as_forecast() without arguments is applied in get_forecast_counts() implicitly, this might produce unexpected results.

nikosbosse added a commit that referenced this issue Mar 6, 2024
Issue #507: Require users to call `as_forecast()`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Breaking change This issue discusses or suggests a potentially breaking change
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

8 participants