Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Our API suggests that panicking should be the default #14275

Open
janhohenheim opened this issue Jul 11, 2024 · 40 comments
Open

Our API suggests that panicking should be the default #14275

janhohenheim opened this issue Jul 11, 2024 · 40 comments
Labels
A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible D-Complex Quite challenging from either a design or technical perspective. Ask for help! M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide S-Needs-SME Decision or review from an SME is required X-Controversial There is active debate or serious implications around merging this PR
Milestone

Comments

@janhohenheim
Copy link
Member

janhohenheim commented Jul 11, 2024

In spirit of #12660
Adapted from #14268

What problem does this solve or what need does it fill?

I believe it is not controversial to say that when an API offers two similar functions with similar names, the shorter will seem like the default. As such, I believe many people will instinctively gravitate to single and single_mut over get_single and get_single_mut. This means we are subtly pushing users to prefer the implicitly panicking version over the fallible one. This is bad, as it leads to games that may run well on the developers machine but then panic in an edge case.
It is currently understood in the wider Rust community that panics should be an exceptional case, a nuclear option for when recovery is impossible.
We should do our best to make it as easy to do the right thing by default (See Falling Into The Pit of Success). This means that it should be easier to handle an error by propagating it with ? to a logger or similar than to panic. Our current API does the opposite.

This is an issue for the following APIs:

What solution would you like?

Deprecate and remove panicking variants. Improve ergonomics by adding macros for early returns. Using a macro for get_single as an example:

fn do_stuff(primary_window: Query<&Window, With<PrimaryWindow>>) {
    let window = get_single!(primary_window);
    // Do stuff with the window. If there's not exactly one, we have already returned.
}

Which expands to:

fn do_stuff(primary_window: Query<&Window, With<PrimaryWindow>>) {
    match primary_window.get_single() {
        Ok(item) => item,
        Err => return default(),
    };
    // Do stuff with the window. If there's not exactly one, we have already returned.
}

Note that returning default() will allow the macro to work with systems that return an Option and get piped into error handling methods.

Similar macros are already being used by some users, as indicated by https://github.com/tbillington/bevy_best_practices?tab=readme-ov-file#getter-macros

Paraphrasing @alice-i-cecile:

I'm also coming around on the macro approach: I'd rather this be a Rust feature where we can just ? in function that return (), but failing that this is better than the endless let else return pattern.

As the engine matures, I'm increasingly against panicking APIs, especially in seemingly innocuous cases like this. While it's nice for prototyping, it's a serious hazard for refactors and production-grade apps.

We should decide on what to do for all of these areas at the same time and make a consistent decision across the board, pulling in both SME-ECS and @cart. I think they're better implemented as separate PRs to avoid extreme size, but they should be shipped in the same cycle if we make the change.

What alternative(s) have you considered?

  • Do nothing
  • Leave the API as-is and heavily recommend the get_ variants in documentation
  • Print a warning when calling panicking variants
  • Keep the panicking variants and rename them to something clunky like _unchecked
  • Make the panicking API opt-in via a feature
  • Leave the API as-is and let the schedule handle panics
  • Use a #[system] macro that modifies the code to allow us to use ? in it, like in bevy_mod_sysfail

Open Questions

Naming

There is a sentiment that get_ is used by the standard library in general when returning an Option.
Quoting @benfrankel:

This isn't true for the standard library: Vec::first, Iterator::next, Path::parent, Error::source, etc.

The get_ prefix is only used to differentiate from a panicking version. If there's no panicking version, there's no need for the get_ prefix. For some reason Bevy has latched onto foo vs get_foo as the only API choice when foo may fail, but it doesn't have to be this way.

In a world where Bevy is not panicking by default and always hands out Options and Results, I believe there is little reason to stick to a get_ prefix. It is unnecessary noise on otherwise very concise functions. As such, I would advise to change Bevy's naming convention and drop it as part of this initiative. This may sound like something for another issue, but I'll argue that it should be discussed at least in the same release cycle. Users will already have to go and touch every instance of mut, single, etc. so I don't want to further annoy them by having to change them yet a second time when we drop the get_ prefix.

I realize that this is more controversial however and I'm fine with leaving it. I just want to point out that this is probably the best chance we will get to change it.

Macro parameters

It would be nice to have some influence over what should be done in the unhappy path. For example, we could have a parameter that controls an error! log:

get_single!(foo, "Oh heck");

which expands to:

match foo.get_single() {
    Ok(item) => item,
    Err => {
        error!("Oh heck");
        return default();
    }
};

Of course, the use of error! is arbitrary here and the user may want to use warn! instead or something else altogether. To accommodate for this, we could pass a closure:

get_single!(foo, || error!("Oh heck"));

Which expands to the same as above. Note that this closure could even dictate what we return. Since error! returns (), this works out for the common case, and a user is free to build their own Error variant. However at that point, I don't know if a macro call is saving much typing work anymore.

One could also combine these variants by using labeled parameters:

get_single!(foo, warn: "Oh heck");
get_single!(bar, error: "Oh heck");
get_single!(baz, returns: || error!("Oh heck"));

This would allow the user to have a terse version for the common case usage and a more verbose one for custom error handling.

continue

This discussion has focused on the case where we want an early return. I think that is the common case, at least in my experience. What about continue however? Do we want a _continue variant for all macros? An extra parameter? Is there some kind of Rust hack that turns into either continue or return depending on context? Would that even be what the user expects to happen?

@janhohenheim janhohenheim added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide D-Complex Quite challenging from either a design or technical perspective. Ask for help! X-Controversial There is active debate or serious implications around merging this PR S-Needs-SME Decision or review from an SME is required labels Jul 11, 2024
@janhohenheim
Copy link
Member Author

Pinging SME-ECS: @JoJoJet @maniwani

@janhohenheim janhohenheim changed the title Our API suggests to the user that panicking should be the default Our API suggests that panicking should be the default Jul 11, 2024
@JoJoJet
Copy link
Member

JoJoJet commented Jul 11, 2024

I agree with this I think, however it would be unfortunate for singleton-style components to become more verbose. single is one of those panicking methods that kind of makes sense, since if you use it with a component type that you only spawn one of it's always correct.
Maybe we could remove panicking-by-default and still avoid the verbosity by having a marker trait for singleton components, which would unlock the infallible accessor for that type

@janhohenheim
Copy link
Member Author

janhohenheim commented Jul 11, 2024

@JoJoJet you could also do the following, assuming we remove the get_ suffix and allow passing a function to the macro as a second param:

let singleton = single!(singleton); // early return
let singleton = single!(singleton, panic); // panic if there's not exactly one

This would require some panic util fn just like we have for warn, error, etc.

@benfrankel
Copy link
Contributor

benfrankel commented Jul 11, 2024

The RFC for postfix macros would make this pattern so much nicer. Waiting for a language feature isn't practical, though.

let window = window_query.single().or_return!("optional warning msg");

@IceSentry
Copy link
Contributor

IceSentry commented Jul 11, 2024

At work we started using a slightly different pattern. We use the early return with the fallible functions, but we use a debug_assert so it still panics in debug to help us catch those issues. That way it doesn't panic for our users if something unexpected happens.

@cart
Copy link
Member

cart commented Jul 11, 2024

My take is generally still what I discussed here: #12660 (comment)

Specifically, I think we should investigate the viability of the following plan:

  1. Make Result systems feel more automatic / first class (id like to stress that result systems are already possible, they just currently require an addition function call at registration). Ideally it is just as simple as returning Result in an existing system. If this isn't possible (while retaining support for non-result systems) for type-system reasons, investigate further.
    app.add_systems(Update, window_thing)
    
    fn window_thing(windows: Query<&Window>) -> Result {
      let window = windows.get_single()?;
      OK // this is a const export of Ok(())
    }
  2. Replace the majority of Option-returning functions with Results, which then enables ? to work seamlessly for pretty much everything:
    fn world_thing(world: &mut World) -> Result {
      let entity = world.get_entity(SOME_ENTITY)?;
      OK
    }
  3. Replace panicking nice-name api names (ex: entity()) with the Result-returning variant (and remove the old get_entity() function):
    fn world_thing(world: &mut World) -> Result {
      let entity = world.entity(SOME_ENTITY)?;
      OK
    }

Imo it is critical to prove steps (2) and (3) out first and do it "as a single transaction" from a user perspective. Skipping straight to step (3) (while easy) puts the cart before the horse and imo makes Bevy UX worse for a period of time (at best) and at worst, commits us to a path that may not be viable or ideal (due to unknowns not yet discovered). Skipping (2) would mean that ? isn't the "ergonomic one size fits all" solution it needs to be.

I do think the macro approach is an interesting "middle ground", but it feels like we're just re-implementing the ? behavior in a clunkier / less idiomatic / less flexible way. Rather than create a zoo of ways to solve this problem (and encourage people to migrate once to a middle ground solution, then again to the "final" solution), I think I'd prefer to just invest in a single canonical ideal solution.

That being said, if it doesn't already exist, I think it makes a lot of sense to build the macro approach as a 3rd party crate. That seems like an appropriate "quick no bureaucracy" middle ground solution while we sort out the details.

@cart
Copy link
Member

cart commented Jul 11, 2024

As a supplement, I think it would also be interesting to investigate an Option extension trait that lets you do this:

fn system() -> Result {
  let list = vec![1, 2];
  let x = list.get(0).unpack()?;
  let y = list.get(1).assume("index 1 should exist")?;
  OK
}

Where unpack() (non panicking unwrap variant) maps Option<T> to Result<T, ContextlessError>. And assume() (non panicking expect variant) maps to Result<T, MessageError>.

@benfrankel
Copy link
Contributor

unpack and assume seem very similar to Option::ok_or.

@cart
Copy link
Member

cart commented Jul 11, 2024

ok_or is different in that it forces you to select and provide an error type. ok_or() is not a replacement for the "screw it I just want a value" mindset of unwrap()

@cart
Copy link
Member

cart commented Jul 11, 2024

Ditto for y = list.ok_or("index 1 should exist")?;. If you omit the ? it will compile, but because &str does not directly implement Error, adding ? would cause a failure to compile in the context of an anyhow-style catch-all error. It would need to be something like list.ok_or(MessageError("index 1 should exist"))?, which defeats the "screw it" mindset.

@benfrankel
Copy link
Contributor

IMO better error handling shouldn't block a rename from e.g. single, get_single to single.unwrap, single. It's a footgun today, and it won't require any design work to solve.

@janhohenheim
Copy link
Member Author

janhohenheim commented Jul 11, 2024

Copying my sentiment from Discord here. The proposed additions to Option are great, but there is already anyhow::Context for exactly this.
anyhow is supremely mature and one of the most widely crates in Rust in general, so I think we should take advantage of this and just re-export it. We can also not re-export it and just tell users to use it.
But reinventing the wheel here seems rather pointless imo

This is probably food for thought for another issue however

@cart
Copy link
Member

cart commented Jul 11, 2024

(Also just mirroring discord) I do suspect that we may want our own top level general-purpose (anyhow-style catch-all) Bevy error type to allow us to better encapsulate Bevy context (such as system context). However should should definitely consider using anyhow given its maturity and featureset. Worth weighing both options. I agree that Context::context fills the same role as assume() (but not unpack()).

@alice-i-cecile
Copy link
Member

Replace the majority of Option-returning functions with Results

IMO we can and should start on this today: I've been frustrated by this in the past.

@cart
Copy link
Member

cart commented Jul 11, 2024

IMO we can and should start on this today: I've been frustrated by this in the past.

I still think we might want to land this all as a whole, as some people might be relying on the proliferation of Option to use ? in the context of code like fn something(world: &World) -> Option<Something> { world.get_entity(SOME_ENTITY)?; } . We risk breaking people multiple times / forcing them to stop using ? if we do this piecemeal.

@cart
Copy link
Member

cart commented Jul 11, 2024

(if you're saying we should start merging these changes now as they come in)
If you're just saying "lets break ground now", I'm fully on board :)

@JoJoJet
Copy link
Member

JoJoJet commented Jul 11, 2024

  1. Make Result systems feel more automatic / first class ... Ideally it is just as simple as returning Result in an existing system

Just noting that I previously explored this approach in #8705, but I decided to abandon it becuase it made the stack traces for errors useless, which was worse than just calling unwrap() where the error originates IMO

@alice-i-cecile
Copy link
Member

Yep, fine to wait on merging. We should at least get a list for now.

@cart
Copy link
Member

cart commented Jul 11, 2024

@JoJoJet its worth calling out that the anyhow impl supports this reasonably (using built in rust features), provided you enable the "backtrace" anyhow cargo feature (and run with the RUST_BACKTRACE=1 environment variable):

use anyhow::Result;
use thiserror::Error;

fn main() {
    let result = foo();
    println!("{:?}", result);
}

fn foo() -> Result<()> {
    bar()?;
    Ok(())
}

fn bar() -> Result<()> {
    let x = vec![1, 2];
    let y = x.get(2).ok_or(E1)?; // backtrace correctly points here
    Ok(())
}

#[derive(Error, Debug)]
#[error("hello")]
struct E1;

Prints:

Err(hello

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
             at /home/cart/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.86/src/backtrace.rs:27:14
   1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1964:27
   2: rust_playground::x
             at ./src/main.rs:11:13
   3: rust_playground::main
             at ./src/main.rs:5:18
   4: core::ops::function::FnOnce::call_once
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:155:18
   6: std::rt::lang_start::{{closure}}
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:159:18
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:284:13
   8: std::panicking::try::do_call
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
   9: std::panicking::try
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
  10: std::panic::catch_unwind
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
  11: std::rt::lang_start_internal::{{closure}}
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:48
  12: std::panicking::try::do_call
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
  13: std::panicking::try
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
  14: std::panic::catch_unwind
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
  15: std::rt::lang_start_internal
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:20
  16: std::rt::lang_start
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:158:17
  17: main
  18: <unknown>
  19: __libc_start_main
  20: _start)

@cart
Copy link
Member

cart commented Jul 11, 2024

We could also consider adding a bevy feature that sets the environment variable for people:

#[cfg(feature = "backtrace")]
std::env::set_var("RUST_BACKTRACE", "1")

Just verified that this works. Then if/when we add a "dev" meta-cargo-feature, we could add "backtrace" to it. Then when people do cargo run --features bevy/dev during development (which I think we'll want to recommend ultimately), it will "just work".

@JoJoJet
Copy link
Member

JoJoJet commented Jul 11, 2024

Ah good point. We could provide a dedicated SystemError type that captures a backtrace when you convert to it via ?. If this type is used for nothing other than system return types, it wouldn't need to implement std::error::Error, so we could provide a blanket From impl for all error types without breaking coherence

@janhohenheim
Copy link
Member Author

janhohenheim commented Jul 12, 2024

@JoJoJet I want to point out that that would again increase the amount of code we duplicate from anyhow.
Is there something Bevy-specific we need in our error type that would justify creating it over using anyhow::Error? I suppose Bevy functions could populate some kind of metadata field on the error, but I'm struggling to think of something useful to put there that would not already be handled by the log call printing the underlying error or manually downcasting an inner error.

@JoJoJet
Copy link
Member

JoJoJet commented Jul 12, 2024

I think it would be kind of weird for us to use anyhow::Error in our public APIs, and I don't think we should allow error types without a backtrace as I know from experience that it is a massive footgun to refactor all of your unwraps into a custom error type, only to later realize that all of your error messages are incredibly annoying to trace back to a line of code. I think we should guard users against this trap by forcing them to use an error type with a backtrace. It could just be anyhow::Error, but that would seem weird to me as opposed to using a bevy-specific error type tailor-made for this purpose. We could also just wrap anyhow internally to avoid duplication

@janhohenheim
Copy link
Member Author

Yeah, whatever the solution, usable backtraces are definitely a must-have.

@benfrankel
Copy link
Contributor

I published a crate for the macro approach: https://github.com/benfrankel/tiny_bail.

I used (an earlier version of) these macros during the game jam and they were exceptionally convenient.

@MiniaczQ
Copy link
Contributor

MiniaczQ commented Sep 17, 2024

I feel like this issue drifted from the original purpose and either needs a rename or split to more issues.

My few cents are that we can't take the ergonomic hit from unwrapping everything by hand.
Either we slightly tweak the existing API to make non-panicking methods easier to access, but still keep the panicking ones.
Or we go with the ? approach you guys are discussing, but there seems to be no progress on actually providing that.

The reason I have to write this is, this is causing problems just by existing.
Each new API now has to decide whether it should expose panicking methods or not.
At the bare minimum this issue needs to resolve how new features should expose their APIs, old or new convention, while the new approach gets developed and applied to previous features.
@alice-i-cecile

@janhohenheim
Copy link
Member Author

janhohenheim commented Sep 17, 2024

@MiniaczQ I think Cart's point here is that changing it now is really disruptive to users, and will be really disruptive again once we have better error handling.
I personally think this is enough of a footgun / API issue to warrant these consequences, but I see the point.

Or is your suggestion that we simply don't follow the old conventions for new API, but leave the old API intact?

The new error handling sounds like something a working group could develop, if enough people are willing to participate. But I think we have plenty of those currently running already, so I doubt that would be fruitful at this time.

@MiniaczQ
Copy link
Contributor

MiniaczQ commented Sep 17, 2024

My suggestion is that issues need to stop being marked as X-Controversial because they are in any way involved with this issue.
The controversy is solely here.

The problem is that it's unclear whether new features should expose panicking APIs or not, and that should be decided right now, because not knowing that is halting progress.
And if the API doesn't matter, because there will be a massive breaking change anyways, it's just more of an argument to stop referring this issue in other ones.

@iiYese
Copy link
Contributor

iiYese commented Sep 17, 2024

I've only seen this issue now. My take is Results are the wrong way to go about this. The original problem of "do we panic or not" comes about from error types already (Option). Adding a more granular error type (Result) with more ways to handle them doesn't remedy that it makes it worse.

The solution to this I would prefer is to lean into the ECS paradigm more & treat more things as if they were querying the world. If a Res<Foo> doesn't exist the system just doesn't run. Similar to how iterating over a Query<Bar> that matches nothing loops over nothing. If users want a more explicit panicky behavior they can use Option<Res<Foo>> & unwrap.

This also makes for nicer breaking changes because users already have it set up so that systems with Res<Foo> don't panic cause they can't really do anything else with them? The diagnostics from those panics are unreliable at best to begin with.

Additionally this kind of default just makes for better graceful behavior for released apps.

@MiniaczQ
Copy link
Contributor

The solution to this I would prefer is to lean into the ECS paradigm more & treat more things as if they were querying the world. If a Res<Foo> doesn't exist the system just doesn't run. Similar to how iterating over a Query<Bar> that matches nothing loops over nothing. If users want a more explicit panicky behavior they can use Option<Res<Foo>> & unwrap.

This only resolves errors during resource acquisition, systems can have other types of errors that need to be resolved and it'd be good to have a solution for that too.

@alice-i-cecile
Copy link
Member

The solution to this I would prefer is to lean into the ECS paradigm more & treat more things as if they were querying the world. If a Res doesn't exist the system just doesn't run. Similar to how iterating over a Query that matches nothing loops over nothing. If users want a more explicit panicky behavior they can use Option<Res> & unwrap.

I do think we should do this, and if we can get this behavior in place I'd prefer to encourage the use of the Single system param from #15264. This would go a long way to addressing the problem and ensuring non-panicking behavior by default without an ergonomics hit.

That said, I agree with @MiniaczQ above that Bevy needs better error-handling from within systems in general, and generally feel like we should be using more descriptive Result types over Option in most places.

@iiYese
Copy link
Contributor

iiYese commented Sep 17, 2024

This only resolves errors during resource acquisition, systems can have other types of errors that need to be resolved and it'd be good to have a solution for that too.

Which I think should be a different mechanism. Anything that can be treated as querying the world should be treated as querying the world.

@janhohenheim
Copy link
Member Author

janhohenheim commented Sep 17, 2024

@iiYese the reason Results are being discussed over Options (apart from documentation ofc) is for error propagation without requiring anyhow-style .context, IIRC

@cart
Copy link
Member

cart commented Sep 17, 2024

The problem is that it's unclear whether new features should expose panicking APIs or not, and that should be decided right now, because not knowing that is halting progress.

Im thinking we should be building new APIs consistent with the current Bevy API pattern (panicking shortname with get_X for errors/results). Until we have solved the UX issues around error handling, I don't think we're ready to be hard-liners about Results-only apis.

Those interested in changing that paradigm should invest in resolving the technical questions outlined here, then start mapping out what a holistic port looks like so we can do this all in one fell swoop.

@MiniaczQ
Copy link
Contributor

#15302 and #15264 should help a bit, any other system params we could use?

@alice-i-cecile
Copy link
Member

NonEmptyEventReader is the other system param in this family :)

@alice-i-cecile
Copy link
Member

@maniwani has an excellent suggestion on the "allow systems to return errors" front:

I'll say that I think plumbing inputs and outputs is outside of the executor's responsibilities, so I'd like a solution more if it elides the difference between () output and Result<(), E> output before a system is presented to the executor.

From Discord

@alice-i-cecile alice-i-cecile added this to the 0.16 milestone Oct 28, 2024
@alice-i-cecile
Copy link
Member

@hymm suggested that the scheduler should treat systems as fallible by default, and convert systems that return () into Ok(()). I like this plan, and I'm hoping to take a crack at a design in 0.16 if no one beats me to it.

@MiniaczQ
Copy link
Contributor

Note that systems can fail externally (missing parameters) or internally (return None/Err), I believe we should keep those 2 separate, so they don't get mixed when casting to Box<dyn Error> for default error handling

@NthTensor NthTensor mentioned this issue Dec 1, 2024
4 tasks
@NthTensor
Copy link
Contributor

I've had a crack at implementing the basic suggestions here: #16589.

github-merge-queue bot pushed a commit that referenced this issue Dec 5, 2024
# Objective

Error handling in bevy is hard. See for reference
#11562,
#10874 and
#12660. The goal of this PR is
to make it better, by allowing users to optionally return `Result` from
systems as outlined by Cart in
<#14275 (comment)>.

## Solution

This PR introduces a new `ScheuleSystem` type to represent systems that
can be added to schedules. Instances of this type contain either an
infallible `BoxedSystem<(), ()>` or a fallible `BoxedSystem<(),
Result>`. `ScheuleSystem` implements `System<In = (), Out = Result>` and
replaces all uses of `BoxedSystem` in schedules. The async executor now
receives a result after executing a system, which for infallible systems
is always `Ok(())`. Currently it ignores this result, but more useful
error handling could also be implemented.

Aliases for `Error` and `Result` have been added to the `bevy_ecs`
prelude, as well as const `OK` which new users may find more friendly
than `Ok(())`.

## Testing

- Currently there are not actual semantics changes that really require
new tests, but I added a basic one just to make sure we don't break
stuff in the future.
- The behavior of existing systems is totally unchanged, including
logging.
- All of the existing systems tests pass, and I have not noticed
anything strange while playing with the examples

## Showcase

The following minimal example prints "hello world" once, then completes.

```rust
use bevy::prelude::*;

fn main() {
    App::new().add_systems(Update, hello_world_system).run();
}

fn hello_world_system() -> Result {
    println!("hello world");
    Err("string")?;
    println!("goodbye world");
    OK
}
```

## Migration Guide

This change should be pretty much non-breaking, except for users who
have implemented their own custom executors. Those users should use
`ScheduleSystem` in place of `BoxedSystem<(), ()>` and import the
`System` trait where needed. They can choose to do whatever they wish
with the result.

## Current Work

+ [x] Fix tests & doc comments
+ [x] Write more tests
+ [x] Add examples
+ [X] Draft release notes

## Draft Release Notes

As of this release, systems can now return results.

First a bit of background: Bevy has hisotrically expected systems to
return the empty type `()`. While this makes sense in the context of the
ecs, it's at odds with how error handling is typically done in rust:
returning `Result::Error` to indicate failure, and using the
short-circuiting `?` operator to propagate that error up the call stack
to where it can be properly handled. Users of functional languages will
tell you this is called "monadic error handling".

Not being able to return `Results` from systems left bevy users with a
quandry. They could add custom error handling logic to every system, or
manually pipe every system into an error handler, or perhaps sidestep
the issue with some combination of fallible assignents, logging, macros,
and early returns. Often, users would just litter their systems with
unwraps and possible panics.

While any one of these approaches might be fine for a particular user,
each of them has their own drawbacks, and none makes good use of the
language. Serious issues could also arrise when two different crates
used by the same project made different choices about error handling.

Now, by returning results, systems can defer error handling to the
application itself. It looks like this:

```rust
// Previous, handling internally
app.add_systems(my_system)
fn my_system(window: Query<&Window>) {
   let Ok(window) = query.get_single() else {
       return;
   };
   // ... do something to the window here
}

// Previous, handling externally
app.add_systems(my_system.pipe(my_error_handler))
fn my_system(window: Query<&Window>) -> Result<(), impl Error> {
   let window = query.get_single()?;
   // ... do something to the window here
   Ok(())
}

// Previous, panicking
app.add_systems(my_system)
fn my_system(window: Query<&Window>) {
   let window = query.single();
   // ... do something to the window here
}

// Now 
app.add_systems(my_system)
fn my_system(window: Query<&Window>) -> Result {
    let window = query.get_single()?;
    // ... do something to the window here
    Ok(())
}
```

There are currently some limitations. Systems must either return `()` or
`Result<(), Box<dyn Error + Send + Sync + 'static>>`, with no
in-between. Results are also ignored by default, and though implementing
a custom handler is possible, it involves writing your own custom ecs
executor (which is *not* recomended).

Systems should return errors when they cannot perform their normal
behavior. In turn, errors returned to the executor while running the
schedule will (eventually) be treated as unexpected. Users and library
authors should prefer to return errors for anything that disrupts the
normal expected behavior of a system, and should only handle expected
cases internally.

We have big plans for improving error handling further:
+ Allowing users to change the error handling logic of the default
executors.
+ Adding source tracking and optional backtraces to errors.
+ Possibly adding tracing-levels (Error/Warn/Info/Debug/Trace) to
errors.
+ Generally making the default error logging more helpful and
inteligent.
+ Adding monadic system combininators for fallible systems.
+ Possibly removing all panicking variants from our api.

---------

Co-authored-by: Zachary Harrold <[email protected]>
ecoskey pushed a commit to ecoskey/bevy that referenced this issue Jan 6, 2025
# Objective

Error handling in bevy is hard. See for reference
bevyengine#11562,
bevyengine#10874 and
bevyengine#12660. The goal of this PR is
to make it better, by allowing users to optionally return `Result` from
systems as outlined by Cart in
<bevyengine#14275 (comment)>.

## Solution

This PR introduces a new `ScheuleSystem` type to represent systems that
can be added to schedules. Instances of this type contain either an
infallible `BoxedSystem<(), ()>` or a fallible `BoxedSystem<(),
Result>`. `ScheuleSystem` implements `System<In = (), Out = Result>` and
replaces all uses of `BoxedSystem` in schedules. The async executor now
receives a result after executing a system, which for infallible systems
is always `Ok(())`. Currently it ignores this result, but more useful
error handling could also be implemented.

Aliases for `Error` and `Result` have been added to the `bevy_ecs`
prelude, as well as const `OK` which new users may find more friendly
than `Ok(())`.

## Testing

- Currently there are not actual semantics changes that really require
new tests, but I added a basic one just to make sure we don't break
stuff in the future.
- The behavior of existing systems is totally unchanged, including
logging.
- All of the existing systems tests pass, and I have not noticed
anything strange while playing with the examples

## Showcase

The following minimal example prints "hello world" once, then completes.

```rust
use bevy::prelude::*;

fn main() {
    App::new().add_systems(Update, hello_world_system).run();
}

fn hello_world_system() -> Result {
    println!("hello world");
    Err("string")?;
    println!("goodbye world");
    OK
}
```

## Migration Guide

This change should be pretty much non-breaking, except for users who
have implemented their own custom executors. Those users should use
`ScheduleSystem` in place of `BoxedSystem<(), ()>` and import the
`System` trait where needed. They can choose to do whatever they wish
with the result.

## Current Work

+ [x] Fix tests & doc comments
+ [x] Write more tests
+ [x] Add examples
+ [X] Draft release notes

## Draft Release Notes

As of this release, systems can now return results.

First a bit of background: Bevy has hisotrically expected systems to
return the empty type `()`. While this makes sense in the context of the
ecs, it's at odds with how error handling is typically done in rust:
returning `Result::Error` to indicate failure, and using the
short-circuiting `?` operator to propagate that error up the call stack
to where it can be properly handled. Users of functional languages will
tell you this is called "monadic error handling".

Not being able to return `Results` from systems left bevy users with a
quandry. They could add custom error handling logic to every system, or
manually pipe every system into an error handler, or perhaps sidestep
the issue with some combination of fallible assignents, logging, macros,
and early returns. Often, users would just litter their systems with
unwraps and possible panics.

While any one of these approaches might be fine for a particular user,
each of them has their own drawbacks, and none makes good use of the
language. Serious issues could also arrise when two different crates
used by the same project made different choices about error handling.

Now, by returning results, systems can defer error handling to the
application itself. It looks like this:

```rust
// Previous, handling internally
app.add_systems(my_system)
fn my_system(window: Query<&Window>) {
   let Ok(window) = query.get_single() else {
       return;
   };
   // ... do something to the window here
}

// Previous, handling externally
app.add_systems(my_system.pipe(my_error_handler))
fn my_system(window: Query<&Window>) -> Result<(), impl Error> {
   let window = query.get_single()?;
   // ... do something to the window here
   Ok(())
}

// Previous, panicking
app.add_systems(my_system)
fn my_system(window: Query<&Window>) {
   let window = query.single();
   // ... do something to the window here
}

// Now 
app.add_systems(my_system)
fn my_system(window: Query<&Window>) -> Result {
    let window = query.get_single()?;
    // ... do something to the window here
    Ok(())
}
```

There are currently some limitations. Systems must either return `()` or
`Result<(), Box<dyn Error + Send + Sync + 'static>>`, with no
in-between. Results are also ignored by default, and though implementing
a custom handler is possible, it involves writing your own custom ecs
executor (which is *not* recomended).

Systems should return errors when they cannot perform their normal
behavior. In turn, errors returned to the executor while running the
schedule will (eventually) be treated as unexpected. Users and library
authors should prefer to return errors for anything that disrupts the
normal expected behavior of a system, and should only handle expected
cases internally.

We have big plans for improving error handling further:
+ Allowing users to change the error handling logic of the default
executors.
+ Adding source tracking and optional backtraces to errors.
+ Possibly adding tracing-levels (Error/Warn/Info/Debug/Trace) to
errors.
+ Generally making the default error logging more helpful and
inteligent.
+ Adding monadic system combininators for fallible systems.
+ Possibly removing all panicking variants from our api.

---------

Co-authored-by: Zachary Harrold <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible D-Complex Quite challenging from either a design or technical perspective. Ask for help! M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide S-Needs-SME Decision or review from an SME is required X-Controversial There is active debate or serious implications around merging this PR
Projects
None yet
Development

No branches or pull requests

9 participants