Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unwrap considered harmful #12660

Open
james7132 opened this issue Mar 22, 2024 · 30 comments
Open

unwrap considered harmful #12660

james7132 opened this issue Mar 22, 2024 · 30 comments
Labels
A-Cross-Cutting Impacts the entire engine C-Code-Quality A section of code that is hard to understand or change P-Crash A sudden unexpected crash

Comments

@james7132
Copy link
Member

james7132 commented Mar 22, 2024

What problem does this solve or what need does it fill?

Unwrap should generally be avoided. We already encourage contributors to avoid the use of Option::unwrap, yet crashes from panics are still one of the most common categories of issues we're seeing now.

What solution would you like?

  • Enable the clippy unwrap_used lint at a workspace level, and add allow-unwrap-in-tests = true to Clippy.toml.
  • Replace actual expected crashes with expect or unwrap_or_else with a panic where possible, otherwise properly gracefully handle the error.
  • Preferably avoid panicking in the renderer entirely, instead failing gracefully where possible by logging a warning and not rendering anything or using blatantly obvious errors (i.e. Valve's missing texture shader).

What alternative(s) have you considered?

Leaving it as is. Keep crashing from unwraps.

@james7132 james7132 added P-Crash A sudden unexpected crash C-Code-Quality A section of code that is hard to understand or change A-Meta About the project itself X-Controversial There is active debate or serious implications around merging this PR labels Mar 22, 2024
@alice-i-cecile
Copy link
Member

See #10166 for a major culprit.

@SludgePhD
Copy link
Contributor

Another API design issue related to this is Query::single / Query::get_single – here, the shorter method name that users are most likely to go for will .unwrap() implicitly. IMO Query::single is almost never the right choice and just making it have the semantics of get_single would be better.

@alice-i-cecile
Copy link
Member

single/get_single used to work like that: users at the time found it tedious to constantly check if their player or camera had despawned!

It was also moved to be consistent with .resource / .get_resource, which while it still panics is much less prone to unexpected failure, as removing resources is much more rare than despawning entities.

That said, I am personally allergic to using .single in my own serious projects due to frustrations with panics in edge cases and during refactors, and return early on errors.

@bushrat011899
Copy link
Contributor

Perhaps instead of get_x and x, we could use x and x_unchecked instead? This makes the panicking API clearly distinguishable from the non-panicing, while also encouraging the idiomatic path.

Regardless, I agree with this change. unwrap is a code smell within a library.

@james7132
Copy link
Member Author

x_unchecked makes my mind jump immediately to unsafe, which we shouldn't be encouraging unless it's paramount that users squeeze every last ounce of performance out.

@alice-i-cecile
Copy link
Member

I think that x_unchecked (by whatever name) isn't worth adding. It's more API to maintain, and not actually much nicer than just another unwrap call.

@benfrankel
Copy link
Contributor

Another source of panics is when systems with Res or ResMut arguments are scheduled when the resource doesn't exist. In complex applications this might be triggered only in very specific scenarios, making it difficult to catch by playtesting.

The alternatives off the top of my head are a little awkward, though:

  1. If a system requires a resource that does not exist, skip that system and log a warning.
  2. Require systems to use Option<Res> instead of Res, or give Res the same behavior as Option<Res> somehow.

@msvbg
Copy link
Contributor

msvbg commented Mar 24, 2024

I would be in favor of renaming single to something like get_single_unchecked, or some other similarly clunky name, to discourage its use. Or it could be removed entirely. I've seen a fair few crashes coming from single.

@alice-i-cecile alice-i-cecile added A-Cross-Cutting Impacts the entire engine and removed A-Meta About the project itself labels Mar 24, 2024
@alice-i-cecile
Copy link
Member

#1255 is a big source of dumb panics as I refactor as well.

@james7132
Copy link
Member Author

Removing the controversial label, as it seems like everyone's in agreement that this is something we should be doing.

@james7132 james7132 removed the X-Controversial There is active debate or serious implications around merging this PR label Mar 24, 2024
@yyogo
Copy link
Contributor

yyogo commented Mar 28, 2024

How would you like PRs submitted for this? I have a branch where I started removing unwrap()s in most of the smaller crates but it turned out to be quite huge. Maybe this issue can track which crates have been "cleaned"?

@alice-i-cecile
Copy link
Member

@yyogo one crate at a time please: although for bevy_ecs it's likely best to do it one module at a time.

@Brezak
Copy link
Contributor

Brezak commented Mar 28, 2024

If we're doing this one crate at a time a tracking issue to keep track which crates have been cleaned up would be useful.

@vertesians
Copy link
Contributor

I would be in favor of renaming single to something like get_single_unchecked, or some other similarly clunky name, to discourage its use. Or it could be removed entirely. I've seen a fair few crashes coming from single.

In that situation, wouldn't it make more sense to deprecate single() entirely and tell people to use get_single().unwrap() instead?

@musjj
Copy link
Contributor

musjj commented Apr 2, 2024

Maybe relevant:

I also find myself reaching for .single() way too often, but only because the standard Rust error-handling patterns are not really usable with Bevy. I feel that if we can just do something like .single()?, a lot of these issues would just disappear.

@mockersf
Copy link
Member

when replacing an unwrap with an expect, the message should:

  • be helpful to understand why the panic happens
  • be written to target people who will see them: is it for engine developers trying to fix an internal bug, for engine users trying to fix their inputs, for game player trying to report a bug
  • provide more context than what reading the code would give

if the message starts being too long, an error code can be used to link to a page with more details, see https://github.com/bevyengine/bevy/tree/main/errors

@cart
Copy link
Member

cart commented May 15, 2024

I'm also on board for making Bevy less panicky. But I don't think we should seriously consider removing panicking APIs or renaming get_x to x until we have made error handling significantly nicer and more opinionated.

First, without Results + ? syntax, working with options/results is comparatively nasty ergonomically:

let Ok(window) = windows.get_single() else {
    return;
};
let Some(cursor) = window.cursor_position() else {
    return;
};
let Ok(board) = boards.get_single() else {
    return;
};
let Ok((camera, camera_transform)) = cameras.get_single() else {
    return;
};
let Some(ray) = camera.viewport_to_world_2d(camera_transform, cursor) else {
    return;
};
// Panicking variant
let window = windows.single();
let cursor = window.cursor_position().unwrap();
let board = boards.single();
let (camera, camera_transform)) = cameras.single();
let ray = camera.viewport_to_world_2d(camera_transform, cursor).unwrap();

Note that I've just copy/pasted some code from one of my games. Some of the unwraps don't make sense in this context.

Porting this to a Result system would fail, naively:

fn system(.....) -> Result<()> {
  let window = windows.get_single()?;
  // this returns Option and therefore fails
  let cursor = window.cursor_position()?;
  let board = boards.get_single()?;
  let (camera, camera_transform) = cameras.get_single()?;
  // this returns Option and therefore fails
  let ray = camera.viewport_to_world_2d(camera_transform, cursor)?;
  Ok(())
}

This is a Rust restriction (unfortunately). It is impossible to build a Result / error type that handles both Results and Options.

Bevy APIs (including core ECS apis) sometimes return Result and sometimes return Option (ex: Queries return Results, World apis and Assets return Options). In these cases you either need to ok_or (to convert all Options to Results) or result.ok() (to convert all Results to Options):

fn system(.....) -> Option<()> {
  let window = windows.get_single().ok()?;
  let cursor = window.cursor_position()?;
  let board = boards.get_single().ok()?;
  let (camera, camera_transform) = cameras.get_single().ok()?;
  let ray = camera.viewport_to_world_2d(camera_transform, cursor)?;
}
fn system(.....) -> Result<()> {
  let window = windows.get_single()?;
  let cursor = window.cursor_position().ok_or(MyErr)?;
  let board = boards.get_single()?;
  let (camera, camera_transform) = cameras.get_single().?;
  let ray = camera.viewport_to_world_2d(camera_transform, cursor).ok_or(MyErr)?;
  Ok(())
}

The system above actually isn't too bad. But imagine doing this when interleaving Queries / Assets / EntityMut, etc in the same system. If we're going to embrace error handling, I think we need to convert most internal APIs to use Results instead of Option to avoid the compatibility issues. Using them together in their current form is an exercise in frustration and ergonomically unsatisfying.

@cart
Copy link
Member

cart commented May 15, 2024

Note that Result systems are already supported. However you need to manually convert them to "normal systems" using map:

app.add_systems(Update, a_system.map(error));

fn a_system(query: Query<&A>) -> Result<()> {
  let a = query.get_single()?;
  Ok(())
}

We'd want to make this automatic if we're going to encourage this pattern:

app.add_systems(Update, a_system);

@cart
Copy link
Member

cart commented May 15, 2024

We'd also want our own Result error type similar to anyhow (in that it can support arbitrary error types via boxing).

@cart
Copy link
Member

cart commented May 15, 2024

I'm also partial to this in my own code as I don't love typing/reading Ok(()) at the end of every system:

const OK: Result<()> = Ok(());

fn system() -> Result<()> {
  OK
}

@bushrat011899
Copy link
Contributor

I'm also partial to this in my own code as I don't love typing/reading Ok(()) at the end of every system:

const OK: Result<()> = Ok(());

fn system() -> Result<()> {
  OK
}

Perhaps a #[system] macro similar to Leptos' #[component] would be worth investing in here? We already have a gap in error messaging due to how all_tuples! styled trait implementations work, so there's already an incentive here.

@bushrat011899
Copy link
Contributor

Also worth considering is the same game might want differing behaviour for different Result systems. If the system that plays a sound effect fails, maybe I only want the error logged, if one of my networking systems fails, perhaps I want to take the user to the main menu, etc.

@NiseVoid
Copy link
Contributor

But I don't think we should seriously consider removing panicking APIs or renaming get_x to x until we have made error handling significantly nicer and more opinionated.

I think removing them would definitely require easy "error handling" for systems, but I think renaming them could have a relatively minor impact as long as it's not get_x -> x and x -> some significantly longer name like x_unchecked that also suggests it's unsafe. The warnings you'd get on a migration like that would also suggest the panicking version was removed. I remember in Go many libraries had equivalent of a get_x and must_x, with the latter panicking if anything went wrong.

I think overall it's quite problematic to have panicking versions be the obvious function because they are at x, and nothing in their name reminds you that they will in fact panic. When I started using bevy the panicking versions seemed nice, but the longer I've worked on my game the more annoying these version are. I usually end up replacing all the panicking versions during a refactor of my game's states because the crash, recompile, repeat loop slows down development to a crawl. Making the new user experience simpler is great, but not if it's in ways that guarantee the user will need to do minor refactors all over their codebase later.

@musjj
Copy link
Contributor

musjj commented May 15, 2024

Note that Result systems are already supported. However you need to manually convert them to "normal systems" using map:

Yes, but this will completely obscure where the error occurred, making it borderline unusuable IMO. bevy_mod_sysfail is slightly better because it includes the name of the system in the log message. But all the backtrace (line number, etc.) is still gone, which is why I still prefer using .unwrap.

Apparently anyhow can capture backtraces, which seems suitable for user code. Maybe bevy can integrate with it?

@tbillington
Copy link
Contributor

tbillington commented Jun 12, 2024

While it's more in "user code/systems", I wanted to contribute what I've been doing for ages now. Don't know how relevant to "engine" code it is, but since adopting this pattern I practically never reach for Result or Option much anymore. I have a util module with a selection of macros that will do the named operation or return if not possible.

The practical result of how these macros simplify/speed up writing gameplay logic is that I never use unwrap in my code.

The advantages are

  • Less/no incentive to unwrap
  • Less boilerplate let else spam
  • After using them a little bit it's quite obvious what they do when you see them, their definitions are dead simple
  • Plays nicely with for_each query iteration
  • Each path to replace with let else with debug error! and return later
  • I basically don't have any panicing code in my gameplay systems
  • Don't need to fiddle with return types of systems etc into -> Result or -> Option, the same macros work in practically every context

Downsides

  • If the preconditions aren't met and I haven't productionised the code to use let else with error!;return yet sometimes "nothing will happen" if I've broken logic somewhere

Example macros

macro_rules! get_mut {
    ($q:expr,$r:expr) => {
        match $q.get_mut($r) {
            Ok(m) => m,
            _ => return,
        }
    };
}

macro_rules! get_single {
    ($q:expr) => {
        match $q.get_single() {
            Ok(m) => m,
            _ => return,
        }
    };
}

// get!, get_single_mut! some!, ok! ...etc

For less critical/WIP code I'm usually fine with "nothing happening" if the expected values aren't available. I do not want to panic or unwrap for these, I would have just written a let else to return anyway.

When I am happy with the behaviour and want to commit to that code I replace the macros with a let else that error!s with debug information then returns if the value couldn't be gotten.

fn grid_debug_gizmo(
    mut gizmos: Gizmos,
    window: Query<&Window>,
    camera: Query<(&Camera, &GlobalTransform), With<crate::MainCamera>>,
) {
    let window = get_single!(window);
    let (camera, camera_t) = get_single!(camera);
    let cursor_ui_pos = some!(window.cursor_position());
    let cursor_world_pos = some!(camera.viewport_to_world_2d(camera_t, cursor_ui_pos));

    gizmos.rect_2d(
        cursor_world_pos.grid_aligned_centered(),
        0.0,
        Vec2::splat(GRID_SIZE + 4.0),
        Color::PINK,
    );
}

The nice feature of the macros returning is it plays well with iterating queries using for_each. If you were using for loops it would short circuit out of the entire system which is usually undesired.

fn handle_game_scene_switch(location: Query<&PawnLocation>, etc) {
    query.iter().for_each(|(e, p)| {
        let location = get!(location, p.get());
        setup_local_map_pawn_renderer(&mut cmd, e, &asset_server, location, etc);
    });
}

@tychedelia
Copy link
Contributor

I find these despawn panics very annoying, but just as a counterpoint, they have helped me identify bad system ordering before, where I was violating my own invariants. Part of me feels like these should be warn rather than debug.

@ramirezmike
Copy link
Contributor

I just ran into this which panics because the rotation is denormalized.

Transform::from_rotation(Quat::from_array([0.0, -0.0, 0.0, 0.0])).forward();

I'm still digging into how my transform's rotation ends up this way, but it was really surprising that Transform::forward can panic (because it calls local_z which can panic because of Quat multiplication #12981).

Should something as ubiquitous as Transform note in its docs that it can panic due to Quat's panics?

Also, separately, am I crazy? I'm running into Quat panics but I'm not explicitly enabling glam_assert.. it even happens in bevy playground.

https://learnbevy.com/playground?share=5763bdd5c6ccc345e4fa6d0e054e2cd484c211a7603c589fc01fb3a03910c7b5

@alice-i-cecile
Copy link
Member

the thing I don't get is why to_scale_rotation_translation is spitting out a quat which is so denormalized

By @mweatherley on Discord. Strongly agree that we shouldn't be panicking there though, simply loudly erroring.

@mweatherley
Copy link
Contributor

Yeah, I agree with Alice that these situations should loudly warn instead of panicking outright. In some of these cases (e.g. ramirezmike's example) it's important for the issue to be fixed upstream because the Transform is actually invalid (i.e. unrecoverable internal state) instead of just being somewhat denormalized.

@cart
Copy link
Member

cart commented Jul 11, 2024

Just outlined what I think our plan should be here: #14275 (comment)

github-merge-queue bot pushed a commit that referenced this issue Jul 22, 2024
# Objective

- It's possible to have errors in a draw command, but these errors are
ignored

## Solution

- Return a result with the error

## Changelog

Renamed `RenderCommandResult::Failure` to `RenderCommandResult::Skip`
Added a `reason` string parameter to `RenderCommandResult::Failure`

## Migration Guide
If you were using `RenderCommandResult::Failure` to just ignore an error
and retry later, use `RenderCommandResult::Skip` instead.

This wasn't intentional, but this PR should also help with
#12660 since we can turn a few
unwraps into error messages now.

---------

Co-authored-by: Charlotte McElwain <[email protected]>
@NthTensor NthTensor mentioned this issue Dec 1, 2024
4 tasks
github-merge-queue bot pushed a commit that referenced this issue Dec 5, 2024
# Objective

Error handling in bevy is hard. See for reference
#11562,
#10874 and
#12660. The goal of this PR is
to make it better, by allowing users to optionally return `Result` from
systems as outlined by Cart in
<#14275 (comment)>.

## Solution

This PR introduces a new `ScheuleSystem` type to represent systems that
can be added to schedules. Instances of this type contain either an
infallible `BoxedSystem<(), ()>` or a fallible `BoxedSystem<(),
Result>`. `ScheuleSystem` implements `System<In = (), Out = Result>` and
replaces all uses of `BoxedSystem` in schedules. The async executor now
receives a result after executing a system, which for infallible systems
is always `Ok(())`. Currently it ignores this result, but more useful
error handling could also be implemented.

Aliases for `Error` and `Result` have been added to the `bevy_ecs`
prelude, as well as const `OK` which new users may find more friendly
than `Ok(())`.

## Testing

- Currently there are not actual semantics changes that really require
new tests, but I added a basic one just to make sure we don't break
stuff in the future.
- The behavior of existing systems is totally unchanged, including
logging.
- All of the existing systems tests pass, and I have not noticed
anything strange while playing with the examples

## Showcase

The following minimal example prints "hello world" once, then completes.

```rust
use bevy::prelude::*;

fn main() {
    App::new().add_systems(Update, hello_world_system).run();
}

fn hello_world_system() -> Result {
    println!("hello world");
    Err("string")?;
    println!("goodbye world");
    OK
}
```

## Migration Guide

This change should be pretty much non-breaking, except for users who
have implemented their own custom executors. Those users should use
`ScheduleSystem` in place of `BoxedSystem<(), ()>` and import the
`System` trait where needed. They can choose to do whatever they wish
with the result.

## Current Work

+ [x] Fix tests & doc comments
+ [x] Write more tests
+ [x] Add examples
+ [X] Draft release notes

## Draft Release Notes

As of this release, systems can now return results.

First a bit of background: Bevy has hisotrically expected systems to
return the empty type `()`. While this makes sense in the context of the
ecs, it's at odds with how error handling is typically done in rust:
returning `Result::Error` to indicate failure, and using the
short-circuiting `?` operator to propagate that error up the call stack
to where it can be properly handled. Users of functional languages will
tell you this is called "monadic error handling".

Not being able to return `Results` from systems left bevy users with a
quandry. They could add custom error handling logic to every system, or
manually pipe every system into an error handler, or perhaps sidestep
the issue with some combination of fallible assignents, logging, macros,
and early returns. Often, users would just litter their systems with
unwraps and possible panics.

While any one of these approaches might be fine for a particular user,
each of them has their own drawbacks, and none makes good use of the
language. Serious issues could also arrise when two different crates
used by the same project made different choices about error handling.

Now, by returning results, systems can defer error handling to the
application itself. It looks like this:

```rust
// Previous, handling internally
app.add_systems(my_system)
fn my_system(window: Query<&Window>) {
   let Ok(window) = query.get_single() else {
       return;
   };
   // ... do something to the window here
}

// Previous, handling externally
app.add_systems(my_system.pipe(my_error_handler))
fn my_system(window: Query<&Window>) -> Result<(), impl Error> {
   let window = query.get_single()?;
   // ... do something to the window here
   Ok(())
}

// Previous, panicking
app.add_systems(my_system)
fn my_system(window: Query<&Window>) {
   let window = query.single();
   // ... do something to the window here
}

// Now 
app.add_systems(my_system)
fn my_system(window: Query<&Window>) -> Result {
    let window = query.get_single()?;
    // ... do something to the window here
    Ok(())
}
```

There are currently some limitations. Systems must either return `()` or
`Result<(), Box<dyn Error + Send + Sync + 'static>>`, with no
in-between. Results are also ignored by default, and though implementing
a custom handler is possible, it involves writing your own custom ecs
executor (which is *not* recomended).

Systems should return errors when they cannot perform their normal
behavior. In turn, errors returned to the executor while running the
schedule will (eventually) be treated as unexpected. Users and library
authors should prefer to return errors for anything that disrupts the
normal expected behavior of a system, and should only handle expected
cases internally.

We have big plans for improving error handling further:
+ Allowing users to change the error handling logic of the default
executors.
+ Adding source tracking and optional backtraces to errors.
+ Possibly adding tracing-levels (Error/Warn/Info/Debug/Trace) to
errors.
+ Generally making the default error logging more helpful and
inteligent.
+ Adding monadic system combininators for fallible systems.
+ Possibly removing all panicking variants from our api.

---------

Co-authored-by: Zachary Harrold <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Cross-Cutting Impacts the entire engine C-Code-Quality A section of code that is hard to understand or change P-Crash A sudden unexpected crash
Projects
None yet
Development

No branches or pull requests