Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't allow unwinding from Drop impls #3288

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Amanieu
Copy link
Member

@Amanieu Amanieu commented Jul 5, 2022

This RFC proposes to change this behavior to always abort if a panic attempts to escape a drop, even if this drop was invoked as part of normal function execution (as opposed to unwinding which already aborts with a double-panic).

Previous discussion: rust-lang/lang-team#97

Rendered

FCP comment

@scottmcm scottmcm added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jul 5, 2022
text/0000-panic-in-drop.md Outdated Show resolved Hide resolved
text/0000-panic-in-drop.md Outdated Show resolved Hide resolved
text/0000-panic-in-drop.md Outdated Show resolved Hide resolved
text/0000-panic-in-drop.md Show resolved Hide resolved
@Mark-Simulacrum
Copy link
Member

I think it would also be good to add an explicit section to the RFC about how users will find out about the change in behavior, just to document that going in - for example, release blog post and compat note would be a good start in my eyes, but we could also consider some softer change (e.g., adjusting the panic logging to say "this will abort starting in X release) to phase this in. AIUI, there's not really much static checking we can do, so that's probably about our limit...

@Gankra
Copy link
Contributor

Gankra commented Jul 5, 2022

I'm a huge fan of this proposal. When writing the rustonomicon this was one of the few major details of rust I couldn't really get a straight answer from anyone about. From my perspective it existed in the same grey area as unwinding through FFI, where we just lacked institutional expertise on the subtleties of unwinding corner cases and had punted on it.

This is why I never really adequately documented it. The current behaviour is very subtle and not something even stdlib devs ever had institutional knowledge or consideration of.

Writing exception-safe unsafe code is already enough of a headache, partially because it comes up rarely enough that it's hard to stay actively vigilant for. I think like, BinaryHeap is one of the only places in liballoc that has to do some explicit exception-safety fuckery because it needs to invoke the user-defined comparator repeatedly while doing mutations that put it in a transiently invalid state. All those stars lining up is actually surprisingly rare! Most of the time there are distinct lookup, allocate, then modify phases; so code is trivially exception-safe (assuming your own code doesn't panic, which collections generally only do during the lookup and allocations, not the final modification). Having less places where this rare problem can surprisingly come up, especially from user code, seems like a great thing.

We've never had a good story about this situation and at this point I think we have ample evidence it should just be banished to the shadow realm for being more trouble than it's worth. Especially since the stdlib already has the established precedent of Buf and File both just swallowing errors on Drop. Those cases were going to be my one hand-wring for this change, so it turns out I have nothing to complain about! (My memory was that they panicked, but honestly I never really gave it much consideration).

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Jul 5, 2022

I'd like to rehash my earlier arguments against this change, for those starting from this issue. I'd like to begin with some statements that I think everyone here agrees with:

  • Disregarding code size, this change only assists unsafe code. All safe code remains safe in the presence of unwinding panics.
  • Authors of some applications, such as certain web servers, would strongly prefer that threads unwind on critical failure rather than abort the entire process.
  • Some real-world Rust code can panic on drop today. (For instance, a Drop impl that calls println!() can panic if it fails to write.)
  • Today, a normally-unwinding panic from a Drop impl aborts the process if and only if either -C panic=abort is set, or the thread is already unwinding. With this change, a panic from a Drop impl will abort unconditionally.
  • As a consequence, some code that used to unwind the thread will now abort the process, to the (perhaps minor) detriment of certain web servers and some other applications.

Thus, I believe that since this is a silent change that will harm some current use cases, we should really show that the benefit of this change is greater than the cost, and that there are neither more beneficial nor less costly alternatives. I argue that such an alternative exists: keep panic-in-drop=unwind as the default, and provide better tools for writing drop-sensitive unsafe code.

Currently, the RFC twice compares this change to C++11's noexcept destructors, and only briefly mentions the differences in the "Prior Art" section. However, these differences are very important. In C++, each function parameter and local variable runs its destructor at the end of its scope. There is no way to prevent this, short of performing certain hacks with a union. There is also no way to run the destructor explicitly; even a move operation creates a new value and leaves a sentinel which must be destructed. But in Rust, destruction is determined by ownership semantics. If we pass a variable into a function parameter by value, the function (or data structure it is attached to) will become responsible for dropping it. This allows us to explicitly drop any value at any point in time, by passing it to the aptly-named drop() function.

The ability of Rust code to explicitly control dropping is quite relevant here. In C++, destruction is always invisible and uncontrollable (except if a value is placed into an std::optional which is reset), and so it makes a lot of sense not to mandate exception-safety surrounding it. In Rust, we can make drops clearly visible in unsafe code by using drop() and similar constructions, so the rationale doesn't apply nearly as absolutely.

The problem is, unsafe code today can easily implicitly drop values without realizing there's an issue; just look at the catch_unwind footgun (rust-lang/rust#86027). I believe that the best and most direct way to mitigate this is to provide better tools, both in the compiler and in the standard library, to catch these footguns instead of leaving them for the unsafe code's author to figure out. Importantly, we should clearly recommend using these tools in our docs and our books. To give an example, one crude but effective tool, from a suggestion by @danielhenrymantilla, would be an allow-by-default lint that triggers on every variable that implicitly calls Drop::drop(), and suggests explicitly drop()ping the variable. We could advise library authors to activate this lint on any panic-sensitive code that handles trait objects or user data types.


There are a few different variations on how the panic-in-drop option is to be handled. To summarize my position on each:

  • -Z panic-in-drop=unwind by default, -Z panic-in-drop=abort allowed: The current status quo. Acceptable, given additional tools to help unsafe code authors.
  • -C panic-in-drop=unwind by default, -C panic-in-drop=abort allowed: My preferred option, given additional tools. Users could manually choose panic-in-drop=abort for the improved code size and compile times, without needing the fully-fledged panic=abort.
  • -Z panic-in-drop=abort by default, -Z panic-in-drop=unwind allowed: This proposal, which I am against. It provides the code-size and compile-time improvements, but introduces a silent change that restricts some use cases with no stable workaround.
  • -C panic-in-drop=abort by default, -C panic-in-drop=unwind allowed: I am strongly against this. It would make unwinding drops possible, but it would fracture the package ecosystem: new unsafe container libraries could easily ignore support for the now-special panic-on-drop=unwind configuration, making the option very dangerous to use.

@CAD97
Copy link

CAD97 commented Jul 5, 2022

I would like to echo that the RFC as currently written puts too much weight on the code size / compilation benefits of the change. These are a strong argument for making the option available, but distract from the correctness argument for making it mandatory. Presenting this in the main text which is concerned with justifying changing the semantics of existing code serves to distinctly weaken the justification, since the argued position is not necessary for a primary argued benefit.

If the change is to be made, it should be justified on the unwind safety benefits alone, and the other benefits are just a very convenient bonus.

There is additionally a third undiscussed option: panic-in-drop=catch. Effectively, this automatically converts types to the recommended panic-in-drop=abort behavior of attempting as much cleanup as possible but swallowing the error and continuing on if something goes wrong. Of course, this is not without its own annoying problems (e.g. a field relying on the parent destructor to put it back into a state that is safe to drop... though imho that should definitely be using ManuallyDrop already anyway), but it does still have the distinct advantage of providing the nounwind benefit while avoiding introducing new implicit aborts into existing code.

I also need to point out what I think should be addressed, or at least acknowledged as a drawback: it is impossible to drop a structure where dropping its fields may panic without risking a panic-in-drop=abort. Even if you use ManuallyDrop to drop fields within your drop impl rather than after, the field's drop glue will cause an abort, so if you have legacy code that e.g. calls eprintln! in its destructor1, you have no recourse other than to forget to drop that field and leak its resources.

A final really silly option: add ptr::drop_in_place_maybe_unwind. This would explicitly generate and call panic-in-drop=unwind style drop glue, allowing you to lift panickingaborting destructors into a panicking close automatically.


If the compiler had some sort of #[no_panic] support that could be used for drop impls, I would be less worried about a change along these lines, because it would be at least somewhat possible to verify that drops don't cause an abort in normal execution.

Footnotes

  1. ... I have some code to go update to be more resilient against double panics.

text/0000-panic-in-drop.md Outdated Show resolved Hide resolved
@afetisov
Copy link

afetisov commented Jul 5, 2022

I'll just note that the performance and code size changes are pretty inconsequential, for a break in backwards compatibility. 3-5% decrease in code size is irrelevant for all but the most resource-constrained applications, which will likely use panic=abort anyway.

Also, while the change would possibly simplify the correctness of unsafe code, it would compromise the correctness of the application at large, which may rely on properly handling Drop panics. This is a big issue, both because the application is much harder to audit due to its size and lack of compiler-enforced panic boundaries, and because the application is usually written by less experienced users, including total newbies, while unsafe code is relegated to the experts.

The change in behaviour of Drop would be a breaking change, and a very insidious one, since it is entirely silent and hard to audit. That's the worst kind of breakage that can be introduced. It is also well-known that Rust currently allow panic in Drop, so many crates may rely on it. scopeguard is one mentioned example, Another example is the drop_bomb crate, which is used in 53 dependencies, including rust-analyzer. I'm sure there are Drop impls in the wild which use the same pattern without explicitly using those dependencies.

@Lokathor
Copy link
Contributor

Lokathor commented Jul 5, 2022

An attribute that specific unsafe types can put on their own drop impl if they need to be absolutely sure that no unwind can escape their drop is one thing. This is too much.

@kornelski
Copy link
Contributor

kornelski commented Jul 5, 2022

I feel this is a throwing baby with the bathwater situation. I'd prefer exploration of alternative designs. For example:

  • Add ability to forbid all panics in a function or a block of code. For unsafe code panics are dangerous, but panic in Drop is only a one of many cases. There could be unexpected panics lurking in Deref and overloaded operators. unsafe needs to be aware of all of them, so this solution is insufficient.

  • catch_unwind is a terrible gotcha that definitely needs addressing, but it also is a very unique case. There is no other panic-catching function, so there is no other place in Rust where code that clearly doesn't want a panic may still get one. I feel that a solution narrowly targeted specifically at catch_unwind would be sufficient to solve this gotcha, e.g. forbid panic in drop only in types used with catch_unwind (and dyn if necessary) or mark only callers of catch_unwind with panic=abort. There could be a safer catch_unwind variant that requires return type to be Copy.

  • For applications concerned about code size overhead there is panic=abort already. If a more granular solution is needed, this could be offered with an explicit function attribute.

Copy link

@CAD97 CAD97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about it some more, and I am currently against changing the global default to panic-in-drop=abort. I have a hypothesis that end of scope drops are not the problem to writing unwind-safe code. In most cases, unsafe code is written to have an unsafe precondition but a sound (or at least no-more-unsound) postcondition in a { suspend_safety; defer! { restore_safety; } tricky_code; return; } pattern.

Because of this, most code, even unsafe code, is unwind safe. Of course, the issue making unwind safety difficult is specifically that most code doesn't need to care, so when you do need to care, it's fairly easy to forget to. (This is an inherent limitation of operational pre/post-condition soundness semantics; an axiomatic invariant is just never put into an invalid unsound state in the first place.)

In order to justify changing the default, the RFC should justify the claim that RAII drop glue is an issue, and not just explicit drop_in_place calls unwinding. I would much rather see (and am drafting) a proposal providing APIs for unsafe code to control unwinds more carefully; namely some combination of:

  • drop_unwind which returns Option<R> rather than Result<R, UnwindPayload> to prevent Footgun with catch_unwind when catching panic-on-drop types rust#86027 issues.
  • catch_unwind2 that forces explicitly deciding how to deal with payload drop panics.
    • Two main ways to do so: a UnwindPayload type which is ManuallyDrop, or a #[must_use] Result-variant that doesn't offer functionality implicitly dropping the unwind payload.
    • Even better if this can replace catch_unwind in-place. (Doing so in existing editions would require #[deprecated] coercions for use of the current API or a strong argument for being a soundness fix that breaks inference.)
    • Author's current weak preference: suspend_unwind, which returns Result<R, Unwind>, and dropping Unwind calls resume_unwind.
  • #[abort_unwind] to add unwind-to-abort shims to arbitrary functions.
  • #[catch_unwind(return)] to add catch_unwind shims returning some fallback value.
  • Some #[drop_abort_unwind]/#[drop_catch_unwind] to do the above but on the full drop glue including dropping fields (as just shimming Drop::drop will not catch fields' drop glue unwinding).
  • Per-type control between leaking fields on drop glue unwind and suspending the unwind to drop fields.
  • drop_in_place_abort_unwinds which calls drop glue with an unwind-to-abort shim.
  • drop_in_place_catch_unwinds which calls drop glue returning some Result<(), PanicPayload> (see catch_unwind2 for options here).
  • drop_in_place_drop_unwinds which calls drop glue with drop_unwind glue.
  • #[warn(implicit_drop_glue)] to warn where drop glue is inserted. (A good restriction lint candidate, Though perhaps perf sensitive?)
    • Potential refinements involve only warning on drop glue which is not nounwind.

The goal of the above is providing options to unsafe code to allow code to choose their guarantees.


The below is a very interesting alternative to both panic-in-drop=unwind and panic-in-drop=abort.

There's additionally another intriguing middle ground which should be listed as an alternative: drop glue / drop_in_place can still unwind, and a field's drop glue unwinding continues to unwind and drop other fields (siblings and of parent containing types), but unwind is implicitly caught and turned into an abort only when the unwind exits from the drop glue stack into a calling function.

(How this behavior translates to async witness tables is a question in this scheme that still would need to be answered. My personal preferred solution is to treat dropping an async witness table identically to as if it were an unwind (as the sync analogue is exactly an unwind), thus setting thread::panicking() to true to eagerly convert any panics into aborts.)

This scheme has the potential to combine the benefits of both panic-in-drop=abort1 and panic-in-drop=unwind, though with a bit of additional effort. Namely, implicit drop glue unwind points are removed as a thing that unsafe code has to worry about, but drop_in_place still allows drop implementations to unwind at clearly marked points. Also notable is that drop under this scheme can be made to opt-in by-value drops unwinding by reimplementing it as ManuallyDrop::drop(&mut ManuallyDrop::new(it)).

Combined with some of the above controls for managing unwinding, this is a very promising approach to very potentially gain the benefit of this RFC while mitigating the downsides. (For maximum control, make a function-level switch between drop glue unwinds propagating or aborting.)

Footnotes

  1. The unwind-safety benefits, anyway; the code size benefits will be reduced, if not completely eliminated, by drop glue handling suspend-and-continue unwind semantics.

text/0000-panic-in-drop.md Outdated Show resolved Hide resolved

This RFC is already implemented as an unstable compiler flag which controls the behavior of panics escaping from a `Drop` impl.

Performance results from [rust-lang/rust#88759](https://github.com/rust-lang/rust/pull/88759) show up to 10% reduction in compilation time ([perf](https://perf.rust-lang.org/compare.html?start=c9db3e0fbc84d8409285698486375f080d361ef3&end=1f815a30705b7e96c149fc4fa88a98ca04e2deee)) and a 5MB (3%) reduction in the size of `librustc_driver.so`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For comparison, I would like to see results where just std's destructors are nounwind/unwind-in-drop=abort. This is likely a worst-of-both-worlds situation where using std containers turns a drop unwind into an unconditional abort but stack types can still unwind from drop glue, but is well within std's QOI rights to decide that unwinds out of its drop glue is forbidden while still allowed by the language. (Whether this is acceptable QOI is a separate question and depends on the potential benefit.)

QOI
Quality Of Implementation

text/0000-panic-in-drop.md Show resolved Hide resolved
Comment on lines 206 to 249
// Codegen with -Z panic-in-drop=unwind (old behavior)
unsafe fn drop_in_place<T>(ptr: *mut T) {
// Call the Drop impl if there is one.
try {
<T as Drop>::drop(&mut *ptr);
} catch {
// If the Drop impl panics the keep trying to drop fields.
try {
drop_in_place(&mut (*ptr).field1);
drop_in_place(&mut (*ptr).field2);
drop_in_place(&mut (*ptr).field3);
} catch {
abort();
}
}

// Drop the first field.
try {
drop_in_place(&mut (*ptr).field1);
} catch {
// If dropping field1 panics the keep trying to drop the remaining fields.
try {
drop_in_place(&mut (*ptr).field2);
drop_in_place(&mut (*ptr).field3);
} catch {
abort();
}
}

// Drop the second field.
try {
drop_in_place(&mut (*ptr).field2);
} catch {
// If dropping field2 panics the keep trying to drop the remaining fields.
try {
drop_in_place(&mut (*ptr).field3);
} catch {
abort();
}
}

// Drop the third field.
drop_in_place(&mut (*ptr).field3);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that this is illustrative rather than accurate, but this is also significantly worse than what is necessary and as such exaggerates the benefit of panic-in-drop=abort simplifying drop glue. A better representation of code the compiler could emit would be

unsafe fn drop_in_place<T>(ptr: *mut T) {
    let mut unwind = None;

    // Call the Drop impl if there is one.
    try {
        <T as Drop>::drop(&mut *ptr);
    } catch(payload) {
        std::panicking::panic_count::increase();
        match unwind {
            Some(_) => abort(),
            None => unwind = Some(payload),
        }
    }

    // Drop the first field.
    try {
        drop_in_place(&mut (*ptr).field1);
    } catch {
        std::panicking::panic_count::increase();
        match unwind {
            Some(_) => abort(),
            None => unwind = Some(payload),
        }
    }

    // Drop the second field.
    try {
        drop_in_place(&mut (*ptr).field2);
    } catch {
        std::panicking::panic_count::increase();
        match unwind {
            Some(_) => abort(),
            None => unwind = Some(payload),
        }
    }

    // Drop the third field.
    try {
        drop_in_place(&mut (*ptr).field3);
    } catch {
        std::panicking::panic_count::increase();
        match unwind {
            Some(_) => abort(),
            None => unwind = Some(payload),
        }
    }
    
    // Resume an in-progress unwind.
    if let Some(payload) = unwind {
        std::panicking::panic_count::decrease();
        resume_unwind(payload);
    }
}

but even this is more complicated than it need be, as those abort calls could be replaced by unreachable_unchecked, as hitting that case requires a double panic to unwind rather than immediately abort.

maybe better suggestion
Suggested change
// Codegen with -Z panic-in-drop=unwind (old behavior)
unsafe fn drop_in_place<T>(ptr: *mut T) {
// Call the Drop impl if there is one.
try {
<T as Drop>::drop(&mut *ptr);
} catch {
// If the Drop impl panics the keep trying to drop fields.
try {
drop_in_place(&mut (*ptr).field1);
drop_in_place(&mut (*ptr).field2);
drop_in_place(&mut (*ptr).field3);
} catch {
abort();
}
}
// Drop the first field.
try {
drop_in_place(&mut (*ptr).field1);
} catch {
// If dropping field1 panics the keep trying to drop the remaining fields.
try {
drop_in_place(&mut (*ptr).field2);
drop_in_place(&mut (*ptr).field3);
} catch {
abort();
}
}
// Drop the second field.
try {
drop_in_place(&mut (*ptr).field2);
} catch {
// If dropping field2 panics the keep trying to drop the remaining fields.
try {
drop_in_place(&mut (*ptr).field3);
} catch {
abort();
}
}
// Drop the third field.
drop_in_place(&mut (*ptr).field3);
}
// Codegen with -Z panic-in-drop=unwind (old behavior)
unsafe fn drop_in_place<T>(ptr: *mut T) {
let unwind;
// Call the Drop impl if there is one.
try {
<T as Drop>::drop(&mut *ptr);
} catch(payload) {
magic::assume!(!magic::initialized!(unwind));
std::panicking::panic_count::increase();
unwind = payload;
}
// Drop the first field.
try {
drop_in_place(&mut (*ptr).field1);
} catch {
magic::assume!(!magic::initialized!(unwind));
std::panicking::panic_count::increase();
unwind = payload;
}
// Drop the second field.
try {
drop_in_place(&mut (*ptr).field2);
} catch {
magic::assume!(!magic::initialized!(unwind));
std::panicking::panic_count::increase();
unwind = payload;
}
// Drop the third field.
try {
drop_in_place(&mut (*ptr).field3);
} catch {
magic::assume!(!magic::initialized!(unwind));
std::panicking::panic_count::increase();
unwind = payload;
}
// Resume an in-progress unwind.
if magic::initialized!(unwind) {
std::panicking::panic_count::decrease();
resume_unwind(unwind);
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the code that I showed is pretty much what is currently emitted by the compiler. Specifically:

  • The panic count is not checked, because it requires a TLS access which can be very slow.
  • Separate drop calls are used in the normal path and the unwind path because a different landing pad needs to be used when those calls unwind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, because of extern "C-unwind" we can have C++ exceptions unwinding through Rust code which don't increment the panic count. This means that we can't rely on the panic count to guarantee that a drop within an unwind guard won't itself panic.

Copy link

@CAD97 CAD97 Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the compiler is emitting intrinsics::r#try directly. AIUI, this means that the drop glue can in fact just not touch the panic count, as the panic count is already set by the in-progress unwind. It is currently the case that a panic-in-panic will abort via the panic_count check in rust_panic_with_hook.

It is the case that calling drop_in_place while panicking cannot unwind due to a rust panic. The only way that the separate landing pad would landed on is if an external unwind is caught. (IIUC, the intent is for extern "C-unwind" to convert allowed unwinds into a rust panic. This must involve incrementing the panic count (and thus a place to abort on double panic), as catching will decrement the panic count.)

If you can show a case, where the drop glue second landing pad is landed on, I'll happily be proven wrong. But the normal case is in fact that the abort comes immediately from the panic!, not the drop glue catching it.

So here's my updated drop glue impl:
// Codegen with -Z panic-in-drop=unwind (necessary behavior, different from current)
unsafe fn drop_in_place<T>(ptr: *mut T) {
    let unwind;

    // Note that try {} catch {} here is the raw intrinsics::try.
    // The panic count is not touched, so remains set from the first panic.

    // Call the Drop impl if there is one.
    try {
        <T as Drop>::drop(&mut *ptr);
    } catch(payload) {
        unwind = payload;
        goto 'unwind_field1;
    }

    // Drop the first field.
    try {
        drop_in_place(&mut (*ptr).field1);
    } catch(payload) {
        unwind = payload;
        goto 'unwind_field2;
    }

    // Drop the second field.
    try {
        drop_in_place(&mut (*ptr).field2);
    } catch(payload) {
        unwind = payload;
        goto 'unwind_field3;
    }

    // Drop the third field.
    try {
        drop_in_place(&mut (*ptr).field3);
    } catch(payload) {
        unwind = payload;
        goto 'done;
    }
    
    goto 'done;
    
    try {
    'unwind_field1:
        drop_in_place(&mut (*ptr).field1);
    'unwind_field2:
        drop_in_place(&mut (*ptr).field1);
    'unwind_field3:
        drop_in_place(&mut (*ptr).field1);
    } catch {
        // A rust panic cannot hit this landing pad, because it would've aborted.
        __rust_foreign_exception(); // aborts
    }

'done:

    // Resume an in-progress unwind.
    if magic::initialized!(unwind) {
        __rust_resume_panic(unwind);
    }
}

@Amanieu
Copy link
Member Author

Amanieu commented Jul 6, 2022

I would like to echo that the RFC as currently written puts too much weight on the code size / compilation benefits of the change. These are a strong argument for making the option available, but distract from the correctness argument for making it mandatory. Presenting this in the main text which is concerned with justifying changing the semantics of existing code serves to distinctly weaken the justification, since the argued position is not necessary for a primary argued benefit.

I personally find the code size argument more convincing since it is backed by hard numbers which show the improvement. The exception safety argument is inherently subjective since it can just be countered by saying "well it's your fault for writing buggy unsafe code". It's not impossible to write exception-safe Rust code, just difficult.

Also, while the change would possibly simplify the correctness of unsafe code, it would compromise the correctness of the application at large, which may rely on properly handling Drop panics. This is a big issue, both because the application is much harder to audit due to its size and lack of compiler-enforced panic boundaries, and because the application is usually written by less experienced users, including total newbies, while unsafe code is relegated to the experts.

I find this argument quite strange. Does any existing code actually rely on panics for normal operation? In general, panics will only happen in the case of buggy code, when an unexpected error occurs. Additionally, the proportion of code executed by Drop impls is tiny compared to non-drop code: assuming an even distribution of potential unexpected panics in a codebase, they will almost always come from non-drop code.

There's additionally another intriguing middle ground which should be listed as an alternative: drop glue / drop_in_place can still unwind, and a field's drop glue unwinding continues to unwind and drop other fields (siblings and of parent containing types), but unwind is implicitly caught and turned into an abort only when the unwind exits from the drop glue stack into a calling function.

This is something that has been suggested a lot. If I understand correctly, what you are suggesting is that unwinding out of a drop aborts only for implicit drops but not explicit ones. The problem is: if you already know which drop calls are going to potentially panic then you could just convert that code to use a close method like I suggested in the example. I don't see much benefit in adding compiler support for something which can easily be done in library code.

In order to justify changing the default, the RFC should justify the claim that RAII drop glue is an issue, and not just explicit drop_in_place calls unwinding.

The RAII drop glue calls drop_in_place, so making that function nounwind will affect both explicit and implicit drops.

I would much rather see (and am drafting) a proposal providing APIs for unsafe code to control unwinds more carefully; namely some combination of:

The problem with these is that none of them give the same code size & compilation time benefits as the approach proposed in this RFC. This is something that I feel quite strongly about: a 10% speedup in compilation time on a crate like syn is huge.

@carbotaniuman
Copy link

I personally find the code size argument more convincing since it is backed by hard numbers which show the improvement. The exception safety argument is inherently subjective since it can just be countered by saying "well it's your fault for writing buggy unsafe code". It's not impossible to write exception-safe Rust code, just difficult.

Given that the Rust standard library (ostensibly a codebase with a much higher level of rigor and review than most crates) has gotten this wrong on several occasions, I'm doubtful that most developers would be able to disarm this footgun, or even recognize that this exists.

@afetisov
Copy link

afetisov commented Jul 6, 2022

Does any existing code actually rely on panics for normal operation? In general, panics will only happen in the case of buggy code, when an unexpected error occurs.

Panics may be used to implement non-local exception-based control flow, even though it is clearly not an intended usage.

But to the point, while I doubt that panics could occur during normal operation of most non-buggy code, the correctness argument is about the graceful handling of bugs. Code may rely on being able to catch the panics for correctness. A long-running web server, or an embedded device, or a security-critical firmware, or an OS kernel absolutely should not panic. End-users may rely on being able to catch any possible panics with catch_unwind, and currently this includes single panics occurring in Drop glue. Note that the panic strategy is chosen by the end-user during compilation. If one sets -C panic=unwind, they can reasonably expect that they will be able to handle panics.

With the proposed changes, a relatively benign panic for some rare edge case will be escalated to a total crash, which can lead to DoS or worse.

Additionally, the proportion of code executed by Drop impls is tiny compared to non-drop code: assuming an even distribution of potential unexpected panics in a codebase, they will almost always come from non-drop code.

The argument about distribution of panics is quite weird, I don't know what to make of it. Since Drop must deal with disposing potentially fallible resources, I find it likely that Drop impls on average have more panics, but this is also a baseless assumption.

@CAD97
Copy link

CAD97 commented Jul 6, 2022

I personally find the code size argument more convincing since it is backed by hard numbers which show the improvement.

But the code size can be achieved by opt-in or opt-out behavior. It is not a justification for making panic-in-drop=abort the only option. It can serve as a supporting argument to making the change, but it cannot be the primary argument, because it does not justify making this breaking change to behavior.

I find this argument quite strange. Does any existing code actually rely on panics for normal operation?

Yes. For example, rust-analyzer uses unwinds for cancellation. Unwinding is the semantically correct way to handle cancellation, and is essentially what cancelling an async function and dropping the witness table looks like (just without thread::panicking()).

This is not unique to just rust-analyzer; this is a design choice of salsa.

It has generally been understood that while you SHOULD NOT require unwinding support in a library (and MUST support unwinding), you MAY rely on unwinding being available if you control the application compilation.

I agree that likely no code is deliberately relying on unwinding out of drop for application correctness, but it is the case that long-running programs intend to catch any task unwind, and turning unwinds that would have been recovered from into aborts is adding a new DoS vector to such servers.

This is something that has been suggested a lot. If I understand correctly, what you are suggesting is that unwinding out of a drop aborts only for implicit drops but not explicit ones. The problem is: if you already know which drop calls are going to potentially panic then you could just convert that code to use a close method like I suggested in the example. I don't see much benefit in adding compiler support for something which can easily be done in library code.

Then I can take this argument and apply it unchanged to making drop_in_place nounwind as well! You already know where things are going to potentially panic (every drop_in_place call) and can convert the code to use a close method.

Unless I'm sorely mistaken, the justification for making this a mandatory breaking change without opt-out is that it is too difficult to know which drop calls are going to potentially panic, and the proposed resolution is to make no drop calls panic. I fail to see how this alternative fails at the goal, as every point that can unwind is clearly marked. And if you do want the behavior where drop_in_place aborts unwinds as well, drop_in_place has a reminder on it to "remember that this can unwind from panicking drop glue! If this is undesirable, wrap the call in a drop_unwind or call drop_in_place_no_unwind instead."

I'd even be happy if we took the usual "safer by default" approach (as with copy/copy_nonoverlapping, sort/sort_unstable) and made drop_in_place abort unwinds but provided (and drop glue used) drop_in_place_allow_unwinds. The thesis I am defending is to provide safeguards but with the choice and tools to allow unwinds.

The problem with these is that none of them give the same code size & compilation time benefits as the approach proposed in this RFC. This is something that I feel quite strongly about: a 10% speedup in compilation time on a crate like syn is huge.

I will again also reiterate that the compile-time and code-size improvements are not an argument to make this mandatory. They are a very good argument to make it an option. They may even support making this the default. They would definitely be an argument against adding panic-in-drop=unwind if the existing behavior was panic-in-drop=abort.

Swift's lack of deinit throws is reasonable prior art to point to for not providing fallible destruction. However, Swift has a large advantage here: they don't have any unwinding. Your only choice is throws or abort. This means that you have a built-in guarantee that a Swift destructor doesn't contain an uncaught exception/unwind. In fact, in Swift, indexing an Array out of bounds aborts. (Most collections return an optional value from subscripts, but Array doesn't.) Basically any developer error which would panic! in Rust is an abort in Swift.

If anything, this is an argument in support of -Cpanic=abort becoming the default instead of -Cpanic=unwind. I would support such a change. (This may run into similar issues around encouraging crates which don't support unwinding, but tests continuing to be -Cpanic=unwind I think serves as a strong force against this.)

In order to justify changing the default, the RFC should justify the claim that RAII drop glue is an issue, and not just explicit drop_in_place calls unwinding.

The RAII drop glue calls drop_in_place, so making that function nounwind will affect both explicit and implicit drops.

Yes, but I'm not arguing that point. It is a perfectly viable strategy for the unwind-to-abort shim to be added in the drop glue rather than in drop_in_place.


Given that the Rust standard library (ostensibly a codebase with a much higher level of rigor and review than most crates) has gotten this wrong on several occasions, I'm doubtful that most developers would be able to disarm this footgun, or even recognize that this exists.

Nobody (including std devs) really understood the issue of a caught unwind's drop unwinding until recently. (This discussion started almost immediately after it was discovered.) We can attack this with documentation on catch_unwind and any other APIs for managing unwinds.

Other than this specific case, std has generally had no issue with unwind safety, because it is something you so rarely have to think about that std basically only encounters it for BTreeMap and where catch_unwind unwinds.

@bjorn3
Copy link
Member

bjorn3 commented Jul 6, 2022

Code may rely on being able to catch the panics for correctness. A long-running web server, or an embedded device, or a security-critical firmware, or an OS kernel absolutely should not panic.

Also many TUI applications depend on unwinding to avoid getting stuck in raw mode when panicking. Very few register an atexit callback and signal handlers to exit raw mode on crashes.

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Jul 6, 2022

In order to justify changing the default, the RFC should justify the claim that RAII drop glue is an issue, and not just explicit drop_in_place calls unwinding. I would much rather see (and am drafting) a proposal providing APIs for unsafe code to control unwinds more carefully

I wrote up an outline for a related alternative in an IRLO thread a while back. The idea is to have a catch_unwind_v2<F, R>(...) -> UnwindResult<R>, where UnwindResult is #[must_use] and has many of the methods you listed, acting as a sort of by-value builder API. To summarize what I wrote there:

use std::{any::Any, panic::UnwindSafe};

pub fn catch_unwind_v2<F: FnOnce() -> R + UnwindSafe, R>(f: F) -> UnwindResult<R>;

#[must_use]
pub struct UnwindResult<T> { ... }

impl<T> UnwindResult<T> {
    // Unwinding functions
    pub fn into_result(self) -> Result<T, Box<dyn Any + Send>>;
    pub fn drop_err(self) -> Option<T>;
    pub fn resume_unwind(self) -> T;

    // Unwind-free functions
    pub fn inspect_err<F>(self, f: F) -> UnwindResult<T>
    where
        F: FnOnce(&(dyn Any + Send + 'static));

    pub fn inspect_err_mut<F>(mut self, f: F) -> UnwindResult<T>
    where
        F: FnOnce(&mut (dyn Any + Send + 'static));

    pub fn forget_err(self) -> Option<T>;
    pub fn unwrap_or_abort(self) -> T;
    pub fn try_drop_err(self) -> Result<T, UnwindResult<()>>;
    pub fn drop_err_or_forget(self) -> Option<T>;
    pub fn drop_err_or_abort(self) -> Option<T>;

    // Add `new()`, `map()`, etc. as desired
}

I've been doing some surveying of real-world catch_unwind usage (via GitHub Code Search) to evaluate this; I'll probably post my findings in a few days.

@carbotaniuman
Copy link

After a bit of research, I wanna say I don't particularly find the comparisons to C++ noexcept particularly compelling. In C++, the basic exception guarantee is as follows:

If the function throws an exception, the program is in a valid state. No resources are leaked, and all objects' invariants are intact.

Rust code provides neither of these guarantees, even in idiomatic code. (It is only better than having no exception guarantee that memory corruption iins not allowed). Leak amplification is used throughout the Rust stdlib (unlike C++).

In addition, unwinds do not require object invariants to be preserved, which is the entire reason why UnwindSafe is a thing, but issues like rust-lang/rust#89832 aren't really considered worth fixing. Observing broken object invariants in Rust is safe, for better or for worse.

Given that objects with throwing destructors were already forbidden in STL containers, the change in C++ likely impacted a lot less code than these proposed Rust changes. And I think given the difference in semantics for unwinding vs exceptions, this change is too big a hammer.

@Lokathor
Copy link
Contributor

Lokathor commented Jul 7, 2022

Yeah, UnwindSafe is kinda "fake" in the sense that it doesn't relate to unsafe at all.

@ijackson
Copy link

ijackson commented Jul 7, 2022

The RFC text (in the Downsides) ought IMO to discuss the "drop bomb" pattern, sometimes used to simulate linear types - or rather, to avoid shipping code which treats a particular type as nonlinear. E.g. rust-lang/rust#78802 (comment)

This is relevant here, because drop bombs precisely involve deliberately panicking inside drop, in circumstances where the author would probably have made very different design choices if they expected that such a panic would be an immediately irrecoverable abort.

@Gankra
Copy link
Contributor

Gankra commented Jul 7, 2022

Drop bombs are usually being used to cover up for things you would want to be compile-time errors, so it's not at all clear to me that you would care about whether the bomb unwinds or aborts -- if linear (true must-use) type lovers had their druthers, the program wouldn't have even been runnable!

text/0000-panic-in-drop.md Outdated Show resolved Hide resolved
@joshtriplett joshtriplett added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Jul 12, 2022
@joshtriplett
Copy link
Member

I think there's been a huge amount of focus on the optimization possibilities here, as opposed to the correctness argument. The optimization possibilities would justify creating an option, but not changing the default. The correctness issue justifies changing the default.

I think the correctness argument is the primary reason to do this; that it enables great optimizations is a bonus.

Handling objects that panic in their drop implementations is incredibly complex, and as noted, even the standard library has gotten it wrong. I think it's reasonable for us to evaluate whether we expect libraries to handle this, or whether we want to proscribe it entirely so that libraries don't have to handle it.

I am quite sympathetic to cases like long-running servers that want to catch panics and turn them into 500 Internal Server Error or similar, and keep running. However:

turning unwinds that would have been recovered from into aborts is adding a new DoS vector to such servers

Such servers already have multiple DoS vectors:

  • If you panic while holding a lock, you'll poison the lock, and everything touching that lock is unlikely to be able to recover.
  • If you encounter another panic while panicking, you'll already abort.

This proposal is not making all panics abort; I agree that that would be a huge change to the default. This proposal is making panics in drop abort. And it's already possible to make servers robust against that, by adding close() methods and similar.

@Lokathor
Copy link
Contributor

"you already can't recover from lock poisoning" was obviously a gap in the design, and closing that gap is already in progress: rust-lang/rust#96469
So we're actually already working to improve the DoS situation, and the RFC as written would worsen things.

There's three alternatives given, including "opt-in to unwinding", but there isn't an alternative given for "opt-out of unwinding". Since particular unsafe code is the code that can go wrong with a panic in drop (not all unsafe code does), why don't we put the burden on the particular instance of unwind sensitive unsafe code to opt in to having a drop that will abort if an unwind tries to escape from it? Call it #[about_on_unwind_escape] or whatever name. That seems the least disruptive to existing code, while allowing unsafe code the tools it needs to be correct.

@joshtriplett
Copy link
Member

joshtriplett commented Jul 12, 2022

@Amanieu We discussed this in today's @rust-lang/lang meeting.

We felt like all of the arguments have been made at this point, and now we need to weigh the tradeoffs between possibilities:

  • Do we feel that we gain more robustness by support catch_unwind on panics from Drop, at the expense of the lower robustness of all the library code that doesn't handle this?
  • Or do we feel that we gain more robustness by eliminating this case as something libraries need to handle, at the expense of the lower robustness of code that wants to handle all possible panics including panics in Drop?

However, before we do that, there are a few changes we'd like to see:

  • The correctness argument should be at the forefront, as that's the primary justification for doing this. Performance is a bonus, and not a reason to change the default.
  • There should be some kind of hook that runs when aborting, to give code an opportunity to do cleanups. (The same hook should run on all panics with panic=abort.)

@TimNN
Copy link
Contributor

TimNN commented Jul 12, 2022

The correctness argument should be at the forefront, as that's the primary justification for doing this.

Once the correctness argument is at the forefront, I think it would be great for the RFC to also consider (or at least mention) alternatives that would aid the user in writing correct code, such as:

  1. a less error-prone catch_unwind API.
  2. a lint against any implicit method calls.
  3. some form of #[panics(never)] attribute.

I'm no expert on the compiler, but I feel like (2) should be reasonably easy to implement.

Implementing (3) in a useful manner would likely require a lot of additional design work1.

Footnotes

  1. Last time I though about this I ended up with multiple additional qualifiers like #[panics(never(verify_unstable)], #[panics(never(assume_unsafe)] or #[panics(never(type_dependent)]. Some could be safely combined (like #[panics(never(verify_unstable,type_dependent)]) but others should be linted against (like #[panics(never(assume_unsafe,type_dependent)] unless the annotated function is unsafe). And those qualifiers still don't cover all generic use cases.

@Amanieu
Copy link
Member Author

Amanieu commented Jul 12, 2022

There should be some kind of hook that runs when aborting, to give code an opportunity to do cleanups. (The same hook should run on all panics with panic=abort.)

This already exists: an unwind guard will call the panic handler with a PanicInfo that return false for can_unwind (#92988). std's panic handler will call the panic hook as normal but then abort instead of unwinding.

@nbdd0121
Copy link

nbdd0121 commented Apr 4, 2024

I personally really like the idea of implicit drops ("drop glue") not being allowed to unwind but explicit drops (drop/drop_in_place/ManuallyDrop::drop) being allowed.

I think this is an idea worth pursuing.

Implementation-wise it should just be as simple as not making drop glues nounwind, but we emit drop terminators with UnwindAction::Terminate (like what we currently do for things in cleanup landing pads) if panic-in-drop=abort.

We then only need to make sure core::mem::drop calls drop_in_place instead of using implicit drop, or use some other mechanisms to ensure that it generates UnwindAction::Resume.

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Apr 4, 2024

I personally really like the idea of implicit drops ("drop glue") not being allowed to unwind but explicit drops (drop/drop_in_place/ManuallyDrop::drop) being allowed.

I don't know how feasible this is, and I'm sure there are some further questions to solve (like "which functions constitute explicit drop points"?), but I do think it makes for a sensible and straightforward mental model, and makes it easy to choose the correct thing when e.g. writing unsafe or otherwise drop-sensitive code.

While I think this isn't a bad idea at its core, I'm wary of such a rule being applied everywhere, since then every time you see an explicit drop mechanism, you'd have to judge whether the author was specifically thinking about unwinding, or whether the explicit drop is just being used for readability or performance. (E.g., drop()ping a big Vec the moment you no longer need it.) Instead, I'd prefer that this be something that can be opted into for particular spans of code, so that readers know immediately that dropping is very sensitive for that code. Our goal should be making special behavior more visible, not less visible.

@LunarLambda
Copy link

This is the kind of thing where, similar to unsafe blocks, an explicit drop should almost always have an accompanying comment stating intent. Though I would also say that for example, safe code almost never actually cares about the panic behaviour of drop (or panic behaviour in general), so it's also fairly easy to tell from context

@oxalica
Copy link

oxalica commented Apr 4, 2024

I personally really like the idea of implicit drops ("drop glue") not being allowed to unwind but explicit drops (drop/drop_in_place/ManuallyDrop::drop) being allowed.

I don't like this exceptional behavior, giving one main reason to use explicit drop/ManuallyDrop::drop is to control the drop order where the compiler cannot do it in the expected way otherwise. Eg. Early drop of Mutex where enclosing a block is not possible (loop related, or etc), pin the drop order of unsafe struct fields to be resistant from refactoring. Many of this use cases are not necessarily related to the panic handling, or on the contrary, they are expected/preferred to be panic-free.

lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Apr 7, 2024
catch_panic: warn about panicking payload drop

Warns about the footgun in rust-lang/rust#86027.

Will be unnecessary if panics escaping from drop leads to abort (rust-lang/rfcs#3288). But until that is enforced everywhere, let's warn users about this.
@ijackson
Copy link

(xref to some discussion of drop bombs in Tor Project gitlab: https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/2091#note_3021454)

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Apr 24, 2024

I don't understand the motivation for some explicit drops being allowed to panic but not others. It seems like these kinds of scenarios would be better encoded as a fn finish(self) that panics, rather than (ab)using Drop. So I'd like to understand what benefit we get specifically from leveraging Drop, given that the drop handler must be explicitly invoked. My view on Drop is that it is meant to be the cleanup code that runs implicitly when the value goes out of scope.

Put another way, I really think we should be striving for a single set of rules that apply to Drop impls, no matter how they wind up being invoked. If we want differing sets of rules, perhaps we want another trait. So can we compare the value from distinguishing explicit drop vs having a finish method vs having a another trait (say Finish)?

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Apr 25, 2024

Put another way, I really think we should be striving for a single set of rules that apply to Drop impls, no matter how they wind up being invoked. If we want differing sets of rules, perhaps we want another trait. So can we compare the value from distinguishing explicit drop vs having a finish method vs having a another trait (say Finish)?

The problem I see with that is that every generic container would need to forward Finish, since otherwise it wouldn't know to finish() all its elements. And extra opt-in traits tend not to get off the ground: see UnwindSafe for one failure mode. The main value in Drop is that it automatically forwards to all fields for safe containers, so that only unsafe containers need to worry about manual forwarding. And even if Finish is automatically implemented for safe types, unsafe containers would now need both a drop() and a finish() implementation to support both of them.

And regardless, I don't see how all these added workarounds would be superior to just opting out at the call sites that are sensitive to unwinding. The whole point of unwinding is that it should soundly pass through any safe function with well-behaved local values (disrupting logical invariants but not causing UB): catching an unwind and continuing should be the difficult part, and being unwound through should be harmless except for a small minority of code. Of course, that minority is also quite important to get correct, hence my recommendation to create better call-site tooling.

Also, I'm still quite curious, what are those "many bugs" from unwind-on-drop that you were referring to?

@CAD97
Copy link

CAD97 commented Apr 25, 2024

A primary difficulty of using a fn finish(self) is that unless you play tricks with ManuallyDrop, the Drop code is still run after the finalization work done in the function and it's impossible to take any fields by value in order to .finish() them. We also lack any kind of defer or try-with-resources to help mitigate the need to remember to call into finalization. How often do you remember to .flush() an IO handle before dropping it?

I do think a native defer construct that was allowed to panic when run at end of scope would satisfy the use cases for deliberately panicking out of drop glue, leaving only the case of resiliency, which is already somewhat hobbled by double panics causing an (eventual) abort.

A notable failure case of opt-in no-drop-unwinds is that it's an uncontrolled property whether a type gets dropped in an unwind-friendly context or an unwind-abort context (and not one that can be tested for like thread::panicking), but to add to that, most type instances get stored in some sort of collection which is going to be unwind-during-cleanup sensitive, leading to that panics during cleanup still cause an abort more times than not, just maybe a little later.

@kpreid
Copy link
Contributor

kpreid commented Apr 25, 2024

A primary difficulty of using a fn finish(self) is that unless you play tricks with ManuallyDrop, the Drop code is still run after the finalization work done in the function and it's impossible to take any fields by value in order to .finish() them.

This is a gap in the language that could be filled in. As you observe, it's already possible but unsafe to do. It could be made safe, by providing a modifier on a struct or enum pattern (possibly restricted to use in the defining crate?) which:

  • bypasses the E0509 restriction, and
  • must move out all fields, except those whose types are Copy and thus have no destructors of their own, so that the field values are unambiguously now owned by the function rather than the matched value.

Just like a struct literal expression introduces a droppable struct value, by taking ownership of its ingredients as a primitive operation, this “de-construction” operation does the opposite.

Even if no changes are made to Rust's general finalization story, de-construction would be useful for all operations which consume a value and want to do something other than its Drop behavior — which currently has to be handled with either unsafe code or making fields Optional when they semantically aren't.

(To be clear, I'm not arguing in favor of adding finish() — I just had this idea sitting around for improving consuming functions in general. …I should write this up as an actual separate proposal, shouldn't I.)

@programmerjake
Copy link
Member

well, since Drop is super weird and not really a normal trait, rust could just allow you to write:

impl Drop for MyType {
    fn drop(self) { // takes self by value, requires MyType: Unpin
        let Self { field_1, field_2 } = self;
        // ...
    }
}

@kornelski
Copy link
Contributor

Would it help to have some standard library type that panics in its own Drop (so all drop impls could catch panics and abort, except this one), but in a way that can be controlled externally?

struct Transaction {
    drop_bomb: Option<std::panic::Bomb>,
}

impl Transaction {
  fn commit(self) {
      self.drop_bomb.take().map(std::panic::Bomb::defuse);
  }
}

and/or have:

std::panic::no_unwind_on_drop(|| {
    // write delicate code here without worrying about unwinding
});

@Jules-Bertholet
Copy link
Contributor

  • bypasses the E0509 restriction

My RFC #3466 would provide for this (though it would not address the rest of your post; the user would need to remember to perform the move).

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Apr 26, 2024

A notable failure case of opt-in no-drop-unwinds is that it's an uncontrolled property whether a type gets dropped in an unwind-friendly context or an unwind-abort context (and not one that can be tested for like thread::panicking), but to add to that, most type instances get stored in some sort of collection which is going to be unwind-during-cleanup sensitive, leading to that panics during cleanup still cause an abort more times than not, just maybe a little later.

First off, as some others have said regarding this, not knowing if you're in an unwind-friendly context is already the status quo: your crate might have been compiled with -C panic=abort, or you might be in a #![no_std] environment with no unwinding, so there's never a guarantee unless you control the build process. Hence why a crate-wide optional abort-on-panicking-drop is not considered as immediately breaking. Meanwhile, mandating abort-on-panicking-drop would similarly introduce many unwind-abort contexts that cannot be tested for.

Also, an opt-in context would be mostly equivalent to an aborting drop bomb (except possibly more eager), which is already a widespread approach for unwind-sensitive code that doesn't want to carefully forward panics, or is physically unable to (e.g., FFI callbacks). If anything, this would make it easier to add a panicking()-like feature to test for unwind-abort contexts, since it would be more transparent than drop bombs.

Second, I don't see unsafe-collection destructors as the kind of "unwind-during-cleanup sensitive" code that would want or need an abort bomb. Most collections don't have a crucial invariant to be maintained during drop: if a panic cuts them off, they just leak some of their elements and/or allocations. After all, the collection is totally inaccessible anyway once it's dropped, so any broken invariant would have to lie somewhere outside, which is far from common. I envision such an opt-in mechanism as specifically being for those cases where unsoundness or extreme incorrectness might occur from an unwind. (E.g., perhaps a cryptographic zero-on-drop type might be particularly concerned about not skipping the remainder of the destructor.)

I suppose it could indeed be a failure mode if authors become afraid and start using it unnecessarily. Still, I think that the benefits of first-class explicit unwind control are worth this cost, especially given how unwieldy the current alternative (drop guards) is; we'd just have to take care in how we document it.

RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
catch_panic: warn about panicking payload drop

Warns about the footgun in rust-lang/rust#86027.

Will be unnecessary if panics escaping from drop leads to abort (rust-lang/rfcs#3288). But until that is enforced everywhere, let's warn users about this.
@CAD97
Copy link

CAD97 commented Apr 27, 2024

Second, I don't see unsafe-collection destructors as the kind of "unwind-during-cleanup sensitive" code that would want or need an abort bomb.

On the other hand one of the motivators for aborting upon unwind out of drop is preventing this kind of partial destruction and simplifying drop glue by not ever needing to handle unwinds during normal cleanup.

Also, I'm still quite curious, what are those "many bugs" from unwind-on-drop that you were referring to?

A subtle one I wrote somewhat recently (in proof of concept code) was assuming that dropping the sending half of a channel can't drop any queued messages; that dropping the receiving half drops the queue and sending fails after that sync point. For the usual impls, this is true unless dropping a message panics, in which case any remaining messages in the queue resume being dropped when the final sender is dropped.

Compare dropping VecDeque which does essentially let (front, back) = self.as_mut_slices(); defer { drop_in_place(back); } drop_in_place(front); to be fully unwind correct and doesn't just consecutively drop the two regions. Writing unwind correct drop implementations is difficult, even though safely forgetting destructors is generally simple enough (as long as you aren't projecting pinning invariants).

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Apr 28, 2024

On the other hand one of the motivators for aborting upon unwind out of drop is preventing this kind of partial destruction and simplifying drop glue by not ever needing to handle unwinds during normal cleanup.

Sure, I'm all for making abort-on-unwinding-drop something that crates can opt into, given the measured perf benefits; it's just making it mandatory that I have big issues with. The basic principle is, random crate authors have no obligation to make their panic-on-drop behavior sane (only to make it sound), but I should have the ability to handle unwinds as robustly as I want, and treat Drop as transparently as I want, within the privacy of my own codebase with my own user-defined types.

(E.g., not too long ago, in some of my own experimental code, I messed around with making trait objects that execute all their behavior on drop. That way, the vtable could be made as small as possible for what the objects were required to do. Or, for a more standard example, a codebase might want to use unwinds to implement robust thread cancellation for a particular thread, by resume_unwind()ing at certain checkpoints.)

Regarding partial destruction, I don't see the status quo of 'best-effort' cleanup as particularly problematic: it just means that if you do have anything really important to clean up on panic (open sockets, perhaps?), you might want to segregate it into its own collection, where each element is known to be well-behaved on drop.

A subtle one I wrote somewhat recently (in proof of concept code) was assuming that dropping the sending half of a channel can't drop any queued messages; that dropping the receiving half drops the queue and sending fails after that sync point. For the usual impls, this is true unless dropping a message panics, in which case any remaining messages in the queue resume being dropped when the final sender is dropped.

Thank you, that is a good example. I suppose channels would be an archetypical case of dropping a container possibly causing external effects.

More generally, in the status quo, when implementing a data structure with shared ownership (e.g., Arc), you have to make sure to make a contained value inaccessible before dropping it, to avoid any use-after-drop issues from panics. Thankfully, most anti-double-drop protections already imply this. But where it doesn't, I still think that an opt-in abort at the call site would be a sufficient solution for these kinds of data structures.

Compare dropping VecDeque which does essentially let (front, back) = self.as_mut_slices(); defer { drop_in_place(back); } drop_in_place(front); to be fully unwind correct and doesn't just consecutively drop the two regions. Writing unwind correct drop implementations is difficult, even though safely forgetting destructors is generally simple enough

Yeah, writing fully unwind-correct code is pretty tedious currently. But I don't think it necessarily has to remain that way forever. There will always be lots of tricky unwind-sensitive code outside of destructors, so the more we can add built-in functionality to help with those, the less tedious destructors in particular can become.

In the case of VecDeque, something akin to a defer or finally block would do the trick. But obviously, that would become somewhat verbose for more than two elements, and a variable-size list is another matter entirely. I'm not really sure in general how easily we could tell LLVM, "For cleanup, just do the rest of the function (or rest of the loop, etc.) normally, then continue unwinding afterward." Though the idea does make me think of something silly like an Iterator::for_each_guarded() function, that just does the equivalent of

fn for_each_guarded<F: FnMut(Self::Item)>(mut self, mut f: F) where Self: Sized {
    struct Guard<'a, I: Iterator, F: FnMut(I::Item)>(&'a mut I, &'a mut F);
    impl<I: Iterator, F: FnMut(I::Item)> Drop for Guard<'_, I, F> { fn drop(&mut self) {
        self.0.for_each(|item| (self.1)(item));
    } }
    let guard = Guard(&mut self, &mut f);
    guard.0.for_each(|item| (guard.1)(item));
    core::mem::forget(guard);
}

so you could then write

let (front, back) = self.as_mut_slices();
[front, back].into_iter().for_each_guarded(|slice| drop_in_place(slice));

It would be perfectly composable for tree-like data structures. But I don't know how well it would optimize.

(as long as you aren't projecting pinning invariants).

I suppose you're referring to certain forms of stack pinning in particular, where the data structure holds an owned value in borrowed storage? For heap pinning, the invariant would be upheld by just leaking the heap storage after the value's destructor panics.

@withoutboats
Copy link
Contributor

(NOT A CONTRIBUTION)

I do not think this change would be an improvement to the language, chiefly for the same reasons as other commenters in this thread: if a buggy destructor panics, I want to be able to isolate it without bringing down my entire application.

More importantly, as @nikomatsakis notes in his proposal to merge this RFC, this is a breaking change to your existing user base and a violation of the Rust project's stability guarantees. For this reason alone this RFC should be closed & I don't understand how this can be brushed off so lightly. Is Rust a reliable tool that I can upgrade without fear, or does it make breaking changes in point releases?

The primary motivation for this RFC seems to be the issue of panicking drops breaking unsafe code that is not exception safe. I agree that this is a difficult thing to work around, however I would point out that drops (being silently inserted code) can cause bugs in unsafe code that hasn't considered them even if they don't panic. I wrote about this years and years ago when I wrote two bugs in ringbahn that both were the result of destructors being inserted that I didn't consider: https://without.boats/blog/two-memory-bugs-from-ringbahn/

For this reason I think the project should pursue @LegionMammal978's proposal from 2022:

Thus, I believe that since this is a silent change that will harm some current use cases, we should really show that the benefit of this change is greater than the cost, and that there are neither more beneficial nor less costly alternatives. I argue that such an alternative exists: keep panic-in-drop=unwind as the default, and provide better tools for writing drop-sensitive unsafe code.

A totally non-invasive change would be to allow users to warn when any destructors are inserted in a code block, so they instead use explicit calls to drop. This will force them to be aware of any drop calls which could panic, just like any other generic function calls. Other additional beneficial APIs would be do ... final blocks for users to write ad hoc code that will run while unwinding. If there are useful APIs that want to take generic types but cannot be made unwind-safe, marker traits for types which don't have a destructor that could panic could be explored. All of these are better solutions than a blunt force backwards incompatible change.

@Jules-Bertholet
Copy link
Contributor

Jules-Bertholet commented May 3, 2024

If a buggy destructor panics, I want to be able to isolate it without bringing down my entire application.

I generally agree with the "it's a breaking change" argument, but there's an important caveat to the above statement: even today, recovering from a buggy destructor is not guaranteed, because double-panicking (due to a second buggy destructor) leads to an immediate abort.

@withoutboats
Copy link
Contributor

withoutboats commented May 3, 2024

(NOT A CONTRIBUTION) @Jules-Bertholet yes that's true in an absolute sense, but its about the degree of risk.

In a correctly implemented program, zero panics occur. But we need to tolerate the risk of programmer error. Using unwinding to isolate the programmer error is an effect way to reduce the degree of risk in practice so that it is less likely to escalate into an urgent system failure.

With the current system, the risk that unwinding fails and becomes a process abort (increasing the chance of escalation) is the multiple of the probability of a panic, the probability of panic in a destructor, and the inverse of their correlation (because these two probabilities are likely somewhat correlated). By eliminating one of the multiples, this increases the risk of process abortion and therefore the risk of escalating failure.

Instead of reducing the degree to which unwinding is available to mitigate programmer error, the project should find ways to make correct unsafe code easier to write that doesn't require eliminating this mitigation.

@ijackson
Copy link

ijackson commented May 3, 2024

... this is a breaking change to your existing user base and a violation of the Rust project's stability guarantees. For this reason alone this RFC should be closed & I don't understand how this can be brushed off so lightly.

Thank you for your intervention. I agree.

drops (being silently inserted code) can cause bugs in unsafe code that hasn't considered them even if they don't panic.

Quite so,.

A totally non-invasive change would be to allow users to warn when any destructors are inserted in a code block, so they instead use explicit calls to drop.

This is of course an excellent idea.

This kind of alternative is IMO not at all properly discussed in the RFC. There is one short sentence about it, nearly at the bottom of the list of alternatives.

The Alternatives section ought to explain, for each rejected alternative, why that option has been rejected.

In this case, the RFC does not give any reason why a lint wouldn't be a superior answer. This is an obvious red flag for any RFC, let alone one which is such a significant departure from previously advertised guarantees, and which has attracted such substantial opposition.

The proper response to criticisms here in the discussion thread, is to refute them in the RFC text (if they can be refuted, which I suspect isn't the case here). Instead, , the discussion here has become an unmanageable behemoth, and many good points made go unanswered.

TBH I have found it quite alarming, how ready the project seems to be to adopt a proposal which is so controversial, on the basis of a document which has such an obvious deficiency. The RFC process is failing here. I don't think this is because it's a bad process; rather, we don't seem to be applying it rigourously.

@RalfJung
Copy link
Member

RalfJung commented May 3, 2024

The Alternatives section ought to explain, for each rejected alternative, why that option has been rejected.

FWIW, I agree. I don't have a strong opinion either way for this RFC: from an unsafe code perspective, it'd be very nice to have to worry about fewer unwinding points -- but it's not clear that by "just" removing drops from the picture, we are making a significant enough dent here; there's still an unwinding point at each function call after all. But there are some interesting alternatives -- not just a lint; thanks to all the C-unwind work (that happened after this PR was initiated), the #[rustc_nounwind] attribute is now a safe attribute that turns all unwinding out of a function into aborts. So another possibility would be to let unsafe code opt-in not worry about unwinding by stabilizing some form of this attribute. The attribute could even work on a scope smaller than an entire function (that should be fairly easy to implement with the same approach that backs rustc_nounwind): we could have #[no_unwind] { ... } blocks.

As a t-lang advisor, I'd like to register this as a formal concern: to better describe the possible alternatives and explain why they are worse than the proposed solution.
@rfcbot concern alternatives
(AFAIK rfcbot doesn't understand the t-lang advisor role, so could a t-lang team member please tell the bot about this?)

@withoutboats
Copy link
Contributor

(NOT A CONTRIBUTION)

I like Ralf's idea as a way of formalizing what libraries like rayon apparently already do: upgrading any panic to an abort whenever you enter their routines. But I think it's important that libraries document if they do this; given the choice between a library which handles unwinding correctly and one which doesn't, I would be more likely to choose the unwinding-friendly library, and I don't think we should abandon that as the goal for unsafe code.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented May 9, 2024

@rfcbot concern describe alternatives

Ralf writes:

As a t-lang advisor, I'd like to register this as a formal concern: to better describe the possible alternatives and explain why they are worse than the proposed solution.

(AFAIK rfcbot doesn't understand the t-lang advisor role, so could a t-lang team member please tell the bot about this?)

@CAD97
Copy link

CAD97 commented Jun 4, 2024

As an interesting note, C++ denied std::scope_guard because of the C++ STL rule that all destructors must be noexcept. I've firmly settled into a position that I don't mind drop glue being nounwind, but I also hold that we shouldn't make that change until there is some way to run code during scope cleanup that is still allowed to unwind (e.g. a defer).

@the8472
Copy link
Member

the8472 commented Oct 22, 2024

Something Nadrieril came up with during unsafe fields rfc discussion:

How about unsafe locals (unsafe let) that cause an abort if they're still alive at the end of a function (be it by return or unwind). Which means one has to explicitly drop or move them.

@Nadrieril
Copy link
Member

Nadrieril commented Oct 22, 2024

(More precisely, my intent was for the borrow-checker to prevent automatic drops out of unsafe let locals by:

  • forcing us to drop or move out of each unsafe let local explicitly;
  • turning all unwinds into aborts while the locals are live.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.