Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coherence can be bypassed by an indirect impl for a trait object #57893

Open
arielb1 opened this issue Jan 25, 2019 · 61 comments
Open

Coherence can be bypassed by an indirect impl for a trait object #57893

arielb1 opened this issue Jan 25, 2019 · 61 comments
Assignees
Labels
A-trait-objects Area: trait objects, vtable layout A-trait-system Area: Trait system C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority S-bug-has-test Status: This bug is tracked inside the repo by a `known-bug` test. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-types Relevant to the types team, which will review and decide on the PR/issue.

Comments

@arielb1
Copy link
Contributor

arielb1 commented Jan 25, 2019

Comments

The check for manual impl Object for Object only makes sure there is no direct impl Object for dyn Object - it does not consider such indirect impls. Therefore, you can write a blanket impl<T: ?Sized> Object for T that conflicts with the builtin impl Object for dyn Object.

Reproducer

edit: minimal reproducer from #57893 (comment)

trait Object<U> {
    type Output;
}

impl<T: ?Sized, U> Object<U> for T {
    type Output = U;
}

fn foo<T: ?Sized, U>(x: <T as Object<U>>::Output) -> U {
    x
}

fn transmute<T, U>(x: T) -> U {
    foo::<dyn Object<U, Output = T>, U>(x)
}

I had some difficulties with getting the standard "incoherence ICE" reproducer, because the object candidate supersedes the impl candidate in selection. So here's a "transmute_lifetime" reproducer.

trait Object {
    type Output;
}

trait Marker<'b> {}
impl<'b> Marker<'b> for dyn Object<Output=&'b u64> {}

impl<'b, T: ?Sized + Marker<'b>> Object for T {
    type Output = &'static u64;
}

fn foo<'a, 'b, T: Marker<'b> + ?Sized>(x: <T as Object>::Output) -> &'a u64 {
    x
}

fn transmute_lifetime<'a, 'b>(x: &'a u64) -> &'b u64 {
    foo::<dyn Object<Output=&'a u64>>(x)
}

// And yes this is a genuine `transmute_lifetime`!
fn get_dangling<'a>() -> &'a u64 {
    let x = 0;
    transmute_lifetime(&x)
}

fn main() {
    let r = get_dangling();
    println!("{}", r);
}

Duplicates, see also

@arielb1 arielb1 added A-trait-system Area: Trait system A-DSTs Area: Dynamically-sized types (DSTs) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Jan 25, 2019
@scalexm
Copy link
Member

scalexm commented Jan 31, 2019

@arielb1 Interesting. I guess we would need a deeper check, something along the lines of:

impl<T> Object for SomeType<T> where WC {
/* ... */
}
// If the following goal has a non-empty set of solutions, reject the impl.
exists<T> {
    exists<U> {
        Unify(dyn Object<Output = U>, SomeType<T>),
        dyn Object<Output = U>: WC
    }
}

cc @nikomatsakis

@arielb1
Copy link
Contributor Author

arielb1 commented Feb 2, 2019

Sure.

What makes this non-trivial is auto-traits (dyn Object + Send is a distinct wrt. unification from dyn Object), and to some extent binders (unless you ignore them in unification). i.e., supposing the impl you have is

impl<T> Object<'a> for SomeType<T> where WC

Then the "ideal" lowering would be something of this format:

// If the following goal has a non-empty set of solutions, reject the impl.
exists<T: type, 'a('x): lifetime -> lifetime, U('x): lifetime -> type, AT: list of auto-traits> [
    Unify(dyn for<'s> Object<'a('s), Output = U('s)> + AT, SomeType<T>),
    dyn for<'s> Object<'a('s), Output = U('s)> + AT: WC
}

If you ignore binders in unification (i.e., you consider for<'a> Object<'a, Output=&'a T> to be the same as Object<'x, Output=&'y T> in coherence), then you don't have to deal with the HRTB problem, but you still have to deal with auto-traits:

// If the following goal has a non-empty set of solutions, reject the impl.
exists<T: type, 'a: lifetime, U: type, AT: list of auto-traits> [
    Unify(dyn Object<'a, Output = U> + AT, SomeType<T>),
    dyn Object<'a, Output = U> + AT: WC
}

While I don't think this is insurmountable, it is fairly annoying and places some restrictions on how you encode auto-traits.

@Aaron1011
Copy link
Member

I'm interested in working on this.

@Aaron1011
Copy link
Member

It turns out that only one trait is necessary to reproduce this: (playground)

trait Object {
    type Output;
}

impl<T: ?Sized> Object for T {
    type Output = &'static u64;
}

fn foo<'a, T: ?Sized>(x: <T as Object>::Output) -> &'a u64 {
    x
}

fn transmute_lifetime<'a, 'b>(x: &'a u64) -> &'b u64 {
    foo::<dyn Object<Output=&'a u64>>(x)
}

// And yes this is a genuine `transmute_lifetime`!
fn get_dangling<'a>() -> &'a u64 {
    let x = 0;
    transmute_lifetime(&x)
}

fn main() {
    let r = get_dangling();
    println!("{}", r);
}

This seems quite bad, as simply writing a blanket impl is enough to expose the issue.

@jonas-schievink jonas-schievink added the C-bug Category: This is a bug. label Jun 9, 2019
@Aaron1011
Copy link
Member

One way to fix this issue would be the following:

If a trait has any associated items, and a blanket impl exists for it, that trait cannot be object-safe.

This is pretty extreme - however, the only other solution I can think of would be to deny all blanket impls of a trait with associated items - which seems even worse, given that the trait might not have even been object-safe to begin with.

@Centril
Copy link
Contributor

Centril commented Jun 9, 2019

Unfortunately, it gets worse. Here's an implementation of mem::transmute in safe Rust:

trait Object<U> {
    type Output;
}

impl<T: ?Sized, U> Object<U> for T {
    type Output = U;
}

fn foo<T: ?Sized, U>(x: <T as Object<U>>::Output) -> U {
    x
}

fn transmute<T, U>(x: T) -> U {
    foo::<dyn Object<U, Output = T>, U>(x)
}

I think blame should be assigned to not normalizing <T as Object<U>>::Output to U in fn foo. If that happened, foo::<dyn Object<U, Output = T>, U>(x) would not type-check.

I checked the code above with godbolt and the problem has existed since Rust 1.0.0.

@jonas-schievink
Copy link
Contributor

The first comments in the thread seem to indicate that the problem is instead with the blanket impl itself being accepted by the compiler.

If I understood correctly, @Centril's impl above can be applied to dyn Object<U, Output=T> for any U and T, and defines Output=U. However, a trait object like dyn Object<U, Output=T> already provides an implicit impl with the same input types, but a different output type, which is incoherent.

@jonas-schievink jonas-schievink added I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jun 9, 2019
@jonas-schievink
Copy link
Contributor

Seems like a good idea to discuss in language and compiler team, nominating.

@Centril
Copy link
Contributor

Centril commented Jun 9, 2019

Notably, in my reproducer above, if you change fn foo to:

fn foo<T: ?Sized + Object<U>, U>(x: <T as Object<U>>::Output) -> U {
    x
}

then you will get:


error[E0308]: mismatched types
  --> src/lib.rs:10:5
   |
9  | fn foo<T: ?Sized + Object<U>, U>(x: <T as Object<U>>::Output) -> U {
   |                                                                  - expected `U` because of return type
10 |     x
   |     ^ expected type parameter, found associated type
   |
   = note: expected type `U`
              found type `<T as Object<U>>::Output`

@jonas-schievink
Copy link
Contributor

Is that not just due to where-clauses having precedence over impls? AFAIK trait selection will prefer the Object<U> impl in the bounds, which doesn't have an assoc. type specified, so the compiler can't normalize it.

@Aaron1011
Copy link
Member

I made a first attempt at a fix here: https://github.com/Aaron1011/rust/tree/fix/trait-obj-coherence

My idea was to extend the obligations generated by ty::wf::obligations to include projection predicates of the form:

<dyn MyTrait<type Assoc=T> as MyTrait>::Assoc = T.

This is combeind with an extra flag in SelectionContext to force impl candidates to be chosen over object candidates. In the case of an incoherent blanket impl (e.g. any of the reproduces in this thread), <dyn MyTrait<type Assoc=T> as MyTrait>::Assoc should project to the type specified in the blanket impl. This will fail to unify with T, correctly causing a compilation error.

Unfortunately, I wasn't able to get this working, due to how the well-formed predicate generation is integrated with the rest of the compiler. I would need to significantly refactor FulfillmentContext, SelectionContext, and/or Predicatein order to keep track of the extra flag that needs to be passed toSelectionContext`.

I'm hoping that someone can come up with a better approach. However, any solution will need to deal with the fact that SelectionContext masks attempts to detect the incoherence by discarding the impl candidate in favor of the object candidate.

@RalfJung
Copy link
Member

This is interesting! The fact that foo in @Centril's example even gets accepted shows that we aggressively exploit coherence during type-checking: because we saw an impl that uses a certain associated type, we type-check assuming that that is the impl that got used here. Unfortunately, as the example here shows, that is in conflict with the "implicit impl" that we get for trait objects.

I wouldn't be sad if we would fix this by being less aggressive about exploiting coherence during type-checking -- that makes analysis much more complicated, and this example shows why. But there's likely already tons of code out there that exploits this. Another fix is to refuse to create trait objects that "contradict" the kind of coherence knowledge that the type checker might exploit, but that seems extremely fragile.

@Aaron1011 your plan seems to be to encode in the type of foo the fact that we exploited coherence? However this "exploitation" happens in the body of the function, so what exactly gets exploited cannot be known yet when computing the well-formedness predicate for foo. Thus I guess what you want to / have to do is to aggressively look for all things one might be able to exploit up-front, add them all to the well-formedness obligations, and then during type-checking the boy look only at the obligations and not exploit coherence any more (as doing so would introduce scary bugs if the two systems for getting such type equalities out of coherence would ever not agree).

Once again, exploiting assumptions implicitly instead of inferring them to an explicit form proves to be an issue. This reminds me that we still implicitly exploit WF everywhere instead of turning that into explicit assumptions...

@pnkfelix
Copy link
Member

triage: P-high due to unsoundness. Leaving nominated in hope of discussing today. Not assigning to anyone yet.

@pnkfelix pnkfelix added the P-high High priority label Jun 13, 2019
@pnkfelix
Copy link
Member

pnkfelix commented Jun 13, 2019

assigning to @nikomatsakis with expectation that they will delegate. (and removing nomination label)

@Centril
Copy link
Contributor

Centril commented Jun 13, 2019

Still nominated for t-lang.

@arielb1
Copy link
Contributor Author

arielb1 commented Jun 26, 2019

@RalfJung

I am not sure that giving up on coherence is the right choice here. I think it is too easy to split the code into functions, such that no one function sees the coherence violation:

struct Data<T: ?Sized, U> where T: Object<U> {
    data: <T as Object<U>>::Output
}

trait Object<U> {
    type Output;
}

trait Mark {
    type Output;
}

impl<T: ?Sized, U: Mark<Output=V>, V> Object<U> for T {
    type Output = V;
}

fn iso_1<T, U>(data: T) -> Data<dyn Object<U, Output=T>, U> {
    // in any coherence-less solution, this shouldn't "see" the
    // blanket impl, as it might not apply (e.g., `Local` might
    // be in a third-party crate).
    Data { data }
}

fn iso_2<X: ?Sized, U, V>(data: Data<X, U>) -> V
    where U: Mark<Output=V>
{
    // similarly, this shouldn't "see" the trait-object impl.
    data.data
}

fn transmute_m<T, U, V>(data: T) -> V
    where U: Mark<Output=V>
{
    // This function *also* shouldn't see the blanket impl - `Local`
    // might be in a third-party crate.
    iso_2::<dyn Object<U, Output=T>, U, V>(
        iso_1::<T, U>(data)
    )
}

// These 3 functions could be in a child crate

struct Local<T>(T);

impl<T> Mark for Local<T> {
    type Output = T;
}

fn transmute<T, U>(x: T) -> U {
    // and this function shouldn't see *anything* that looks like a
    // problematic associated type.
    transmute_m::<_, Local<_>, _>(x)
}

@arielb1
Copy link
Contributor Author

arielb1 commented Jun 26, 2019

I think that conditioning object safety on an object type being coherent is a reasonable-enough way out of this.

If we want to be tricky, we might put the condition "directly" on the impl - i.e., have ObjectCandidate not apply if an ImplCandidate might "theoretically" apply to any substitution of this type. This would however require doing coherence-style negative reasoning during selection, which would be ugly.

@RalfJung
Copy link
Member

@arielb1 your iso_2 exploits coherence, so "giving up on exploiting coherence" would not let that function type-check.

@Jules-Bertholet
Copy link
Contributor

Jules-Bertholet commented Nov 18, 2023

  • Any is useful only in trait-object form (you can just use TypeId::of otherwise).
  • The only way to obtain a dyn Any trait object is by coercing from a Sized type.

Given these facts, might it be justifiable, as a soundness fix, to add a Sized bound to T in the impl<T> Any for T blanket impl? This would make the vtable trait impl and the blanket impl no longer overlap. It would also allow adopting the fix of making blanket impls suppress conflicting vtable impls, as Any's behvior seems to be the main impediment that.

One wrinkle is that downstream impls of Any for custom DSTs should probably be forbidden, but Rust doesn't (yet) have a mechanism for that.

@Jules-Bertholet
Copy link
Contributor

Making dyn Trait not dyn safe if:

  • there is a dyn overlapping impl and
    • the trait has associated non-functions
    • that do not have where Self: Sized (not implemented)

This solution is unnecessarily restrictive. The problem is not the dyn-safety of the trait, the problem is the conflict between the blanket and compiler-generated trait impls. Suppressing the latter is sufficient to resolve the incoherence. For example, the trait below could be fully object safe:

trait Foo {
    type Assoc;
}

impl<T: ?Sized> Foo for T {
     type Assoc = i32;
}

@Jules-Bertholet
Copy link
Contributor

(HRTBs complicate the story somewhat. Given trait Foo<'a> {}, an impl<T: ?Sized> Foo<'static> for T should suppress the compiler-generated impl<'a> Foo<'a> for dyn for<'any> Foo<'any> {}.)

@compiler-errors
Copy link
Member

Given these facts, might it be justifiable, as a soundness fix, to add a Sized bound to T in the impl Any for T blanket impl?

#115006

I tried that, and it's not really great.

@Jules-Bertholet
Copy link
Contributor

Jules-Bertholet commented Nov 19, 2023

Hmm, looking at those regressions, perhaps edition-dependent name resolution offers us a way out. Potential plan:

  1. Split the Any trait into two traits, edition2024#Any and edition2021#Any.
  2. On edition >= 2024, Any always resolves to edition2024#Any.
  3. On edition <= 2021, Any resolves to edition2021#Any in most cases. But when used as part of a trait object (dyn Any) or supertrait bound (trait Foo: Any), it resolves to edition2024#Any. (The latter condition ensures that upcasts to dyn Any work as expected.)
  4. edition2021#Any has a blanket impl for all T: ?Sized (that suppresses the vtable impl). edition2024#Any is implemented for all T: ?Sized + Aligned (so as not to break protobuf<=0.2, which has a trait with an Any supertrait and an impl for str; RFC: Aligned trait rfcs#3319 is therefore a prerequisite).

I think that would fix the vast majority of the crater regressions.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 24, 2023
Document behavior of `<dyn Any as Any>::type_id()`

See also rust-lang#57893

`@rustbot` label A-docs T-libs
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Nov 24, 2023
Rollup merge of rust-lang#118028 - Jules-Bertholet:dyn-any-doc, r=thomcc

Document behavior of `<dyn Any as Any>::type_id()`

See also rust-lang#57893

`@rustbot` label A-docs T-libs
meng-xu-cs pushed a commit to meng-xu-cs/rust that referenced this issue Nov 25, 2023
@Jules-Bertholet
Copy link
Contributor

Jules-Bertholet commented Jan 10, 2024

Another possible solution:

  • make blanket impls always suppress overlapping vtable impls, and
  • make inherent methods on dyn Trait shadow trait methods of the same name, and
  • add an inherent type_id() method to dyn Any that uses the vtable impl, and
  • make method call resolution on trait objects prefer inherent impls from upcasting over trait impls.

@matthiaskrgr matthiaskrgr added the I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ label Apr 5, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 24, 2024
…pl, r=<try>

Reject blanket object impls that are possibly incoherent wrt associated types

I would like to make this test more sophisticated.

Namely, we should plug the unconstrained associated types of the object type with placeholders, and then detect cases where the placeholders *don't* end up being what the blanket impl would have predicted. In that case, we know that we can use a `dyn Trait` to abuse the unsoundness in rust-lang#57893.

However, first I'd like to see what the most naïve fallout of this is.

r? `@ghost`
@lcnr
Copy link
Contributor

lcnr commented Jul 24, 2024

@saethlin
Copy link
Member

saethlin commented Aug 6, 2024

(already linked above via back-link) FYI the reproducer for this miscompile ICEs Miri because said reproducer currently compiles to invalid MIR: #127667

@lcnr
Copy link
Contributor

lcnr commented Oct 29, 2024

this pattern also affects the impl of ToString for types which are display:

trait ToString {
    type Assoc;
}
trait Display {}
impl<T: ?Sized + Display> ToString for T {
    type Assoc = usize;
}
trait Super: ToString + Display {}
fn via_display<T: ?Sized + Display>() -> <T as ToString>::Assoc {
    0
}

fn dyn_trait() -> u32 {
    *via_display::<dyn Super<Assoc = Box<u32>>>()
}

fn main() {
    println!("{}", dyn_trait());
}

however, we only get unsoundness if the affected trait has an associated type.

bors added a commit to rust-lang-ci/rust that referenced this issue Nov 21, 2024
…eteness, r=<try>

Disqualify built-in trait impl if it seems likely to overlap in an unsound way with a blanket impl

cc rust-lang#57893

r? lcnr
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 21, 2024
…eteness, r=<try>

Disqualify built-in trait impl if it seems likely to overlap in an unsound way with a blanket impl

cc rust-lang#57893

r? lcnr
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 4, 2024
rework winnowing to sensibly handle global where-bounds

There may be multiple ways to prove a given trait-bound. In case there are multiple such applicable candidates we need to somehow merge them or bail with ambiguity. When merging candidates we prefer some over others for multiple reasons:

- we want to guide inference during typeck, even if not strictly necessary
- to avoid ambiguity if there if there are at most lifetime differences
    - old solver needs exactly one candidate
    - new solver only needs to handle lifetime differences
- we disable normalization via impls if the goal is proven by using a where-bound

## The approach in this PR[^1]

- always prefer trivial builtin impls[^6]
- then prefer non-global[^global] where-bounds
    - if there exists exactly one where-bound, guide inference
    - if there are multiple where-bounds even if some of them are global, ambig
- then prefer alias bounds[^2] and builtin trait object candidates[^3][^2]
- merge everything ignoring global where-bounds
- if there are no other candidates, try using global where-bounds[^5]

**We disable normalization via impls when using non-global where-bounds or alias-bounds, even if we're unable to normalize by using the where-bound.**

[^1]: see the source for more details
[^2]: [we arbitrary select a single object and alias-bound candidate in case multiple apply and they don't impact inference](https://github.com/rust-lang/rust/blob/a4cedecc9ec76b46dcbb954750068c832cf2dd43/compiler/rustc_trait_selection/src/traits/select/mod.rs#L1906-L1911). This should be unnecessary in the new solver.
[^3]: Necessary for `dyn Any` and rust-lang#57893
[^global]: a where-bound is global if it is not higher-ranked and doesn't contain any generic parameters, `'static` is ok
[^5]: global where-bounds are only used if they are unsatisfiable, i.e. no impl candidate exists
[^6]: they don't constrain inference and don't add any lifetime constraints

## Why this behavior?

### inference guidance via where-bounds and alias-bounds

#### where-bounds

```rust
fn method_selection<T: Into<u64>>(x: T) -> Option<u32> {
    x.into().try_into().ok()
    // prove `T: Into<?0>` and then select a method `?0`,
    // needs eager inference.
}
```

While the above pattern exists in the wild, I think that most inference guidance due to where-bounds is actually unintended. I believe we may want to restrict inference guidance in the future, e.g. limit it to where-bounds whose self-type is a param.

#### alias-bounds

```rust
pub trait Dyn {
    type Word: Into<u64>;
    fn d_tag(&self) -> Self::Word;
    fn tag32(&self) -> Option<u32> {
        self.d_tag().into().try_into().ok()
        // prove `Self::Word: Into<?0>` and then select a method
        // on `?0`, needs eager inference.
    }
}
```

### Disable normalization via impls when using where-bounds

cc rust-lang/trait-system-refactor-initiative#125

```rust
trait Trait<'a> {
    type Assoc;
}

impl<T> Trait<'static> for T {
    type Assoc = ();
}

// normalizing requires `'a == 'static`, the trait bound does not.
fn foo<'a, T: Trait<'a>>(_: T::Assoc) {}
```

If an impl adds constraints not required by a where-bound, using the impl may cause compilation failure avoided by treating the associated type as rigid.

This is also why we can always use trivial builtin impls, even for normalization. They are guaranteed to never add any requirements.

### Lower priority for global where-bounds

A where-bound is considered global if it does not refer to any generic parameters and is not higher-ranked. It may refer to `'static`.

This means global where-bounds are either always fully implied by an impl or unsatisfiable. We don't really care about the inference behavior of unsatisfiable where-bounds :3

If a where-bound is fully implied then using an applicable impl for normalization cannot result in additional constraints. As this is the - afaict only - reason why we disable normalization via impls in the first place, we don't have to disable normalization via impls when encountering global where-bounds.

### Consider global where-bounds at all

Given that we just use impls even if there exists a global where-bounds, you may ask why we don't just ignore these global where-bounds entirely: we use them to weaken the inference guidance from non-global where-bounds.

Without a global where-bound, we currently prefer non-global where bounds even though there would be an applicable impl as well. By adding a non-global where-bound, this *unnecessary* inference guidance is disabled, allowing the following to compile:
```rust
fn check<Color>(color: Color)
where
    Vec: Into<Color> + Into<f32>,
{
    let _: f32 = Vec.into();
    // Without the global `Vec: Into<f32>`  bound we'd
    // eagerly use the non-global `Vec: Into<Color>` bound
    // here, causing this to fail.
}

struct Vec;
impl From<Vec> for f32 {
    fn from(_: Vec) -> Self {
        loop {}
    }
}
```
[There exist multiple crates which rely on this behavior](rust-lang#124592 (comment)).

## Design considerations

We would like to be able to normalize via impls as much as possible. Disabling normalization simply because there exists a where-bound is undesirable.

For the sake of backwards compatability I intend to mostly mirror the current inference guidance rules and then explore possible improvements once the new solver is done. I do believe that removing unnecessary inference guidance where possible is desirable however.

Whether a where-bound is global depends on whether used lifetimes are `'static`. The where-bound `u32: Trait<'static>` is either entirely implied by an impl, meaning that it does not have to disable normalization via impls, **while `u32: Trait<'a>` needs to disable normalization via impls as the impl may only hold for `'static`**. Considering all where-bounds to be non-global once they contain any region is unfortunately a breaking change.

## How does this differ from stable

The currently stable approach is order dependent:
- it prefers impls over global where-bounds: impl > global
- it prefers non-global where-bounds over impls: non-global > impl
- it treats all where-bounds equally: global = non-global

This means that whether we bail with ambiguity or simply use the non-global where bound depending on the *order of where-clauses* and *number of applicable impl candidates*. See the tests added in the first commit for more details. With this PR we now always bail with ambiguity.

I've previously tried to always use the non-global candidate, causing unnecessary inference guidance and undesirable breakage. This already went through an FCP in rust-lang#124592. However, I consider the new approach to be preferable as it exclusively removes incompleteness. It also doesn't cause any crater breakage.

## How to support this in the new solver :o

**This is separately implemented in rust-lang#133643 and not part of this FCP!**

To implement the global vs non-global where-bound distinction, we have to either keep `'static` in the `param_env` when canonicalizing, or eagerly distinguish global from non-global where-bounds and provide that information to the canonical query.

The old solver currently keeps `'static` only the `param_env`, replacing it with an inference variable in the `value`.
https://github.com/rust-lang/rust/blob/a4cedecc9ec76b46dcbb954750068c832cf2dd43/compiler/rustc_infer/src/infer/canonical/canonicalizer.rs#L49-L64

I dislike that based on *vibes* and it may end up being a problem once we extend the environment inside of the solver as [we must not rely on `'static` in the `predicate` as it would get erased in MIR typeck](rust-lang/trait-system-refactor-initiative#30).

An alternative would be to eagerly detect trivial where-bounds when constructing the `ParamEnv`. We can't entirely drop them [as explained above](https://hackmd.io/qoesqyzVTe2v9cOgFXd2SQ#Consider-true-global-where-bounds-at-all), so we'd instead replace them with a new clause kind `TraitImpliedByImpl` which gets entirely ignored except when checking whether we should eagerly guide inference via a where-bound. This approach can be extended to where-bounds which are currently not considered global to stop disabling normalization for them as well.

Keeping `'static` in the `param_env` is the simpler solution here and we should be able to move to the second approach without any breakage. I therefore propose to keep `'static` in the environment for now.

---

r? `@compiler-errors`
@lcnr lcnr moved this from unknown to unblocked in T-types unsound issues Dec 5, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 17, 2024
rework winnowing to sensibly handle global where-bounds

There may be multiple ways to prove a given trait-bound. In case there are multiple such applicable candidates we need to somehow merge them or bail with ambiguity. When merging candidates we prefer some over others for multiple reasons:

- we want to guide inference during typeck, even if not strictly necessary
- to avoid ambiguity if there if there are at most lifetime differences
    - old solver needs exactly one candidate
    - new solver only needs to handle lifetime differences
- we disable normalization via impls if the goal is proven by using a where-bound

## The approach in this PR[^1]

- always prefer trivial builtin impls[^6]
- then prefer non-global[^global] where-bounds
    - if there exists exactly one where-bound, guide inference
    - if there are multiple where-bounds even if some of them are global, ambig
- then prefer alias bounds[^2] and builtin trait object candidates[^3][^2]
- merge everything ignoring global where-bounds
- if there are no other candidates, try using global where-bounds[^5]

**We disable normalization via impls when using non-global where-bounds or alias-bounds, even if we're unable to normalize by using the where-bound.**

[^1]: see the source for more details
[^2]: [we arbitrary select a single object and alias-bound candidate in case multiple apply and they don't impact inference](https://github.com/rust-lang/rust/blob/a4cedecc9ec76b46dcbb954750068c832cf2dd43/compiler/rustc_trait_selection/src/traits/select/mod.rs#L1906-L1911). This should be unnecessary in the new solver.
[^3]: Necessary for `dyn Any` and rust-lang#57893
[^global]: a where-bound is global if it is not higher-ranked and doesn't contain any generic parameters, `'static` is ok
[^5]: global where-bounds are only used if they are unsatisfiable, i.e. no impl candidate exists
[^6]: they don't constrain inference and don't add any lifetime constraints

## Why this behavior?

### inference guidance via where-bounds and alias-bounds

#### where-bounds

```rust
fn method_selection<T: Into<u64>>(x: T) -> Option<u32> {
    x.into().try_into().ok()
    // prove `T: Into<?0>` and then select a method `?0`,
    // needs eager inference.
}
```

While the above pattern exists in the wild, I think that most inference guidance due to where-bounds is actually unintended. I believe we may want to restrict inference guidance in the future, e.g. limit it to where-bounds whose self-type is a param.

#### alias-bounds

```rust
pub trait Dyn {
    type Word: Into<u64>;
    fn d_tag(&self) -> Self::Word;
    fn tag32(&self) -> Option<u32> {
        self.d_tag().into().try_into().ok()
        // prove `Self::Word: Into<?0>` and then select a method
        // on `?0`, needs eager inference.
    }
}
```

### Disable normalization via impls when using where-bounds

cc rust-lang/trait-system-refactor-initiative#125

```rust
trait Trait<'a> {
    type Assoc;
}

impl<T> Trait<'static> for T {
    type Assoc = ();
}

// normalizing requires `'a == 'static`, the trait bound does not.
fn foo<'a, T: Trait<'a>>(_: T::Assoc) {}
```

If an impl adds constraints not required by a where-bound, using the impl may cause compilation failure avoided by treating the associated type as rigid.

This is also why we can always use trivial builtin impls, even for normalization. They are guaranteed to never add any requirements.

### Lower priority for global where-bounds

A where-bound is considered global if it does not refer to any generic parameters and is not higher-ranked. It may refer to `'static`.

This means global where-bounds are either always fully implied by an impl or unsatisfiable. We don't really care about the inference behavior of unsatisfiable where-bounds :3

If a where-bound is fully implied then using an applicable impl for normalization cannot result in additional constraints. As this is the - afaict only - reason why we disable normalization via impls in the first place, we don't have to disable normalization via impls when encountering global where-bounds.

### Consider global where-bounds at all

Given that we just use impls even if there exists a global where-bounds, you may ask why we don't just ignore these global where-bounds entirely: we use them to weaken the inference guidance from non-global where-bounds.

Without a global where-bound, we currently prefer non-global where bounds even though there would be an applicable impl as well. By adding a non-global where-bound, this *unnecessary* inference guidance is disabled, allowing the following to compile:
```rust
fn check<Color>(color: Color)
where
    Vec: Into<Color> + Into<f32>,
{
    let _: f32 = Vec.into();
    // Without the global `Vec: Into<f32>`  bound we'd
    // eagerly use the non-global `Vec: Into<Color>` bound
    // here, causing this to fail.
}

struct Vec;
impl From<Vec> for f32 {
    fn from(_: Vec) -> Self {
        loop {}
    }
}
```
[There exist multiple crates which rely on this behavior](rust-lang#124592 (comment)).

## Design considerations

We would like to be able to normalize via impls as much as possible. Disabling normalization simply because there exists a where-bound is undesirable.

For the sake of backwards compatability I intend to mostly mirror the current inference guidance rules and then explore possible improvements once the new solver is done. I do believe that removing unnecessary inference guidance where possible is desirable however.

Whether a where-bound is global depends on whether used lifetimes are `'static`. The where-bound `u32: Trait<'static>` is either entirely implied by an impl, meaning that it does not have to disable normalization via impls, **while `u32: Trait<'a>` needs to disable normalization via impls as the impl may only hold for `'static`**. Considering all where-bounds to be non-global once they contain any region is unfortunately a breaking change.

## How does this differ from stable

The currently stable approach is order dependent:
- it prefers impls over global where-bounds: impl > global
- it prefers non-global where-bounds over impls: non-global > impl
- it treats all where-bounds equally: global = non-global

This means that whether we bail with ambiguity or simply use the non-global where bound depending on the *order of where-clauses* and *number of applicable impl candidates*. See the tests added in the first commit for more details. With this PR we now always bail with ambiguity.

I've previously tried to always use the non-global candidate, causing unnecessary inference guidance and undesirable breakage. This already went through an FCP in rust-lang#124592. However, I consider the new approach to be preferable as it exclusively removes incompleteness. It also doesn't cause any crater breakage.

## How to support this in the new solver :o

**This is separately implemented in rust-lang#133643 and not part of this FCP!**

To implement the global vs non-global where-bound distinction, we have to either keep `'static` in the `param_env` when canonicalizing, or eagerly distinguish global from non-global where-bounds and provide that information to the canonical query.

The old solver currently keeps `'static` only the `param_env`, replacing it with an inference variable in the `value`.
https://github.com/rust-lang/rust/blob/a4cedecc9ec76b46dcbb954750068c832cf2dd43/compiler/rustc_infer/src/infer/canonical/canonicalizer.rs#L49-L64

I dislike that based on *vibes* and it may end up being a problem once we extend the environment inside of the solver as [we must not rely on `'static` in the `predicate` as it would get erased in MIR typeck](rust-lang/trait-system-refactor-initiative#30).

An alternative would be to eagerly detect trivial where-bounds when constructing the `ParamEnv`. We can't entirely drop them [as explained above](https://hackmd.io/qoesqyzVTe2v9cOgFXd2SQ#Consider-true-global-where-bounds-at-all), so we'd instead replace them with a new clause kind `TraitImpliedByImpl` which gets entirely ignored except when checking whether we should eagerly guide inference via a where-bound. This approach can be extended to where-bounds which are currently not considered global to stop disabling normalization for them as well.

Keeping `'static` in the `param_env` is the simpler solution here and we should be able to move to the second approach without any breakage. I therefore propose to keep `'static` in the environment for now.

---

r? `@compiler-errors`
@fmease fmease added A-trait-objects Area: trait objects, vtable layout and removed A-DSTs Area: Dynamically-sized types (DSTs) labels Dec 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-trait-objects Area: trait objects, vtable layout A-trait-system Area: Trait system C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority S-bug-has-test Status: This bug is tracked inside the repo by a `known-bug` test. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-types Relevant to the types team, which will review and decide on the PR/issue.
Projects
Status: unblocked
Development

No branches or pull requests