Constants that depend on type parameters in generic code #1062
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,291 @@ | ||
- Feature Name: generic_dependent_consts | ||
- Start Date: 2015-03-28 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
|
||
Allow consts declared in functions to depend on function type parameters. Since | ||
this raises some issues in match and type checks, use a conservative approach | ||
that allows only minimal use of such consts when a constant expression is | ||
required. | ||
|
||
# Motivation | ||
|
||
Consider the following line of code in a non-generic context: | ||
|
||
```rust | ||
let a: [u8; T::N + T::N] = [0u8; 2*T::N]; | ||
``` | ||
|
||
In the current associated consts implementation (modulo some temporary wrinkles | ||
involving constants associated to traits), this will readily type-check. Since | ||
the type `T` must be known in non-generic code, `T::N` will evaluate to some | ||
value, say `16`, in which case it is clear that both sides of the assignment | ||
involve arrays of size `32`. | ||
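
As a minimal sketch of the non-generic case, the snippet below fixes `T` to a concrete type so that both size expressions can be evaluated during type checking. The trait `HasN`, the type `Sixteen`, and the function `make_array` are hypothetical names introduced only for this illustration.

```rust
// Hypothetical trait carrying an associated const, standing in for the
// trait that provides `T::N` in the example above.
trait HasN {
    const N: usize;
}

// Hypothetical concrete type playing the role of `T`.
struct Sixteen;

impl HasN for Sixteen {
    const N: usize = 16;
}

// The declared size is `N + N` and the array expression uses `2 * N`.
// Because `Sixteen` is a known type, both evaluate to 32 and the two
// array types are the same.
fn make_array() -> [u8; Sixteen::N + Sixteen::N] {
    [0u8; 2 * Sixteen::N]
}
```

Calling `make_array()` yields an array whose `len()` is `32`; no general identity about `usize` arithmetic is needed because both sides are plain numbers by the time they are compared.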
|
||
However, in generic code, if `T` is a type parameter, we are asking the type | ||
Review comment:
I object to this notion. I don't think we're asking the compiler to prove some generic equation. We're asking it to evaluate a kind of poor man's static_assert at the point of template instantiation. To wit, given:

    trait Foo {
        const VALUE: usize;
    }

...then:

    fn bar<T: Foo>() {
        let _: [u8; T::VALUE] = [0; 111];
    }

...is effectively the same as (using C++ syntax for static_assert):

    fn bar<T: Foo>() {
        static_assert(T::VALUE == 111, "Invalid value");
    }

Or, more generally:

    let _: [u8; x] = [0; y];

...is effectively the same as:

    static_assert(x == y, "");

...and if either expression

Review comment:
Right now, I believe that, at least in theory, if a generic function is valid on its own, and it is instantiated with types that match its signature, instantiation should never fail. Or to put it differently, Rust generics are not like C++ templates in that they are fully type-checked before monomorphization. We use trait bounds and where clauses, which explicitly spell out requirements in a function's signature, instead of static assertions in function bodies. This means that we need one of three things to be the case:

Review comment:
I guess I don't really understand what the difference is between your case 2 (being able to specify template argument constraints on associated constants) and having a C++ style static_assert inside the function body (other than that the former is nicer because then you see the function requirements just by looking at the function signature). In case Rust gets function specialization in the future, then those aforementioned two things would be fundamentally different. Assuming your case 2 has been implemented using the obvious syntax, the following would be effectively the same as doing the equivalent C++ style static assertion inside the function body:

    fn bar<A, B>() where
        A: Foo,
        B: Foo,
        A::VALUE == B::VALUE
    {}
    trait Foo { const VALUE: usize; }

...in that we are deferring the evaluation of the assertion that the two

    fn bar<A>() where
        A: Foo,
        A::VALUE + A::VALUE == 2 * A::VALUE
    {}

Evaluating all those kinds of equations in constraints can be deferred to the point of template instantiation. I don't see any use case for allowing code like

Review comment:
Three differences I can think of:

Review comment:
I only now realized that your argument was that given that template instantiations are not allowed to fail if their constraints are satisfied, then using something like

I think some kind of a mutual-agreement-based pruning of irrelevant discussion branches would be a nice feature here.

Review comment:
IIRC, what I was suggesting was essentially some sort of reflection based type-level computation. I've been meaning to write something more concrete about this but until then here is some intuition about how it should work. Take a piece of code like this from my length-indexed containers example:

    pub fn insert<M: Nat, SN: Nat>(mut self, index: M, element: T) -> LVec<T, SN> where
        Compare: Fn(M, N) -> order::LT,
        Succ: Fn(N) -> SN,
    {
        self.data.insert(index.reflect(), element);
        unsafe { self.reindex() }
    }

Now, this is an old example that used plain binary natural numbers, but it is efficient enough to use for vectors with at least thousands of elements, probably more. Scaling up to much larger numbers is possible with a different encoding. However, it would still be nice to have near-native performance for computation when type-checking these things. The way to do that, I think, is to just provide some additional information to the Rust compiler. If we forget about the fact that singleton values are "more typed" than the usual runtime values, we can construct a bijection:

    // Positive binary naturals in bijective base-2 style
    struct One;
    struct Mul2 <N: Nat>(N);
    struct Mul2s<N: Nat>(N);
    // type is the builtin type to use, trait is the trait classifying type data
    #[reify(type = u64, trait = Nat)]
    trait Nat: Inductive {}
    // cons is the expression interpreting the type data constructor
    #[reify(type = u64, cons = 1)]
    impl Nat for One {}
    #[reify(type = Fn(u64) -> u64, cons = |n| { n * 2 })]
    impl<N: Nat> Nat for Mul2<N> {}
    #[reify(type = Fn(u64) -> u64, cons = |n| { n * 2 + 1 })]
    impl<N: Nat> Nat for Mul2s<N> {}
    struct Div2;
    struct Div2p;
    struct Succ;
    struct Pred;
    struct Add;
    struct Mul;
    …
    #[reify(type = Fn(u64) -> u64, fun = |n| { n / 2 })]
    impl<N: Nat> Fn<(Mul2<N>,)> for Div2 {
        type Output = N;
        extern "rust-call" fn call(&self, (m2n,): (Mul2<N>,)) -> N {
            m2n.0
        }
    }
    #[reify(type = Fn(u64) -> u64, fun = |n| { (n - 1) / 2 })]
    impl<N: Nat> Fn<(Mul2s<N>,)> for Div2p {
        type Output = N;
        extern "rust-call" fn call(&self, (m2sn,): (Mul2s<N>,)) -> N {
            m2sn.0
        }
    }
    // arguments are rules name, domain Rust type, codomain Rust trait classifying type data
    reflect_rules!(u64_to_nat, u64, Nat) {
        (0) => { impossible!() }
        (1) => { One }
        (n) if $n & 1 == 0 => { Mul2<u64_to_nat!($n / 2)> }
        (n) => { Mul2s<u64_to_nat!(($n - 1) / 2)> }
    }
    …

The idea is to provide additional information with type-level "constructors" that says how to interpret them in terms of a normal Rust type. The inverse is provided through a mapping like

So back to the original example, when Rust starts to check something like

The main issue with this approach is that Rust isn't expressive enough to be able to also encode that these rules mapping back and forth are correct. This is up to the programmer to prove. Essentially the rules need to form a well behaved equational theory, so you want things like this (call

I think there are a number of advantages to using an approach like this. For one, it degrades nicely and (assuming correctness) doesn't change the semantics, it's just providing a potential speedup. It facilitates a bridge between runtime computation and static computation that doesn't require an entirely new evaluation mechanism and extensions to the checker since it would re-use existing facilities for this via the plugin mechanism. It would require the ability to load plugins during the type-checking phase, but from what I could tell this shouldn't cause any fundamental problems. And since the structure of this kind of code is very predictable, it would be possible to automatically derive a lot of it, which could also be used for generating quickcheck tests, etc.

Review comment:
Don't worry about it. I know that these discussions cause some clutter, and I hope that that doesn't discourage people from reading, but at the same time I had almost exactly the same reaction as you only a few months ago, when I was still thinking of Rust generics as being like C++ templates (even though I thought I had absorbed enough from Haskell, maybe ML, to know better). So my hope is that people reading are either interested enough to get through it, ADHD enough to skip to the next part, or green enough that it's educational for them.

Thanks for your explanation (as you may have guessed, I mentioned you in the hope of getting you involved here!). I think that in a very broad sense, I had the gist of what you intended, but I'll have to consider the implementation details you mentioned, which I either didn't see before or simply forgot. I'll have to consider this when I have the time to do it justice.

In the meantime, I'd appreciate your thoughts on this RFC in general, and particularly if it's compatible with the route you'd take, whether or not it lines up conceptually. (Preferably on the main thread rather than this line note, which is getting a bit excessive; admittedly that's my fault.) |
||
checker to apply the general proposition that `N + N = 2 * N` for all `usize` | ||
values (that do not overflow when doubled). While this case is simple, requiring | ||
the compiler to prove arbitrary identities is untenable. | ||
|
||
A similar story applies to match patterns. In order to perform exhaustiveness | ||
and reachability checks, the compiler must be able to perform certain | ||
comparisons on match arms. However, this analysis is defeated if the compiler | ||
does not know what the value in a match pattern actually is. | ||
|
||
To date we have dodged the issue by simply not allowing constant expressions to | ||
depend upon type parameters at all. | ||
|
||
However, in the process we have foreclosed a fairly wide range of possibilities. | ||
Here are several examples of functions that are forbidden, even with basic | ||
support for associated consts: | ||
|
||
```rust | ||
fn do_something_optional<T>(x: Option<T>) { | ||
    const NOTHING: Option<T> = None; | ||
    match x { | ||
        Some(y) => { /* ... */ } | ||
        NOTHING => { /* ... */ } | ||
    } | ||
} | ||
unsafe fn do_something_with_bitmask<T>(x: T) { | ||
    // Assume for the sake of argument that Sized::SIZE is an associated const | ||
    // of type `usize`. | ||
    const SIZE_IN_BITS: usize = 8 * <T as Sized>::SIZE; | ||
    /* ... */ | ||
} | ||
fn circle_area<T: Float>(radius: T) -> T { | ||
    // This redefinition is not very useful, but has a straightforward meaning. | ||
    const PI: T = T::PI; | ||
    PI*radius*radius | ||
} | ||
fn circumference<T: Float>(radius: T) -> T { | ||
    const TWO_PI: T = 2.0 * T::PI; | ||
    TWO_PI*radius | ||
} | ||
``` | ||
|
||
The first three examples would be allowed under this RFC. The fourth function, | ||
`circumference`, would still be disallowed because the compiler cannot verify | ||
during type checking that `2.0 * T::PI` is a constant expression of type | ||
`T`. However, this RFC removes *one* of the barriers that prevents | ||
implementation of this last example. | ||
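
For comparison, here is a hedged sketch of how the last two functions can be written without a local `const`, by using the associated consts directly in ordinary expressions. The `Float` trait below is a hypothetical stand-in (the RFC does not define it), and the extra `TWO` const is an assumption added so that no float literal has to be converted to `T`.

```rust
use std::ops::Mul;

// Hypothetical numeric trait; only what the two functions need.
trait Float: Copy + Mul<Output = Self> {
    const PI: Self;
    // Assumed helper const so that `2.0` never needs converting to `T`.
    const TWO: Self;
}

impl Float for f64 {
    const PI: f64 = std::f64::consts::PI;
    const TWO: f64 = 2.0;
}

impl Float for f32 {
    const PI: f32 = std::f32::consts::PI;
    const TWO: f32 = 2.0;
}

// Uses `T::PI` directly instead of a function-local const.
fn circle_area<T: Float>(radius: T) -> T {
    T::PI * radius * radius
}

// The product `T::TWO * T::PI` is an ordinary runtime expression here,
// not a named constant, so no constant-expression check is involved.
fn circumference<T: Float>(radius: T) -> T {
    T::TWO * T::PI * radius
}
```

This sidesteps the question the RFC raises rather than answering it: the values are still inlinable, but they cannot be named as local constants or used where a constant expression is required.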
|
||
Note that the question of how to handle generic constants will become even more | ||
pressing if user-defined types are allowed to depend on constants in the future. | ||
In that case, these issues will not be limited to the interaction between | ||
associated consts and a handful of language features. | ||
|
||
# Detailed design | ||
|
||
Because they are inlined wherever possible, non-associated constants are similar | ||
to macros in many circumstances. Consider this definition: | ||
|
||
```rust | ||
const C: T = EXPR; | ||
``` | ||
|
||
Most uses of `C` could be replaced with `EXPR: T` (using type ascription) with | ||
no change in the meaning of the code. The main difference is that when a named | ||
constant `C` is defined, `EXPR` is required to be a constant expression, and can | ||
be used in certain situations (especially patterns) where inlining the | ||
expression, either manually or via macro, would produce invalid syntax. (Of | ||
course, named constants also improve readability and allow for clearer error | ||
messages in some cases.) | ||
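
A small sketch of that difference, using hypothetical names (`LIMIT`, `classify`): in expression position the named constant and the inlined expression are interchangeable, while in pattern position only the named constant is accepted.

```rust
// A named constant whose initializer is a constant expression.
const LIMIT: u32 = 10 + 5;

fn classify(x: u32) -> &'static str {
    match x {
        // A named constant can be used as a pattern. Writing `10 + 5 => ...`
        // here instead would be a syntax error.
        LIMIT => "at limit",
        // In expression position, the inlined expression means the same
        // thing as `LIMIT`.
        _ if x < 10 + 5 => "below limit",
        _ => "above limit",
    }
}
```

With these definitions, `classify(15)` takes the first arm, `classify(3)` the second, and `classify(20)` the third.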
|
||
Keeping this in mind, it is straightforward to understand most uses of constant | ||
expressions. The design below will mostly focus on special cases and | ||
interactions with other language features. | ||
|
||
## Other items in functions | ||
|
||
The scope of this RFC is limited to consts. Nested functions will still be | ||
forbidden from referencing "outer" type parameters. The same is true for | ||
statics. There are two justifications for this. | ||
|
||
Firstly, nested static items, whether they are static variables or functions, | ||
must have static locations in memory. If these items could use the type | ||
Review comment:
I did not expect that. Tried it out in the playpen, and you're completely correct. I would have expected every parameter-combination to have its own static, but that would be impossible to get correct, since multiple crates might be using a crate with a generic function and then you'd get multiple statics for the same parameter-combination. |
||
parameters from the surrounding scope, they would have to be newly instantiated | ||
for each instantiation of the enclosing function. | ||
|
||
Secondly, functions that make use of outer type parameters are in effect | ||
higher-kinded types. These seem to require more careful consideration to | ||
implement. | ||
|
||
Neither of these considerations applies to constants, which the compiler is always | ||
allowed (and often required) to inline. They do not need a static location in | ||
memory, and unlike functions, they cannot add new type or lifetime parameters to | ||
those already present in the surrounding scope. | ||
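
The point about static locations can be observed directly. In the sketch below (the names `count_calls` and `CALLS` are hypothetical), a static nested in a generic function does not mention the type parameter, and a single instance of it is shared by every instantiation of the function.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A generic function containing a nested static. The static is one item
// with one location in memory, regardless of how many types `T` the
// function is instantiated with.
fn count_calls<T>() -> usize {
    static CALLS: AtomicUsize = AtomicUsize::new(0);
    // Returns the number of calls made so far, including this one.
    CALLS.fetch_add(1, Ordering::SeqCst) + 1
}
```

Calling `count_calls::<u8>()` and then `count_calls::<String>()` from a single thread returns consecutive values, because both instantiations increment the same counter.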
|
||
## `'static` references | ||
|
||
Consider the following definitions of similar functions: | ||
|
||
```rust | ||
// Currently not valid (borrowed value does not live long enough). | ||
fn ref_one_literal() -> &'static u32 { | ||
    &1 | ||
} | ||
// Currently valid (a static is implicitly created). | ||
fn ref_one_const() -> &'static u32 { | ||
    const REF_ONE: &'static u32 = &1; | ||
    REF_ONE | ||
} | ||
// Not valid (borrowed value does not live long enough). | ||
fn ref_one_ref_const() -> &'static u32 { | ||
    const ONE: u32 = 1; | ||
    &ONE | ||
} | ||
// Should either of the following become valid? | ||
fn ref_one_generic_const<T: Int>() -> &'static T { | ||
    const REF_ONE: &'static T = &T::ONE; | ||
    REF_ONE | ||
} | ||
fn ref_one_ref_associated_const<T: Int>() -> &'static T { | ||
    &T::ONE | ||
} | ||
``` | ||
|
||
This RFC proposes that both `ref_one_generic_const` and | ||
`ref_one_ref_associated_const` would be invalid. | ||
|
||
The associated const case is disallowed for the same reason as | ||
`ref_one_ref_const`, i.e. because the expression is not `'static` and thus does | ||
not live long enough for a static borrow. | ||
|
||
The generic const case is disallowed because it implicitly creates a static item | ||
that depends on a type parameter, and this is forbidden, as mentioned in the | ||
preceding section. | ||
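
To make the "implicitly created static" concrete, the sketch below places `ref_one_const` next to a version that spells the static out by hand. The name `ref_one_explicit_static` is hypothetical, and treating the two as equivalent is an approximation for illustration, not a description of how the compiler lowers the const.

```rust
// The const form from the example above: the referent of `&1` must live
// for `'static`, so storage for it is created implicitly.
fn ref_one_const() -> &'static u32 {
    const REF_ONE: &'static u32 = &1;
    REF_ONE
}

// Roughly the same thing with the static written explicitly. Note that
// the static is a non-generic item; it could not mention a type
// parameter of an enclosing generic function.
fn ref_one_explicit_static() -> &'static u32 {
    static ONE: u32 = 1;
    &ONE
}
```

Both functions return a reference to the value `1` that remains valid for the rest of the program. The accepted forms for the other cases may differ between compiler versions, so only these two are shown.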
|
||
In order to implement this restriction, use of a type parameter in a constant | ||
expression must be considered "contagious". That is, if the initializer | ||
Review comment:
I don't think contagious is the correct word. Maybe contiguous? I'm not really sure what word you're looking for.

Review comment:
"Contagious" is the correct word. People sometimes use "viral" to have the same meaning, but with stronger negative connotations.

Review comment:
Oh. I thought it had to do with sickness. I even looked it up to check.

Review comment:
A condition that is "contagious" is spread between things that contact one another. It often refers to disease, but not always. Maybe I could use a different word, like "inherited".

Review comment:
I don't think it matters anymore because it's been explained to me. Only if the misunderstanding is common should it be changed. I don't have any evidence that it is. |
||
expression for a const uses a type parameter, then all expressions that | ||
reference that const implicitly depend on the same type parameter. | ||
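
The "contagious" rule amounts to a transitive dependency check. The following is a simplified model of that check, not compiler code: the `Mention` enum, `is_generic`, and `example_defs` are hypothetical names, and each const is reduced to the list of things its initializer mentions.

```rust
use std::collections::HashMap;

// What a const initializer may mention in this simplified model.
enum Mention {
    // A type parameter of the enclosing function, e.g. `T` in `T::ONE`.
    TypeParam,
    // Another named const.
    Const(&'static str),
}

// Returns true if the const `name` depends on a type parameter, either
// directly or through other consts it references. Assumes the
// definitions are acyclic. Unknown names count as non-generic.
fn is_generic(name: &str, defs: &HashMap<&'static str, Vec<Mention>>) -> bool {
    match defs.get(name) {
        None => false,
        Some(mentions) => mentions.iter().any(|m| match m {
            Mention::TypeParam => true,
            Mention::Const(other) => is_generic(other, defs),
        }),
    }
}

// Three sample definitions inside some `fn f<T: Int>()`.
fn example_defs() -> HashMap<&'static str, Vec<Mention>> {
    let mut defs = HashMap::new();
    defs.insert("ONE", vec![Mention::TypeParam]); // const ONE: T = T::ONE;
    defs.insert("REF_ONE", vec![Mention::Const("ONE")]); // const REF_ONE: &T = &ONE;
    defs.insert("FORTY_TWO", vec![]); // const FORTY_TWO: u32 = 42;
    defs
}
```

In this model `REF_ONE` is generic even though its initializer never names `T`, which is the situation the restriction on implicit statics has to catch.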
|
||
## Match patterns | ||
|
||
For the moment, constant values used in match patterns are subject to the same | ||
restrictions as statics, i.e. they cannot depend on type parameters. Without | ||
this restriction, it would not be possible to guarantee that match expressions | ||
satisfy the exhaustiveness and reachability criteria for all possible | ||
instantiations of a generic function. This restriction may be loosened | ||
backwards-compatibly in the future by adding syntax to constrain associated | ||
consts in generic code (similar to how type parameters, and their associated | ||
types, can already be constrained with `where` clauses). | ||
|
||
However, note that when the *type* of a constant depends on a generic parameter, | ||
whereas its *value* does not, the constant is still allowed in a match pattern. | ||
This can occur, for instance, when the value is a nullary enum (see the example | ||
in the `Motivation` section). | ||
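
For reference, the non-generic counterpart of that example already works: a constant used as a pattern takes part in exhaustiveness checking as if its value had been written out. The sketch below fixes the type to `Option<u32>`; the function name `describe` is hypothetical.

```rust
// A constant whose value is a nullary enum variant.
const NOTHING: Option<u32> = None;

fn describe(x: Option<u32>) -> String {
    // No wildcard arm is needed: the compiler knows that `NOTHING` is
    // `None`, so `Some(_)` and `NOTHING` together cover every value.
    match x {
        Some(y) => format!("some({})", y),
        NOTHING => String::from("nothing"),
    }
}
```

The generic version in the `Motivation` section has the same shape; the only difference is that the type of `NOTHING` would mention `T` while its value would still be the fixed variant `None`.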
|
||
## Array sizes | ||
|
||
In order for type checking to determine whether or not two array types are | ||
equal, it must be able to compare the arrays' sizes. This presents a special | ||
problem when performing arithmetic on constants that depend on type parameters, | ||
as outlined in the `Motivation` section above. | ||
|
||
To avoid having to prove arbitrary mathematical identities, all constant | ||
expressions that affect array sizes are divided into the following three | ||
categories: | ||
|
||
1. Constant expressions that do not depend on type parameters at all. These | ||
will continue to behave as they do now; they are evaluated during type | ||
checking, and will be considered equal to all other expressions that can be | ||
evaluated during type checking to the same value. | ||
|
||
2. Constant expressions that consist of only a single path (or an identifier), | ||
where that path resolves to a constant that depends on at least one type | ||
parameter. During type checking, such expressions will compare equal to any | ||
expression that consists of only a single path that resolves to the same | ||
item. This reduces an impossible problem (determining whether two arbitrary | ||
expressions are equivalent) to a simple one (determining whether two paths | ||
resolve to the same item). | ||
|
||
3. Constant expressions of any other form that depend on type parameters. These | ||
expressions will never be considered equivalent to any other expression. | ||
|
||
To further explain case 2, the following will be allowed: | ||
|
||
```rust | ||
let a: [u8; <T>::N] = [0u8; <T>::N]; | ||
const X: usize = 2*<U as Trait>::M; | ||
let a: [u8; X] = [0u8; X]; | ||
// This is not allowed: | ||
// let a: [u8; X] = [0u8; 2*<U as Trait>::M]; | ||
``` | ||
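
The three categories can be summarized as a small decision procedure. The model below is only a sketch of the comparison rule as described in this section; `SizeExpr` and `sizes_equal` are hypothetical names, and a path is represented by a string naming the item it resolves to.

```rust
// A size expression as seen by the type checker in this simplified model.
#[derive(Debug)]
enum SizeExpr {
    // Category 1: no dependence on type parameters; already evaluated.
    Evaluated(usize),
    // Category 2: a single path, identified by the item it resolves to.
    GenericPath(&'static str),
    // Category 3: any other expression that depends on type parameters.
    GenericOther,
}

// Decides whether two array sizes are considered equal in type checking.
fn sizes_equal(a: &SizeExpr, b: &SizeExpr) -> bool {
    match (a, b) {
        // Category 1: compare the evaluated values.
        (SizeExpr::Evaluated(x), SizeExpr::Evaluated(y)) => x == y,
        // Category 2: equal only if both paths resolve to the same item.
        (SizeExpr::GenericPath(p), SizeExpr::GenericPath(q)) => p == q,
        // Category 3, and any mix of categories: never equal.
        _ => false,
    }
}
```

Under this model `X` compared with `X` is equal, while `X` compared with `2*<U as Trait>::M` is not, and a category 3 expression is not even equal to a textually identical copy of itself.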
|
||
Case 3 rules out many uses of arrays with sizes that depend on type parameters. | ||
However, there are some operations where the array size is irrelevant to whether | ||
or not the code type-checks, such as coercing a reference to an array to a fat | ||
pointer to a slice, or creating a raw pointer to a fixed-size buffer in order to | ||
hand the pointer to external code via FFI. In such cases, an array expression | ||
such as `[0u8; 2*<U as Trait>::M]` could still be useful. | ||
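
A sketch of the slice case follows. It uses a concrete type `U` rather than a type parameter, since the generic form is what this RFC proposes; the names `Trait`, `U`, `sum`, and `sum_of_buffer` are hypothetical. The point is that the callee never sees the array size in its signature.

```rust
// Hypothetical trait with an associated const, as in the text above.
trait Trait {
    const M: usize;
}

// Concrete stand-in for the type parameter `U`.
struct U;

impl Trait for U {
    const M: usize = 4;
}

// Takes a slice, so the length is not part of the parameter type.
fn sum(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn sum_of_buffer() -> u32 {
    // The size expression is never compared with another array type.
    let buf = [1u8; 2 * <U as Trait>::M];
    // `&buf` coerces from a reference to an array to a slice reference.
    sum(&buf)
}
```

Here `sum_of_buffer()` adds up eight bytes of value `1`. Nothing in the function requires the checker to decide whether `2 * M` equals any other size expression.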
|
||
The justification for the above rules is that it seems premature to settle on a | ||
specific strategy for dealing with constant expressions in types right now. | ||
However, the rule in case 2, which states that two paths that resolve to the | ||
same item will compare equal in type checking, seems to describe a bare minimum | ||
of functionality that will almost certainly be a part of any further long-term | ||
solutions. | ||
|
||
# Drawbacks | ||
|
||
These issues will probably be tackled eventually, since generic code is where | ||
associated consts, and perhaps in the future "real" dependent types, will be | ||
most useful. However, we could postpone any decisions until further extensions | ||
of the type system force the issue. This proposal does introduce some | ||
complexity, since generic constants must be treated in many situations like | ||
generic types. | ||
|
||
Since this design is somewhat conservative regarding code that will be accepted, | ||
it may also produce some confusion when code that seems "obviously" OK is | ||
rejected by the compiler. For instance, this is rejected: | ||
|
||
```rust | ||
// `T` is a generic parameter. | ||
const X: usize = <T>::N; | ||
// We don't backtrack to see that the RHS is using the same expression as the | ||
// one that defines X, so this line is invalid. | ||
let a: [u8; X] = [0u8; <T>::N]; | ||
``` | ||
|
||
# Alternatives | ||
|
||
## Status quo | ||
|
||
We could keep the status quo, where type parameters cannot influence the values | ||
in constant expressions at all. This would somewhat reduce the utility of | ||
associated consts, and prevent us from giving this solution a "trial run", but | ||
the language would be simpler for now. | ||
|
||
## Forbid generic constants from appearing in constant expressions | ||
|
||
We could allow constant values and their types to depend on type parameters, but | ||
not consider them to be constants in match patterns or array sizes at all. Aside | ||
from being inlinable, the user would not be able to expect the compiler to do | ||
anything with these "constants" beyond what it can already do with non-constant | ||
variables. | ||
|
||
## Allow constants that are "aliases" of other constants to be proven equal | ||
|
||
We could implement this RFC and additionally allow the special example in the | ||
`Drawbacks` section as valid code. This seems unnecessary if a `const` | ||
declaration is viewed as purely creating its own item, but it seems that this | ||
code should be accepted if a `const` declaration is viewed as instead creating | ||
something more like an alias or macro, simply expanding to some inlined constant | ||
expression when used. | ||
|
||
# Unresolved questions | ||
|
||
Does it make sense to distinguish between the *type* of a constant being | ||
generic and its *value* being generic? On its face this seems somewhat | ||
nonsensical, but in practice it seems straightforward. | ||
|
||
This design omits some possible extensions, such as allowing other forms of | ||
expressions to be considered equal during type checking. | ||
|
||
CTFE, constraints on associated constant values, and most other constant-related | ||
features are likewise ignored. The interaction with CTFE might be obvious, if we | ||
can count on the heuristic that constant functions should behave as if their | ||
code was inlined. |
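
As a hedged illustration of that heuristic in the non-generic case, the sketch below uses a `const fn` in one array size and the inlined expression in the other. The names `double`, `N`, and `make_buffer` are hypothetical, and this says nothing about how the generic case should behave.

```rust
// A constant function that can be evaluated at compile time.
const fn double(n: usize) -> usize {
    2 * n
}

const N: usize = 16;

// The return type calls `double(N)` and the body writes `2 * N`.
// Neither depends on a type parameter, so both are evaluated during
// type checking and the two array types match.
fn make_buffer() -> [u8; double(N)] {
    [0u8; 2 * N]
}
```

Both size expressions fall into category 1 of the `Array sizes` section, so the call to `double` behaves exactly as if its body had been inlined.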

Review comment:
It's interesting that, with generic `T`, `T::N` should have a constraint `where T::N <= usize::MAX / 2`; otherwise it will fail in trans since overflow is an error, making the situation closer to "giving up and making generics more like C++ templates". It looks like even most basic operations on (non-C++ style) type level numbers would be impossible without support for such constraints.

Review comment:
Technically, if it's part of the signature, it will be detected before instantiating the function's body with the given type parameters (i.e. "monomorphization").
If it's not part of the signature, then the operation would generate a runtime panic if it were operating on runtime values.
The way I see it, such compiler errors are "early warnings" for runtime errors - you can still get a program that type-checks if you define the result of all operations which could fail.
It is actually possible to codegen such a function to panic at runtime instead of emitting a compile-time error.
It could also trigger a lint, so you can get an error for functions which unconditionally panic, if you want that (could be an error by default, too).

Review comment:
I don't quite understand. If I write something like
shouldn't the type `[u8; 2 * T::N]` be "evaluated" during instantiation of `f`, leading to an error? And there's no runtime here, because `f` can't even be instantiated with `T::N > usize::MAX / 2`.

Review comment:
Evaluating `[0u8; 2 * T::N]` might fail due to overflow at compile-time and the compiler could instead produce an unconditional overflow panic.
All I'm saying is that it's possible - whether or not it's also crazy, that's another matter entirely.

Review comment:
I see, `f` can be instantiated if we substitute any `usize` value (like result of wrapping multiplication) instead of an error. Then runtime will exist and panic can be produced.

Review comment:
It still feels like a regression compared to C++ if error reporting is delayed until runtime, so early warnings would definitely be useful.