Initial commit
nikomatsakis committed Jun 11, 2014
1 parent 69a0f68 commit 620a173
Showing 1 changed file with 386 additions and 0 deletions.
386 changes: 386 additions & 0 deletions active/0000-closures.md
@@ -0,0 +1,386 @@
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)

# Summary

- Convert function call `a(b, ..., z)` into an overloadable operator
  via the traits `Fn<A,R>`, `FnShare<A,R>`, and `FnOnce<A,R>`, where `A`
  is a tuple `(B, ..., Z)` of the types `B...Z` of the arguments
  `b...z`, and `R` is the return type. The three traits differ in
  their self argument (`&mut self` vs `&self` vs `self`).
- Remove the `proc` expression form and type.
- Remove the closure types (though the form lives on as syntactic
  sugar, see below).
- Modify closure expressions to permit specifying by-reference vs
  by-value capture and the receiver type:
  - Specifying by-reference vs by-value closures:
    - `ref |...| expr` indicates a closure that captures upvars from the
      environment by reference. This is what closures do today and the
      behavior will remain unchanged, other than requiring an explicit
      keyword.
    - `|...| expr` will therefore indicate a closure that captures upvars
      from the environment by value. As usual, this is either a copy or
      move depending on whether the type of the upvar implements `Copy`.
  - Specifying receiver mode (orthogonal to capture mode above):
    - `|a, b, c| expr` is equivalent to `|&mut: a, b, c| expr`
    - `|&mut: ...| expr` indicates that the closure implements `Fn`
    - `|&: ...| expr` indicates that the closure implements `FnShare`
    - `|: a, b, c| expr` indicates that the closure implements `FnOnce`.
- Add syntactic sugar where `|T1, T2| -> R1` is translated to
  a reference to one of the fn traits as follows:
  - `|T1, ..., Tn| -> R` is translated to `Fn<(T1, ..., Tn), R>`
  - `|&mut: T1, ..., Tn| -> R` is translated to `Fn<(T1, ..., Tn), R>`
  - `|&: T1, ..., Tn| -> R` is translated to `FnShare<(T1, ..., Tn), R>`
  - `|: T1, ..., Tn| -> R` is translated to `FnOnce<(T1, ..., Tn), R>`

One aspect of closures that this RFC does *not* describe is that we
must permit trait references to be universally quantified over regions
as closures are today. This change is described below under
*Unresolved questions*, and the details will come in a forthcoming
RFC.

# Motivation

Over time we have observed a very large number of possible use cases
for closures. The goal of this RFC is to create a unified closure
model that encompasses all of these use cases.

Specific goals (explained in more detail below):

1. Give control over inlining to users.
2. Support closures that bind by reference and closures that bind by value.
3. Support different means of accessing the closure environment,
corresponding to `self`, `&self`, and `&mut self` methods.

As a side benefit, though not a direct goal, the RFC reduces the
size/complexity of the language's core type system by unifying
closures and traits.

## The core idea: unifying closures and traits

The core idea of the RFC is to unify closures, procs, and
traits. There are a number of reasons to do this. First, it simplifies
the language, because closures, procs, and traits already served
similar roles and there was sometimes a lack of clarity about which
would be the appropriate choice. However, in addition, the unification
offers increased expressiveness and power, because traits are a more
generic model that gives users more control over optimization.

The basic idea is that function calls become an overloadable operator.
Therefore, an expression like `a(...)` will be desugared into an
invocation of one of the following traits:

    trait Fn<A,R> {
        fn call(&mut self, args: A) -> R;
    }

    trait FnShare<A,R> {
        fn call(&self, args: A) -> R;
    }

    trait FnOnce<A,R> {
        fn call(self, args: A) -> R;
    }

Essentially, `a(b, c, d)` becomes sugar for one of the following:

    Fn::call(&mut a, (b, c, d))
    FnShare::call(&a, (b, c, d))
    FnOnce::call(a, (b, c, d))

To integrate with this, closure expressions are then translated into a
fresh struct that implements one of those three traits. The precise
trait is currently indicated using explicit syntax but may eventually
be inferred.
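As a rough illustration of this translation, the sketch below hand-writes the kind of struct a closure expression might desugar to. It is written in present-day Rust, so the traits `CallMut` and `CallOnce` and the structs `Accumulate` and `Concat` are hypothetical stand-ins (they are not part of this RFC or of the standard library). Only the shape follows the text above: one field per captured variable, plus a `call` method that takes the arguments as a tuple.

```rust
// Hypothetical stand-ins for the RFC's `Fn` (`&mut self`) and `FnOnce`
// (`self`) traits, renamed to avoid clashing with the standard library.
trait CallMut<A, R> {
    fn call(&mut self, args: A) -> R;
}

trait CallOnce<A, R> {
    fn call(self, args: A) -> R;
}

// Roughly what `|x| { total += x; total }` could become when `total`
// is captured by value: a fresh struct with one field per upvar.
struct Accumulate {
    total: i32,
}

impl CallMut<(i32,), i32> for Accumulate {
    fn call(&mut self, args: (i32,)) -> i32 {
        let (x,) = args;
        self.total += x;
        self.total
    }
}

// Roughly what `|: suffix| prefix + suffix` could become: the call
// consumes the environment, so it can move `prefix` out.
struct Concat {
    prefix: String,
}

impl<'a> CallOnce<(&'a str,), String> for Concat {
    fn call(self, args: (&'a str,)) -> String {
        let (suffix,) = args;
        self.prefix + suffix
    }
}
```

Under these assumptions, `acc(5)` would correspond to `CallMut::call(&mut acc, (5,))`, and `c("bar")` to `CallOnce::call(c, ("bar",))`.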

This change gives users control over virtual vs static dispatch. This
works in the same way as generic types today:

    fn foo(x: &mut Fn<int,int>) -> int {
        x(2) // virtual dispatch
    }

    fn foo<F:Fn<int,int>>(x: &mut F) -> int {
        x(2) // static dispatch
    }
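For comparison, here is a minimal sketch of the same two signatures in present-day Rust, where the `&mut self` trait is spelled `FnMut` and trait objects are written with `dyn`. The function names `call_dyn` and `call_static` are made up for the example.

```rust
// Virtual dispatch: `x` is a trait object, so the call goes through a vtable.
fn call_dyn(x: &mut dyn FnMut(i32) -> i32) -> i32 {
    x(2)
}

// Static dispatch: the function is monomorphized for each closure type `F`,
// which lets the compiler inline the call.
fn call_static<F: FnMut(i32) -> i32>(x: &mut F) -> i32 {
    x(2)
}
```

The same closure value can be passed to either function; only the signature of the callee decides how the call is dispatched.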

The change also permits returning closures, which is not currently
possible (the example relies on the proposed `impl` syntax from
rust-lang/rfcs#105):

    fn foo(x: impl Fn<int,int>) -> impl Fn<int,int> {
        |v| x(v * 2)
    }
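A sketch of the same idea in present-day Rust, assuming today's spelling of the traits (`FnMut` for the `&mut self` receiver) and the `move` keyword for by-value capture. The name `double_then` is hypothetical.

```rust
// Takes a closure by value and returns a new closure that owns it.
// `move` captures `x` by value so the result can outlive this call.
fn double_then(mut x: impl FnMut(i32) -> i32) -> impl FnMut(i32) -> i32 {
    move |v| x(v * 2)
}
```

The returned value has an anonymous struct type that the caller only knows through the `FnMut(i32) -> i32` bound, so the call still uses static dispatch.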

Basically, in this design there is nothing special about a closure.
Closure expressions are simply a convenient way to generate a struct
that implements a suitable `Fn` trait.

## Bind by reference vs bind by value

When creating a closure, it is now possible to specify whether the
closure should capture variables from its environment ("upvars") by
reference or by value. The distinction is indicated using the leading
keyword `ref`:

    || foo(a, b) // captures `a` and `b` by value

    ref || foo(a, b) // captures `a` and `b` by reference, as today

### Reasons to bind by value

Bind by value is useful when creating closures that will escape from
the stack frame that created them, such as task bodies (`spawn(||
...)`) or combinators. It is also useful for moving values out of a
closure, though it should be possible to enable that with bind by
reference as well in the future.
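The two escaping cases mentioned above can be sketched in present-day Rust, where by-value capture is requested with the `move` keyword rather than being the default this RFC proposes. The function names are hypothetical.

```rust
use std::thread;

// Task body: the closure owns `values`, so it may outlive this stack frame.
fn sum_in_background(values: Vec<i32>) -> i32 {
    let handle = thread::spawn(move || values.iter().sum::<i32>());
    handle.join().expect("worker thread panicked")
}

// Combinator: the returned closure owns its copy of `n`.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}
```

In both functions a by-reference capture would be rejected, because the closure would hold a reference into a stack frame that may be gone by the time it runs.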

### Reasons to bind by reference

Bind by reference is useful for any case where the closure is known
not to escape the creating stack frame. This frequently occurs
when using closures to encapsulate common control-flow patterns:

    map.insert_or_update_with(key, value, || ...)
    opt_val.unwrap_or_else(|| ...)

In such cases, the closure frequently wishes to read or modify local
variables on the enclosing stack frame. Generally speaking, then, such
closures should capture variables by-reference -- that is, they should
store a reference to the variable in the creating stack frame, rather
than copying the value out. Using a reference allows the closure to
mutate the variables in place and also avoids moving values that are
simply read temporarily.
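A small sketch of this control-flow pattern in present-day Rust, where a plain `||` closure captures by reference (the behavior this RFC spells `ref ||`). The function `count_missing` is made up for the example.

```rust
// Sums the present values and counts the absent ones. The closure passed to
// `unwrap_or_else` mutates the local `missing` in place through a borrow.
fn count_missing(items: &[Option<i32>]) -> (i32, u32) {
    let mut missing = 0u32;
    let mut total = 0;
    for item in items {
        total += item.unwrap_or_else(|| {
            missing += 1;
            0
        });
    }
    (total, missing)
}
```

Because the closure never escapes the loop body, borrowing `missing` is enough, and the variable remains usable once the closure is gone.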

The vast majority of closures in use today should be "by
reference" closures. The only exceptions are those closures that wish
to "move out" from an upvar (where we commonly use the so-called
"option dance" today). In fact, even those closures could be "by
reference" closures, but we will have to extend the inference to
selectively identify those variables that must be moved and take those
"by value".

# Detailed design

## Closure expression syntax

Closure expressions will have the following form (using EBNF notation,
where `[]` denotes optional things and `{}` denotes a comma-separated
list):

    CLOSURE = ['ref'] '|' [SELF] {ARG} '|' ['->' TYPE] EXPR
    SELF    = ':' | '&' ':' | '&' 'mut' ':'
    ARG     = ID [ ':' TYPE ]

The optional keyword `ref` is used to indicate whether this closure
captures *by reference* or *by value*.

Closures are always translated into a fresh struct type with one field
per upvar. In a by-value closure, the types of these fields will be
the same as the types of the corresponding upvars (modulo `&mut`
reborrows, see below). In a by-reference closure, the types of these
fields will be a suitable reference (`&`, `&mut`, etc) to the
variables being borrowed.

### By-value closures

The default form for a closure is by-value. This implies that all
upvars which are referenced are copied/moved into the closure as
appropriate. There is one special case: if the type of the value to be
moved is `&mut`, we will "reborrow" the value when it is copied into
the closure. That is, given an upvar `x` of type `&'a mut T`, the
value which is actually captured will have type `&'b mut T` where `'b
<= 'a`. This rule is consistent with our general treatment of `&mut`,
which is to aggressively reborrow wherever possible; moreover, this
rule cannot introduce additional compilation errors, it can only make
more programs successfully typecheck.
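The effect of the reborrow rule can be approximated by writing the reborrow by hand. The sketch below uses present-day Rust, with `move` standing in for by-value capture; the function name `push_both` and the explicit `&mut *out` are illustrative assumptions, not text from this RFC.

```rust
fn push_both(out: &mut Vec<i32>) {
    {
        // Hand-written reborrow: `reborrowed` has a shorter lifetime than `out`.
        let reborrowed: &mut Vec<i32> = &mut *out;
        // The closure takes ownership of the short-lived reborrow, not of `out`.
        let mut push_one = move || reborrowed.push(1);
        push_one();
    }
    // `out` is usable again because only the reborrow was captured.
    out.push(2);
}
```

Under the rule described above, the compiler would insert the equivalent of `&mut *out` automatically when `out` is captured by value.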

### By-reference closures

A *by-reference* closure is a convenience form in which values used in
the closure are converted into references before being captured. By
reference closures are always rewritable into by value closures if
desired, but the rewrite can often be cumbersome and annoying.

Here is a (rather artificial) example of a by-reference closure in
use:

    let in_vec: Vec<int> = ...;
    let mut out_vec: Vec<int> = Vec::new();
    let opt_int: Option<int> = ...;

    opt_int.map(ref |v| {
        out_vec.push(v);
        in_vec.fold(v, |a, &b| a + b)
    });

This could be rewritten into a by-value closure as follows:

    let in_vec: Vec<int> = ...;
    let mut out_vec: Vec<int> = Vec::new();
    let opt_int: Option<int> = ...;

    opt_int.map({
        let in_vec = &in_vec;
        let out_vec = &mut out_vec;
        |v| {
            out_vec.push(v);
            in_vec.fold(v, |a, &b| a + b)
        }
    })

In this case, the capture closed over two variables, `in_vec` and
`out_vec`. As you can see, the compiler automatically infers, for each
variable, how it should be borrowed and inserts the appropriate
capture.
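Both forms can be tried side by side in present-day Rust, assuming that a plain `||` closure plays the role of `ref ||` and `move ||` plays the role of the by-value form. The concrete input values and the function names `implicit_borrow` and `explicit_borrow` are invented for the example.

```rust
// By-reference capture: the compiler infers `&in_vec` and `&mut out_vec`.
fn implicit_borrow(opt_int: Option<i32>) -> (Option<i32>, Vec<i32>) {
    let in_vec = vec![1, 2, 3];
    let mut out_vec = Vec::new();
    let result = opt_int.map(|v| {
        out_vec.push(v);
        in_vec.iter().fold(v, |a, &b| a + b)
    });
    (result, out_vec)
}

// By-value capture of references that are created by hand.
fn explicit_borrow(opt_int: Option<i32>) -> (Option<i32>, Vec<i32>) {
    let in_vec = vec![1, 2, 3];
    let mut out_vec = Vec::new();
    let result = opt_int.map({
        let in_vec = &in_vec;
        let out_vec = &mut out_vec;
        move |v| {
            out_vec.push(v);
            in_vec.iter().fold(v, |a, &b| a + b)
        }
    });
    (result, out_vec)
}
```

The two functions behave identically; the second only spells out the borrows that the first leaves to inference.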

In the body of a `ref` closure, the upvars continue to have the same
type as they did in the outer environment. For example, the type of a
reference to `in_vec` in the above example is always `Vec<int>`,
whether or not it appears as part of a `ref` closure. This is not only
convenient, it is required to make it possible to infer whether each
variable is borrowed as an `&T` or `&mut T` borrow.

Note that there are some cases where the compiler internally employs a
form of borrow that is not available in the core language,
`&uniq`. This borrow does not permit aliasing (like `&mut`) but does
not require mutability (like `&`). This is required to allow
transparent closing over of `&mut` pointers as
[described in this blog post][p].

**Evolutionary note:** It is possible to evolve by-reference
closures in the future in a backwards compatible way. The goal would
be to cause more programs to type-check by default. Two possible
extensions follow:

- Detect when values are *moved* and hence should be taken by value
rather than by reference. (This is only applicable to once
closures.)
- Detect when it is only necessary to borrow a sub-path. Imagine a
closure like `ref || use(&context.variable_map)`. Currently, this
closure will borrow `context`, even though it only *uses* the field
`variable_map`. As a result, it is sometimes necessary to rewrite
the closure to have the form `{let v = &context.variable_map; ||
use(v)}`. In the future, however, we could extend the inference so
that rather than borrowing `context` to create the closure, we would
borrow `context.variable_map` directly.
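The manual sub-path rewrite from the second bullet can be sketched as follows in present-day Rust. The `Context` struct, its `lookups` field, and the `lookup` function are hypothetical; only the `variable_map` field name comes from the text above.

```rust
use std::collections::HashMap;

struct Context {
    variable_map: HashMap<String, i32>,
    lookups: u32,
}

fn lookup(context: &mut Context, name: &str) -> Option<i32> {
    let find = {
        // Borrow only the field the closure needs, then capture that by value.
        let v = &context.variable_map;
        move |n: &str| v.get(n).copied()
    };
    // Still allowed: the closure holds `context.variable_map`, not `context`.
    context.lookups += 1;
    find(name)
}
```

If the closure borrowed all of `context` instead, the update to `context.lookups` would conflict with that borrow while `find` is still live.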

## Closure sugar in trait references

The current type for closures, `|T1, T2| -> R`, will be repurposed as
syntactic sugar for a reference to the appropriate `Fn` trait. This
shorthand can be used any place that a trait reference is appropriate. The
full type will be written as one of the following:

    <'a...'z> |T1...Tn|: K -> R
    <'a...'z> |&mut: T1...Tn|: K -> R
    <'a...'z> |&: T1...Tn|: K -> R
    <'a...'z> |: T1...Tn|: K -> R

Each of which would then be translated into the following trait
references, respectively:

    <'a...'z> Fn<(T1...Tn), R> + K
    <'a...'z> Fn<(T1...Tn), R> + K
    <'a...'z> FnShare<(T1...Tn), R> + K
    <'a...'z> FnOnce<(T1...Tn), R> + K

Note that the bound lifetimes `'a...'z` are not in scope for the bound
`K`.
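For orientation, the three receiver modes and the extra bound `K` can be seen in present-day Rust, which uses a parenthesized sugar (`FnMut(T) -> R`, `Fn(T) -> R`, `FnOnce(T) -> R`) rather than the `|T| -> R` form proposed here. This is a sketch with made-up function names, and `Send` is just one example of a bound in the `K` position.

```rust
// `&mut self` receiver: the closure may mutate its environment on each call.
fn call_mut(mut f: impl FnMut(i32) -> i32) -> i32 {
    f(1) + f(1)
}

// `&self` receiver: callable through a shared reference.
fn call_shared(f: &dyn Fn(i32, i32) -> i32) -> i32 {
    f(2, 3)
}

// `self` receiver plus an extra bound: callable once, and sendable across threads.
fn call_once(f: Box<dyn FnOnce() -> String + Send>) -> String {
    f()
}
```

Each sugared bound stands for a trait reference whose first type argument is the tuple of parameter types, mirroring the translation listed above.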

# Drawbacks

This model is more complex than the existing model in some respects
(but the existing model does not serve the full set of desired use cases).

# Alternatives

There is one aspect of the design that is still under active
discussion:

**Introduce a more generic sugar.** It was proposed that we could
introduce `Trait(A, B) -> C` as syntactic sugar for `Trait<(A,B),C>`
rather than retaining the form `|A,B| -> C`. This is appealing but
removes the correspondence between the expression form and the
corresponding type. One (somewhat open) question is whether there will
be additional traits that mirror fn types that might benefit from this
more general sugar.

**Tweak trait names.** In conjunction with the above, there is some
concern that the type name `fn(A) -> B` for a bare function with no
environment is too similar to `Fn(A) -> B` for a closure. To remedy
that, we could change the name of the trait to something like
`Closure(A) -> B` (naturally the other traits would be renamed to
match).

Then there are a large number of permutations and options that were
largely rejected:

**Only offer by-value closures.** We tried this and found it
required a lot of painful rewrites of perfectly reasonable code.

**Make by-reference closures the default.** We felt this was
inconsistent with the language as a whole, which tends to make "by
value" the default (e.g., `x` vs `ref x` in patterns, `x` vs `&x` in
expressions, etc.).

**Use a capture clause syntax that borrows individual variables.** "By
value" closures combined with `let` statements already serve this
role. Simply specifying "by-reference closure" also gives us room to
continue improving inference in the future in a backwards compatible
way. Moreover, the syntactic space around closure expressions is
extremely constrained and we were unable to find a satisfactory
syntax, particularly when combined with self-type annotations.
Finally, if we decide we *do* want the ability to have "mostly
by-value" closures, we can easily extend the current syntax by writing
something like `(ref x, ref mut y) || ...` etc.

**Retain the proc expression form.** It was proposed that we could
retain the `proc` expression form to specify a by-value closure and
have `||` expressions be by-reference. Frankly, the main objection to
this is that nobody likes the `proc` keyword.

**Use variadic generics in place of tuple arguments.** While variadic
generics are an interesting addition in their own right, we'd prefer
not to introduce a dependency between closures and variadic
generics. Having all arguments be placed into a tuple is also a
simpler model overall. Moreover, native ABIs on platforms of interest
treat a structure passed by value identically to distinct
arguments. Finally, given that trait calls have the "Rust" ABI, which
is not specified, we can always tweak the rules if necessary (though
there are advantages for tooling when the Rust ABI closely matches the
native ABI).

**Use inference to determine the self type of a closure rather than an
annotation.** We retain this option for future expansion, but it is
not clear whether we can always infer the self type of a
closure. Moreover, using inference rather than a default raises the
question of what to do for a type like `|int| -> uint`, where
inference is not possible.

**Default to something other than `&mut self`.** It is our belief that
`&mut self` is the most common use case for closures.

# Transition plan

TBD. pcwalton is working furiously as we speak.

# Unresolved questions

## Closures that are quantified over lifetimes

A separate RFC is needed to describe bound lifetimes in trait
references. For example, today one can write a type like `<'a> |&'a A|
-> &'a B`, which indicates a closure that takes and returns a
reference with the same lifetime specified by the caller at each
call-site. Note that a trait reference like `Fn<(&'a A), &'a B>`,
while syntactically similar, does *not* have the same meaning because
it lacks the universal quantifier `<'a>`. Therefore, in the second
case, `'a` refers to some specific lifetime `'a`, rather than being a
lifetime parameter that is specified at each callsite. The high-level
summary of the change therefore is to permit trait references like
`<'a> Fn<(&'a A), &'a B>`; in this case, the value of `<'a>` will be
specified each time a method or other member of the trait is accessed.
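Present-day Rust writes this kind of universally quantified trait reference with a `for<'a>` prefix rather than the `<'a>` notation used in this RFC. The sketch below assumes that later syntax; the functions `apply_to_both` and `first_word` are hypothetical.

```rust
// `F` must work for *every* lifetime `'a`, chosen afresh at each call,
// rather than for one specific lifetime fixed by the caller.
fn apply_to_both<F>(f: F, a: &str, b: &str) -> usize
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    // `a` and `b` may have unrelated lifetimes; `f` is used at both.
    f(a).len() + f(b).len()
}

// Returns a slice borrowed from its argument.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```

Without the quantifier, `'a` would have to name a single lifetime that covers both `a` and `b`, which is the distinction the paragraph above draws.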

[p]: http://smallcultfollowing.com/babysteps/blog/2014/05/13/focusing-on-ownership/
