Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unboxed closures #77

Closed
wants to merge 1 commit into from
Closed

unboxed closures #77

wants to merge 1 commit into from

Conversation

thestinger
Copy link

No description provided.

@pcwalton
Copy link
Contributor

+1, I think we want something like this. I'm gathering some statistics now that will better inform our decision.

@sfackler
Copy link
Member

My gut says we'd really want some form of capture lists to handle the by value vs by ref issue. I'm not sure how often it'll actually be necessary to manually specify a by-ref capture though, so maybe we can revisit later if it becomes a problem.

@thestinger
Copy link
Author

I struggled to come up with an unambiguous way of defining capture lists, and I think any approach would be very ugly with the current closure syntax. Since references are just regular values, you can capture by-reference without using capture sugar at all, so I don't think a capture list would make things significantly clearer. The closures proposed here are purely sugar for simple cases, so it should be as painless as possible to write them. An explicit struct implementing the closure "trait" can be used in place of the sugar if it makes the code clearer (many captures and/or lots of code).

It's not perfect, but ref and ref mut lack ambiguity and I think it would be a sane reuse of the syntax used in patterns. It only has to feel like an orthogonal feature to make learning the language easier, even if the implementation for patterns and closure captures is entirely unrelated. This is the philosophy used in many places for the Python syntax, and it makes the language easier to learn and remember despite not being nearly as simple as it feels.

In C++, capture lists are the defining characteristic of the closure syntax, because the simplest closure ([] {}) is an empty capture list ([]) and empty body. C++ can't capture references by-value, so it's a more accurate list of anything captured by-reference. C++ also has sugar for capturing everything by-value ([=]) or everything by-reference ([&]).

Capture lists may be intended to make capturing many values more clear, but I find that I always stop listing out the captures by hand where there are more than 2 because it's less painful to just capture anything that's referenced. I think there's a fine line between a helpful feature (dare I say mut) and one simply causing aggravation with little return. In C++ a by-reference capture is also dangerous but Rust prevents it from becoming dangling.

currently require a concrete type signature so this proposal alone is not enough to return them. The
restriction on functions could be relaxed to permit the return type to be an anonymous type
implementing a trait, or in other words an "unboxed trait object". This is future work for another
proposal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, couldn't this be done without unboxed trait objects? A function that returns a closure isn't referring to an unboxed trait object but instead referring to a specific anonymous struct. The function's implementation would then use this anonymous struct as the type for whatever closure is actually returned. The two big downsides I could see are:

  1. You can't conditionally pick from several closures to return, and doing so would probably cause a very confusing error (because only one can actually be the correct type).
  2. I'm not sure how this would work if the function is generic. Does the anonymous struct then become generic too, using the same type parameters as the function? What if the function isn't generic, but it's part of a generic impl? This seems potentially problematic.

I'm not suggesting this is the way to go. Unboxed trait objects seem like a better solution. I just wanted to mention it as a potential alternative.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just using unboxed trait objects as jargon to refer to an anonymous type implementing some trait. Implementing it with sane error messages seems like it would go a long way to implementing this as a general purpose feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering my suggestion to actually be the inverse of rust-lang/rust#11455. In that proposal, a function returning an unboxed Iterator would actually return the concrete type of whatever iterator the implementation returns. I was thinking of this as the opposite, the return type would define the anonymous type and the closure in the implementation would be forced to be that same type.

But it turns out that's rather unnecessary (and overly complicated, because the function doesn't know what upvars there will be so it can't actually fully define the type). Because treating it the exact same way that the Iterator case is works just as well. So forget I suggested anything.

@Valloric
Copy link

+1 on the general idea; the by-ref capture syntax is awkward, but no better approach is readily apparent.

@pcwalton
Copy link
Contributor

Here are some statistics on closures (click through the charts at the bottom to see charts): https://docs.google.com/spreadsheets/d/1AMNPHhUxa7HKnoPAUD0Lj_fHxsScEa9rfWgK3JLYSjM/edit?usp=sharing

@pcwalton
Copy link
Contributor

Out of 6,145 closures, 4,654 (75.7%) contain no or only immutable, copyable borrows, and would therefore have identical semantics if upvars were moved by default (excluding Cell, but I suspect that's very few if any).

Most of the rest (19.2% of the total, 79.0% of the rest) contain at least one unique-immutable borrow. I have not yet analyzed for how many of them those uniquely-immutably-borrowed variables are used after the construction of the closure.

@lilyball
Copy link
Contributor

Overall I'm very strongly in favor of this proposal.

I was initially very against the (ref x) sugar when it was proposed in IRC. Now that it's been a few hours, I'm less opposed to it. I still think we should consider capture lists, but we can move ahead on this without the sugar (or with the proposed sugar but feature-gated) while the actual syntax is bikeshedded. And if we end up with (ref x) I suspect I'll come around to thinking it's reasonable.

One question I have about this is the traits for call. It appears you're defining sugar for a collection of traits, one for each possible function signature, which all define the call() method. That seems like a clever way to deal with this, but I have two questions:

  1. Can I use this same sugar on my own types? It seems that if we're defining an un-sugared version of the closure, then I should be able to write the unsugared version directly. This opens the door to user-defined types that are callable. And I'm ok with that, I just want to know if that's the intent.
  2. What happens if I implement two of these sugared traits on the same type? Does that throw an ambiguity error when trying to call the type, or does it actually allow for overloading? I hope it's the ambiguity (as we have no precedent for overloading in Rust), but it would be nice to have that defined.

@thestinger
Copy link
Author

Can I use this same sugar on my own types? It seems that if we're defining an un-sugared version of the closure, then I should be able to write the unsugared version directly. This opens the door to user-defined types that are callable. And I'm ok with that, I just want to know if that's the intent.

Yes, I want it to be possible for user-defined types to implement the trait(s) via this sugar.

What happens if I implement two of these sugared traits on the same type? Does that throw an ambiguity error when trying to call the type, or does it actually allow for overloading? I hope it's the ambiguity (as we have no precedent for overloading in Rust), but it would be nice to have that defined.

If it has the same number of an arguments, it will report an error for a duplicate implementation of the trait. If it has a different number of arguments, it will report an error when you attempt to call it. This is already implemented as an error message when two traits are in scope implementing a method with the same name, and using a different number of arguments can be considered an implementation of a separate trait.

If variadic generics ever exist, it could be mapped to a single trait rather than multiple traits but it's not necessary and the sugar would still be how it's used, so it would only influence the error messages.

@lilyball
Copy link
Contributor

If it has the same number of an arguments, it will report an error for a duplicate implementation of the trait. If it has a different number of arguments, it will report an error when you attempt to call it.

What about the same number of arguments, but different types?

In any case, it sounds like you're saying "it would be ambiguous", which is what I wanted to hear.

@thestinger
Copy link
Author

@kballard: If it's the same number of arguments with different types, the compiler would reject it as a duplicate implementation of the trait via the coherence rules.

@huonw
Copy link
Member

huonw commented May 16, 2014

@pcwalton what is a "unique-immutable" capture?

@thestinger
Copy link
Author

I took that to mean a capture of an affine (non-Copy) type.

@huonw
Copy link
Member

huonw commented May 16, 2014

Wouldn't that be covered by "immutable, noncopyable" and "mutable, noncopyable"?

@thestinger
Copy link
Author

Ah, then I don't know what it is either.

@bstrie
Copy link
Contributor

bstrie commented May 16, 2014

I love love love that this proposal avoids forcing us to re-introduce capture clauses. It's important to me that lambda syntax be very lightweight for the common case, a property that mandatory capture clauses would obliterate.

@pcwalton, what code did you analyze to produce those numbers, and what tool did you use to generate them?

@huonw
Copy link
Member

huonw commented May 16, 2014

Random thought: we can possibly support C++14 generic lambdas relatively naturally. We currently have the <'a> |&'a int| -> &'a int syntax for specifying lifetimes used in a closure's arguments... this could be generalised to a type like <'a, T: TotalOrd> |&'a T, &'a T| -> &'a T (for example) for generically instantiable lambdas.

I'm mainly writing this here so that it's recorded somewhere, and because it seems reasonable to avoid closing off this path, if possible. (If there is interest in discussing this outside of what's relevant to this RFC I'll make a post on /r/rust or the mailing list, to avoid this RFC being sidetracked.)

@pcwalton
Copy link
Contributor

@thestinger @huonw Unique-immutable means that &mut was captured when &mut was not in a mutable location, so the borrow checker was forced to consider it an &uniq capture. See Niko's blog post for more details.

I love love love that this proposal avoids forcing us to re-introduce capture clauses. It's important to me that lambda syntax be very lightweight for the common case, a property that mandatory capture clauses would obliterate.

@bstrie I'm a bit concerned about is having to write let mut foo = &mut bar; before as many as 19% of closures, because of unique-immutable captures. We may need capture clauses for this, because not having them will lead to verbosity for over a thousand. Note that I am not sure for what fraction of those closures moving the &mut in would be sufficient.

@thestinger
Copy link
Author

@pcwalton: I don't understand when moving or re-borrowing wouldn't be enough when it's being captured by-value. I think using ref and ref mut for captures is better sugar, relative to using an ugly capture list. If there are so many captures and so much code that it becomes unclear what's going on, it's probably a bad use case for a closure.

@pcwalton
Copy link
Contributor

Yes, Niko told me about the reborrowing option for &mut. I would like to
do more analysis on when reborrowing would be sufficient for the closures
we have.

On Friday, May 16, 2014, Daniel Micay [email protected] wrote:

@pcwalton: I don't understand when moving or re-borrowing wouldn't be
enough when it's being captured by-value. I think using ref and ref mut for
captures is better sugar, relative to using an ugly capture list. If there
are so many captures and so much code that it becomes unclear what's going
on, it's probably a bad use case for a closure.


Reply to this email directly or view it on GitHub.<
https://ci3.googleusercontent.com/proxy/ASED9izqeqb8X8BikCLeDVqYH7cC2euCObF8NI3vnOOvIVawmAaDDx4OAGVSlZlFyDS1Q3STE8U3TsQDAyHKfx2V2oVD1uY12sRmTbOvFa-9oQ6KZdPKP-oonrsXaf9vy-7LJU4pohM2f4LHIEabojI3vUwjeswk7NdyJH8stb1MVNfoAluVqjp2wnKeA9Om7vajQVuVW1b4f-Q3QOxv0XikeDGIjuSwQ3tUYM_zcH7bsOp_FdpG6PrpJE4Z2Lt3tBnpjzHqnyi7sCD3zYyeSO-LEikE=s0-d-e1-ft#https://github.com/notifications/beacon/157897__eyJzY29wZSI6Ik5ld3NpZXM6QmVhY29uIiwiZXhwaXJlcyI6MTcxNTg3OTc1MiwiZGF0YSI6eyJpZCI6MzIzNzIxNTJ9fQ==--abf21fe1d5c2cd4e52e8ae9bf7905e15f1685459.gif

@bstrie
Copy link
Contributor

bstrie commented May 16, 2014

Can someone elaborate on reborrowing in this context?

@thestinger
Copy link
Author

&mut *x rather than moving it by-value, the sugar for by-mutable-reference captures can do this (or it can be done by hand)

@lilyball
Copy link
Contributor

@pcwalton My understanding is that a &uniq capture of a &mut reference is equivalent to re-borrowing the &mut reference (in terms of observable effects; obviously the closure is implemented a bit differently and can skip dereferencing the wrapped &uniq). Which is to say, I believe that all the unique-immutable captures can be handled with the proposed (ref mut x) syntax.

@lilyball
Copy link
Contributor

@thestinger

If it's the same number of arguments with different types, the compiler would reject it as a duplicate implementation of the trait via the coherence rules.

Ok, so it sounds like this sugar is mapping to an infinite series of

trait<R> Callable0<R> { fn call(&mut self) -> R; }
trait<R,A1> Callable1<R,A1> { fn call(&mut self, A1) -> R; }
trait<R,A1,A2> Callable2<R,A1,A2> { fn call(&mut self, A1, A2) -> R; }
// etc

which makes sense. I was interpreting each |A1, A2| -> R sugar as being a different trait instead of the same one using generics.

@bstrie
Copy link
Contributor

bstrie commented May 16, 2014

Looking at the categories of captures in pcwalton's spreadsheet, can someone confirm how each would map to the capture syntax proposed here? My best intuition:

Immutable, copyable            x  // value is copied into the struct
Immutable, noncopyable         x  // value is moved into the struct
Mutable, copyable              ???
Mutable, noncopyable           ???
Unique-immutable, noncopyable  (ref mut x)

Also, are the parens in the latter case mandatory for disambiguation?

@lilyball
Copy link
Contributor

Immutable, copyable            x  // value is copied into the struct

Correct. Or if the value is sufficiently large, you can use (ref x) to capture it by-reference.

Immutable, noncopyable         x  // value is moved into the struct

If moving is acceptable, then that's correct. Otherwise, you can use (ref x) to capture it by-reference.

Mutable, copyable              ???

x will copy it into the struct. (ref x) will capture it by-reference, if you want to elide the copy and don't need mutability. (ref mut x) will capture it by-mut-reference.

Mutable, noncopyable           ???

x will move it into the struct. (ref x) will capture it by-reference, if you don't want to move it and don't need mutability. (ref mut x) will capture it by-mut-reference.

Unique-immutable, noncopyable  (ref mut x)

Yeah. Or you can just use (ref x) if you don't actually need to mutate it in the closure.

Also, are the parens in the latter case mandatory for disambiguation?

I believe it's unambiguous without them, but I also think it's better code style to use them. So I would imagine they wouldn't actually be required by the parser, but we'd still use them in code examples.

@lilyball
Copy link
Contributor

For reference, if you want to reproduce the exact same behavior we have today, then you'd use either (ref x) or (ref mut x) for all captures (depending on whether you need mutability, including in the Unique-immutable case while capturing &mut references by-reference). But by-value is a better choice in a lot of these cases.

@thestinger
Copy link
Author

The parentheses wouldn't always be required, I just used them because my example wasn't as trivial as foo(ref x).

@bstrie
Copy link
Contributor

bstrie commented May 16, 2014

@kballard, thanks. I was just trying to get an intuition for how many upvars would opt out of by-value capturing, though since it's not really a 1:1 mapping I suppose it doesn't really answer my question (but is informative regardless).

@bstrie
Copy link
Contributor

bstrie commented May 16, 2014

One more clarification regarding the syntax: presumably, after capturing by reference, you would not require the capture syntax on subsequent uses of the variable? i.e.

let mut v = Vec::new(); let c = || { (ref mut v).push(3); v.push(6); }

...instead of:

let mut v = Vec::new(); let c = || { (ref mut v).push(3); (ref mut v).push(6); }

@thestinger
Copy link
Author

@bstrie: You would need to repeat ref mut it if you're not simply assigning it to let and reusing it, although that could be changed in the RFC. At the moment that would cause an attempt to move from a borrowed variable. I think it's clearer that way.

@bstrie
Copy link
Contributor

bstrie commented May 16, 2014

I can live with repeating the capture type. More verbose, but less magical, so it's a wash. And let is a fine workaround if you find yourself repeating it too often.

@alexcrichton
Copy link
Member

cc rust-lang/rust#12831

@lilyball
Copy link
Contributor

I prefer repeating. It allows expressing both a by-value and a by-ref capture in the same closure, which is theoretically useful (well, by-mut-ref is more theoretically useful in this scenario).

let x = ref x; is a reasonable thing to put at the top of the closure if you're going to reference the variable a lot.

@nrc
Copy link
Member

nrc commented May 17, 2014

I broadly like the proposal and agree capture clauses are pretty ugly. However, I really dislike annotating actual variables with properties of the formals, even if the formals are implicit. This is a big change from anything currently in Rust or (afaik) any other mainstream programming language. It seems confusing as well as not scaling beyond the smallest closures.

There was a suggestion floating around (I think Niko's) that we make closures capture all arguments by value by default, we have some sugary syntax for a closure which captures all its arguments by reference (of course the defaults could be reversed if desired), and for the remaining case we use the by-value form and you have to explicitly create references in the caller or to use capture clauses. I'm not sure about the details of that, but it sits much better with me.

@thestinger
Copy link
Author

It seems confusing as well as not scaling beyond the smallest closures.

I think large closures venture outside of the use case for closure syntax. The use case for this syntax is defining small callable objects with state inline with the body of another function. If it's a large amount of code, then it makes more sense to define it outside of the function as a struct and explicit method implementation. It only requires a few extra lines.

we have some sugary syntax for a closure which captures all its arguments by reference (of course the defaults could be reversed if desired), and for the remaining case we use the by-value form and you have to explicitly create references in the caller or to use capture clauses.

This will lead to performing unwanted shallow copies (potentially large ones) or unwanted by-reference captures simply because it's the only clean way to do it. It will encourage writing inefficient code. The use of manual by-reference captures via let bindings before the closures introduces unnecessary new names outside of the closure, and the captures are still implicit without a capture list so it's strictly less clear.

@thestinger
Copy link
Author

You can also do let ref_x = ref x to reuse the capture inside the closure to avoid writing ref more than once. Unlike defining ref_x outside of the closure and capturing it, the scope of the variable is limited. Limiting the scope of variables reduces the state someone reader the code has to watch out for. It has a comparable readability benefit to making mut for local variables opt-in. The true non-sugared equivalent to the proposal for allowing this usage of ref is defining the by-reference captures before the closure and then wrapping it all in another layer of indentation.

@bstrie
Copy link
Contributor

bstrie commented May 17, 2014

I agree with @thestinger, sprawling closures are an anti-pattern and should be discouraged.

However, I really dislike annotating actual variables with properties of the formals, even if the formals are implicit. This is a big change from anything currently in Rust or (afaik) any other mainstream programming language.

My mental model of closures doesn't consider upvars to be formal parameters, and I'm willing to venture this is how most other people conceptualize it as well. Capture clauses themselves are already far removed from most programming languages (aside from C++, I can't think of any other language, niche or mainstream, that has syntax to allow you to determine the manner in which upvars are captured).

There was a suggestion floating around (I think Niko's)

I would like to read this proposal, but at first blush it seems too coarse-grained to force a choice between 1) every upvar by-val, 2) every upvar by-ref, 3) a mixture by making manual bindings outside the function. pcwalton could weigh in here by giving us stats on what percentage of closures have at at least one by-val and one by-ref upvar.

@pcwalton
Copy link
Contributor

I feel like we may not need any syntax for upvar captures. Fewer than 10% of closures in both Rust and Servo use anything other than the default, assuming we reborrow &mut.

@thestinger was concerned about people accidentally capturing large structs by value, but I bet it's rare. I'll do some analysis to see how big the set of upvars is.

@alexcrichton
Copy link
Member

Would the desugaring described allow for movement of owned captured variables into the closure? For example, today this compiles with an error:

let a = box 1;
(|| drop(a))()

With this proposal, the closure would capture a by value, meaning it moves into the closure. With the desugaring mentioned it sounds like the compiler would disallow this because you're moving out of an &mut pointer (the &mut self pointer). This may allow what the old ~fn did which was capturing by value, but only allowing usage by reference. Is that intended? I found those semantics a little confusing, but they do seem occasionally useful perhaps.

@lilyball
Copy link
Contributor

@alexcrichton I believe you are correct. You can move the box 1 into the closure, but you can't drop() it because any reference to the upvar is implicitly accessed through a &mut environment.

It doesn't make sense to be able to drop() it anyway. Closures are not run-once. If you could drop it, the next execution of the closure would then be accessing a dropped value. You could, however, wrap it in Option and use .take().

@huonw
Copy link
Member

huonw commented May 17, 2014

Closures are not run-once

Unboxed closures can be run-once, if the "callable" trait they implement takes self (or Box<self>).

(The FnOnce in this table.)

regular `call` method can not safely move out of the captured environment. A separate trait would be
required, perhaps using the reserved `once` keyword as a prefix to the closure type sugar. It could
also be done via a future implementation of variadic generics without the sugar, but it would be
significantly uglier and would still be a magical lang item.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really different enough that it deserves to be deferred for a future RFC? Because if you're just looking to avoid a syntax bikeshed...

spawn(proc(x) { ... });  // old
spawn(once |x| { ... });  // new

...then IMO this looks like it would be an improvement. Still not quite as pretty as the old do spawn, but I was never comfortable with how non-closurelike the proc syntax looks. :P

@lilyball
Copy link
Contributor

@huonw Ok sure, but the run-once variant uses a by-value self, so you're not accessing the field through &mut. So it all just works.

@huonw
Copy link
Member

huonw commented May 17, 2014

@kballard ... that's exactly my point. Generalised closures aren't forced to have &uniq/&mut environment access, so it all just works. :)

@pcwalton
Copy link
Contributor

I did some analysis on the size of upvars being captured. Results are here:

https://docs.google.com/spreadsheets/d/1AMNPHhUxa7HKnoPAUD0Lj_fHxsScEa9rfWgK3JLYSjM/edit?usp=sharing

The tl;dr is that, on a 64-bit machine in the Rust compiler and standard libraries, 95.5% of upvars are 16 bytes or fewer, 97.2% of upvars are 32 bytes or fewer, and 98.2% of upvars are 64 bytes or fewer.

@huonw
Copy link
Member

huonw commented May 18, 2014

Did you calculate those percentages correctly? That data gives me even better results:

  • <= 16 bytes: 96.6%
  • <= 32 bytes: 98.3%
  • <= 64 bytes: 99.5%

@thestinger
Copy link
Author

@alexcrichton: You can move into the closure, but it's not possible to move out via call. In the future we could introduce a once version of the same closure trait(s) to replace proc but I'm unsure about the details.

@Kimundi
Copy link
Member

Kimundi commented May 18, 2014

I like this proposal, for all the reasons @thestinger has mentioned.

In Rust, working by-value is simply the most composable option, and with the optional ref and ref mut sugar, the pain point of capturing references by-value manually is being reduced to a annotation that is as simple as it gets without removing the choice from the programmer, and that is easy to explain by desugaring it.

@thestinger
Copy link
Author

This is now updated to treat &mut T as a special case by reborrowing instead of moving, along with dropping the ref and ref mut capture markers. @pcwalton gathered statistics showing that by-value captures will be by far the most common. Capturing by-reference will only involve extra work if there's not already a reference in-scope to capture by-value anyway.

@gereeter
Copy link

One alternative to the proposed by-reference capture system (the size of rust-lang/rust#14501 implies that this should not be thrown out) would be replacing ref with & and replacing ref mut with &mut. While this may be "too magical," it seems like it would be better in a number of ways:

  • A number of existing closures that want by-reference semantics already use this form, meaning that less code needs to be changed.
  • It should be immediately obvious what the code is doing, as the & operator doesn't change meaning. In contrast, this RFC adds a new meaning to the word ref.
  • It is much more lightweight than ref.
  • It intuitively makes sense that the way to ask for a reference to a variable in a close (even by capturing) is by the normal borrowing operator.

Downsides:

  • It is fairly "magical."
  • It introduces some idioms that in other contexts are nonsensical, like *&x or (&x).foo.

@Kimundi
Copy link
Member

Kimundi commented May 31, 2014

@gereeter: That would make the reference operators have different behavior depending on the context where they are written, which apart from being confusing would also prevent the user from being able to choose whether they want to capture by reference, or to capture by value and take a reference to that.

The advantage of the ref and ref mut form is that it doesn't conflate those things.

@alexcrichton
Copy link
Member

Closing in favor of the most recent unboxed closures RFC #114

withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
Drastically simplify with ideas from tokio
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.