From 7d06af4e66c90fe1676191b08ee51439de68dedf Mon Sep 17 00:00:00 2001 From: Jake Goulding Date: Thu, 12 Oct 2023 09:15:30 -0400 Subject: [PATCH] feedback (#2) * minor typos * intro * motivation * guide * reference * implementation * alternates * rationale * future * whitespace --- text/0000-gen-fn.md | 171 +++++++++++++++++++++++--------------------- 1 file changed, 91 insertions(+), 80 deletions(-) diff --git a/text/0000-gen-fn.md b/text/0000-gen-fn.md index 01d8c7ef430..3478a2990b5 100644 --- a/text/0000-gen-fn.md +++ b/text/0000-gen-fn.md @@ -6,11 +6,12 @@ # Summary [summary]: #summary -Add `gen {}` blocks to the language. These blocks implement `Iterator` and -enable writing iterators in regular code by `yield`ing elements instead of having -to implement `Iterator` for a custom struct and manually writing an `Iterator::next` -method body. This is a change similar to adding `async {}` blocks that implement -`Future` instead of having to manually write futures and their state machines. +Add `gen {}` blocks to the language. These implement `Iterator` by `yield`ing +elements. This is simpler and more intuitive than creating a custom type and +manually implementing `Iterator` for that type, which requires writing an +explicit `Iterator::next` method body. This is a change similar to adding `async +{}` blocks that implement `Future` instead of having to manually write futures +and their state machines. Furthermore, add `gen fn` to the language. `gen fn foo(arg: X) -> Y` desugars to `fn foo(arg: X) -> impl Iterator<Item = Y>`. @@ -19,19 +20,19 @@ Furthermore, add `gen fn` to the language. `gen fn foo(arg: X) -> Y` desugars to [motivation]: #motivation The main motivation of this RFC is to reserve a new keyword in the 2024 edition. -The feature used by this keyword described here should be treated as an e-RFC for -experimentation on nightly with this new keyword. 
I would like to avoid too much -discussion of the semantics provided here, and instead discuss the semantics during -the experimental implementation work. +The feature used by the keyword described here should be treated as an e-RFC for +experimentation on nightly. I would like to avoid discussion of the semantics +provided here, deferring that discussion to the experimental +implementation work. Writing iterators manually can be very painful. Many iterators can be written by chaining `Iterator` methods, but some need to be written as a `struct` and have -`Iterator` implemented for them. Some of the code that is written this way pushes -people to instead not use iterators, but just run a `for` loop and write to mutable -state. With this RFC, you could write the `for` loop, without mutable state, and get -an iterator out of it again. +`Iterator` implemented for them. Some of the code that is written this way +pushes people to avoid iterators and instead execute a `for` loop that eagerly +writes values to mutable state. With this RFC, one can write the `for` loop +and still get a lazy iterator of values. 
-As an example, here are three ways to write an iterator over something that contains integers, +As an example, here are multiple ways to write an iterator over something that contains integers, only keep the odd integers, and multiply all of them by 2: ```rust @@ -39,6 +40,7 @@ only keep the odd integers, and multiply all of them by 2: fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> { values.filter(|value| value.is_odd()).map(|value| value * 2) } + // `struct` and manual `impl` fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> { struct Foo<T>(T); @@ -55,6 +57,7 @@ fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> { } Foo(values) } + // `gen block` fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> { gen { @@ -77,7 +80,7 @@ gen fn odd_dup(values: impl Iterator<Item = u32>) -> u32 { ``` Iterators created with `gen` return `None` once they `return` (implicitly at the end of the scope or explicitly with `return`). -See [#unresolved-questions] for whether `gen` iterators are fused or may behave strangely after having returned `None` once. +See [the unresolved questions][unresolved-questions] for whether `gen` iterators are fused or may behave strangely after having returned `None` once. Under no circumstances will it be undefined behavior if `next` is invoked again after having gotten a `None`. # Guide-level explanation @@ -85,15 +88,15 @@ Under no circumstances will it be undefined behavior if `next` is invoked again ## New keyword -Starting in the 2024 edition, `gen` is a keyword that cannot be used for naming any items or bindings. This means during the migration to the 2024 edition, all variables, functions, modules, types, ... named `gen` must be renamed. +Starting in the 2024 edition, `gen` is a keyword that cannot be used for naming any items or bindings. This means during the migration to the 2024 edition, all variables, functions, modules, types, ... named `gen` must be renamed. 
## Returning/finishing an iterator -`gen` blocks' trailing expression must be of unit type or the block must diverge before reaching its end. +A `gen` block's trailing expression must be of the unit type or the block must diverge before reaching its end. ### Diverging iterators -For example, an `gen` block that produces the sequence `0, 1, 0, 1, 0, 1, ...`, will never return `None` +For example, a `gen` block that produces the infinite sequence `0, 1, 0, 1, 0, 1, ...` will never return `None` from `next`, and only drop its captured data when the iterator is dropped. ```rust @@ -105,34 +108,36 @@ gen { } ``` -If an `gen` panics, the behavior is very similar to `return`, except that `next` doesn't return `None`, but unwinds. +If a `gen` block panics, the behavior is very similar to `return`, except that `next` unwinds instead of returning `None`. ## Error handling Within `gen` blocks, the `?` operator desugars differently from how it desugars outside of `gen` blocks. Instead of returning the `Err` variant, `foo?` yields the `Err` variant and then `return`s immediately afterwards. -This has the effect of it being an iterator with `Iterator::Item`'s type being `Result`, and once a `Some(Err(e))` -is produced via `?`, the iterator returns `None` next. +This creates an iterator with `Iterator::Item`'s type being `Result`. +Once a `Some(Err(e))` is produced via `?`, the iterator returns `None` on the subsequent call to `Iterator::next`. -`gen` blocks do not need to have a trailing `Ok(x)` expression, because returning from an `gen` block will make the `Iterator` return `None` from now, which needs no value. Instead all `yield` operations must be given a `Result`. +`gen` blocks do not need to have a trailing `Ok(x)` expression. +Returning from a `gen` block will make the `Iterator` return `None`, which needs no value. +Instead, all `yield` operations must be given a `Result`. 
-Similarly the `?` operator on `Option`s will `yield None` if it is `None`, and require passing an `Option` to all `yield` operations. +The `?` operator on `Option`s will `yield None` if it is `None`, and require passing an `Option` to all `yield` operations. ## Fusing -Just like `Generators`, Iterators produced by `gen` panic when invoked again after they have returned `None` once. -This can probably be fixed by special casing the generator impl if `Generator::Return = ()`, as we can trivally -produce infinite values of `()` type. +Like `Generator`s, `Iterator`s produced by `gen` panic when invoked again after they have returned `None` once. +This can probably be fixed by special casing the generator impl if `Generator::Return = ()`, as we can trivially +produce infinite values of the unit type. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation ## New keyword -In the 2024 edition we reserve `gen` as a keyword. Previous editions need to use `k#gen` to get the same features. +In the 2024 edition we reserve `gen` as a keyword. Previous editions will use `k#gen` to get the same features. ## Error handling -`foo?` in `gen` blocks desugars to +`foo?` in `gen` blocks will stop iteration after the first error by desugaring to ```rust match foo.branch() { @@ -144,32 +149,33 @@ match foo.branch() { } ``` -which will stop iteration after the first error. This is the same behaviour that `collect::<Result<_, _>>()` performs -on any iterator over `Result`s +This is the same behaviour that `collect::<Result<_, _>>()` performs +on iterators over `Result`s. ## Implementation -This feature is mostly implemented via existing generators, we'll just need some desugarings and then lots of work to get good diagnostics. +This feature is mostly implemented via existing generators. +We'll need additional desugarings and lots of work to get good diagnostics. 
-### Gen fn +### `gen fn` -`gen fn` desugars to the function itself, with its return type replaced by `impl Iterator` and its body wrapped in a `gen` block. -So a `gen fn`'s "return type" is in fact its iterator's `yield` type. +`gen fn` desugars to the function itself with the return type replaced by `impl Iterator` and its body wrapped in a `gen` block. +A `gen fn`'s "return type" is its iterator's `yield` type. A `gen fn` captures all lifetimes and generic parameters into the `impl Iterator` return type (just like `async fn`). -If you want more control over your captures, you'll need to use type alias impl trait when that becomes stable. +If more control over captures is needed, type alias impl trait can be used when it is stabilized. -Just like all other uses of `impl Trait`, auto traits are revealed without being specified. +Like other uses of `impl Trait`, auto traits are revealed without being specified. -### Gen blocks +### `gen` blocks -`gen` blocks are effectively the same as an unstable generator +`gen` blocks are the same as an unstable generator * without arguments, * with an additional check forbidding holding borrows across `yield` points, * and an automatic `Iterator` implementation. -We'll probably be able to modularize the generator impl and make it more robust (on the impl and diagnostics side) for the `gen` block case, but I believe the initial implementation should just be a HIR lowering to a generator and wrapping that generator in `std::iterator::from_generator`. +We'll probably be able to modularize the generator implementation and make it more robust on the implementation and diagnostics side for the `gen` block case, but I believe the initial implementation should be a HIR lowering to a generator and wrapping that generator in [`from_generator`][]. 
# Drawbacks [drawbacks]: #drawbacks @@ -177,19 +183,21 @@ We'll probably be able to modularize the generator impl and make it more robust It's another language feature for something that can already be written entirely in user code. In contrast to `Generator`, `gen` blocks that produce `Iterator`s cannot hold references across `yield` points. -See also https://doc.rust-lang.org/std/iter/fn.from_generator.html, which has an `Unpin` bound on the generator it takes -to produce an `Iterator`. +See [`from_generator`][] which has an `Unpin` bound on the generator it takes to produce an `Iterator`. + +[`from_generator`]: https://doc.rust-lang.org/std/iter/fn.from_generator.html # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives ## Keyword -We could also use `iter` as a keyword. I would prefer `iter` in because I connect generators with a more powerful -scheme than just plain `Iterator`s. The `Generator` trait can do everything that `iter` blocks and `async` blocks can do, and more. I believe connecting the `Iterator` -trait with `iter` blocks is the right choice, but that would require us to carve out many exceptions for this keyword, -as `iter` is used for module names and method names everywhere (including libstd/libcore). It may not be much worse than `gen` (see also [#unresolved-questions]). - -One argument for `iter` is also that we may want to use `gen` for full on generators in the future. +We could use `iter` as the keyword. +I prefer `iter` because I connect generators with a more powerful scheme than plain `Iterator`s. +The `Generator` trait can do everything that `iter` blocks and `async` blocks can do and more. +I believe connecting the `Iterator` trait with `iter` blocks is the right choice, +but that would require us to carve out many exceptions for this keyword as `iter` is used for module names and method names everywhere (including libstd/libcore). 
+It may not be much worse than `gen` (see also [the unresolved questions][unresolved-questions]). +One argument for `iter` is that we may want to use `gen` for full-on generators in the future. ## 2021 edition @@ -198,19 +206,22 @@ We can allow `gen fn` on all editions. ## Do not do this -The alternative is to keep adding more helper methods to `Iterator`. It is already rather hard for new Rustaceans to get a hold of all the options they have on `Iterator`. -Some such methods would also need to be very generic (not an `Iterator` example, but https://doc.rust-lang.org/std/primitive.array.html#method.try_map on arrays is something -that has very complex diagnostics that are hard to improve, even if it's nice once it works). +One alternative is to keep adding more helper methods to `Iterator`. +It is already hard for new Rustaceans to be aware of all the capabilities of `Iterator`. +Some of these new methods would need to be very generic. +While it's not an `Iterator` example, [`array::try_map`][] is something that has very complex diagnostics that are hard to improve, even if it's nice once it works. -Users can use crates like [`genawaiter`](https://crates.io/crates/genawaiter) instead, which work on stable and give you `gen!` blocks that behave pretty mostly -like `gen` blocks, but don't have compiler support for nice diagnostics or language support for the `?` operator. +Users can use crates like [`genawaiter`](https://crates.io/crates/genawaiter) instead. +This crate works on stable and provides `gen!` macro blocks that behave like `gen` blocks, but don't have compiler support for nice diagnostics or language support for the `?` operator. + +[`array::try_map`]: https://doc.rust-lang.org/std/primitive.array.html#method.try_map ## `return` statements `yield` one last element -Similarly to `try` blocks, trailing expresisons could yield their element. +Similarly to `try` blocks, trailing expressions could yield their element. 
-But then have no way to terminate iteration, as `return` statements would similarly have to have a -value that needs to get `yield`ed before terminating iteration. +There would then be no way to terminate iteration as `return` statements would have to have a +value that is `yield`ed before terminating iteration. We could do something magical where returning `()` terminates the iteration, so @@ -238,15 +249,13 @@ gen fn foo() {} is supposed to be, as it could be either `std::iter::once(())` or `std::iter::empty::<()>()` - # Prior art [prior-art]: #prior-art ## Python -Python has `gen fn`: any function that uses `yield` internally. -These work pretty much like the `gen` functions proposed in this PR. The main difference is that raising an -exception automatically passes the exception outwards, instead of yielding an `Err()` element. +Python has equivalent functionality to `gen fn`: any function that uses `yield` internally. +The main difference is that raising an exception automatically passes the exception outwards, instead of yielding an `Err()` element. ```python def odd_dup(values): @@ -260,8 +269,8 @@ def odd_dup(values): ## Keyword -Should we use `iter` as a keyword instead, as we're producing `Iterator`s. -We can also use `gen` like proposed in this RFC and later extend its abilities to more powerful generators. +Should we use `iter` as the keyword, as we're producing `Iterator`s? +We could use `gen` as proposed in this RFC and later extend its abilities to more powerful generators. [playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=efeacb803158c2ebd57d43b4e606c0b5) @@ -282,24 +291,26 @@ fn main() { ## Panicking -What happens when a `gen` block that panicked gets `next` called again? Do we need to poison the iterator? +What happens when `Iterator::next` is called again on a `gen` block that panicked? Do we need to poison the iterator? ## Fusing -Should we make `gen` blocks fused? 
Right now they'd panic (which is what the generator impl does): +Should we make `gen` blocks fused? Right now they'd panic (which is what the generator implementation does): ## Contextual keyword -Popular crates (like `rand`) have methods called `gen` (https://docs.rs/rand/latest/rand/trait.Rng.html#method.gen). If we forbid those, we are forcing those crates to make a major version bump when they update their edition, and we are requiring any users of those crates to use `r#gen` instead of `gen` when calling that method. +Popular crates (like `rand`) have methods called [`gen`][Rng::gen]. If we forbid those, we are forcing those crates to make a major version bump when they update their edition, and we are requiring any users of those crates to use `r#gen` instead of `gen` when calling that method. -We could instead choose to use a contextual keyword and only forbid +We could choose to use a contextual keyword and only forbid `gen` in * bindings, * field names (due to destructuring bindings), * enum variants, * and type names -to be `gen`. This should avoid any parsing issues around `gen` followed by `{` in expressions. +This should avoid any parsing issues around `gen` followed by `{` in expressions. + +[Rng::gen]: https://docs.rs/rand/latest/rand/trait.Rng.html#method.gen # Future possibilities [future-possibilities]: #future-possibilities @@ -308,7 +319,7 @@ to be `gen`. This should avoid any parsing issues around `gen` followed by `{` i Python has the ability to `yield from` an iterator. Effectively this is syntax sugar for looping over all elements of the iterator and yielding them individually. 
-There are infinite options to choose from if we want such a feature, so I'm just going to list the general ideas below: +There are infinite options to choose from if we want such a feature, so I'm listing general ideas: ### Do nothing, just use loops @@ -318,7 +329,7 @@ for x in iter { } ``` -### language support +### Language support we could do something like postfix `yield` or an entirely new keyword, or... @@ -326,27 +337,28 @@ we could do something like postfix `yield` or an entirely new keyword, or... iter.yield ``` -### stlib macro +### stdlib macro -We could add a macro to the standard library and prelude, the macro would just expand to the for loop + yield. +We could add a macro to the standard library and prelude. +The macro would expand to a `for` loop + `yield`. ```rust yield_all!(iter) ``` -## Full on `Generator` support +## Complete `Generator` support -We already have a `Generator` trait on nightly that is much more powerful than the `Iterator` +We already have a `Generator` trait on nightly that is more powerful than the `Iterator` API could possibly be. 1. it uses `Pin<&mut Self>`, allowing self-references in the generator across yield points 2. it has arguments (`yield` returns the arguments passed to it in the subsequent invocations) -Similar (but def not the same) to ideas around `async` closures, I think we could argue for `Generators` to be `gen` closures, -while `gen` blocks are the simpler concept that has no arguments and just captures variables. +Similar to the ideas around `async` closures, +I think we could argue for `Generators` to be `gen` closures while `gen` blocks are a simpler concept that has no arguments and only captures variables. -Either way, support for full `Generator`s should (in my opinion) be discussed and implemented separately, -as there are many more open questions around them than around just a simpler way to write `Iterator`s. 
+Either way, support for full `Generator`s should be discussed and implemented separately, +as there are many more open questions around them beyond a simpler way to write `Iterator`s. ## `async` interactions @@ -355,8 +367,7 @@ This is not possible in general due to the fact that `Iterator::next` takes `&mu it should be possible if no references are held across the `await` point, similar to how we disallow holding references across `yield` points in this RFC. - -## self-referential `gen` bloocks +## self-referential `gen` blocks There are a few options forward: @@ -369,5 +380,5 @@ There are a few options forward: ## `try` interactions -We could allow `try gen fn foo() -> i32` to actually mean something akin to `gen fn foo() -> Result`. -Whatever we do here, it should mirror whatever `try fn` will mean in the future. +We could allow `try gen fn foo() -> i32` to mean something akin to `gen fn foo() -> Result`. +Whatever we do here, it should mirror whatever `try fn` means in the future.