proposal: use channels as iterators #48567
Comments
I think that is essential, but it isn't simple. Do you have any thoughts on how we can make it much faster?
I'm not entirely sure, as I'm not very well versed in compiler optimizations (especially for synchronization). All I really know is that it is possible (unsure how feasible), because other compiled languages such as Kotlin and Rust have coroutine mechanisms which are able to implement the generator pattern in an efficient manner (see Kotlin Sequences and Rust Generators). In #43557 (comment), @carlmjohnson noticed that the iteration example runs significantly faster with GOMAXPROCS=1, and suggested trying to detect when goroutines would only ever yield to each other, and using a simpler scheduler in those cases.
Explicit coroutine mechanisms are significantly different from Go goroutines communicating over channels. I agree that if the compiler can reliably recognize the channel-as-iterator pattern when compiling the goroutine that sends the values on the channel, then we can compile that case differently and get significantly better performance. But that is a really big if. If we mistakenly think that an ordinary goroutine is a channel-generation goroutine, the program will deadlock. If we mistakenly think that a channel-generation goroutine is an ordinary goroutine, then performance will be much worse than expected. I don't think we can just wave our hands about this issue. One way to describe the difference between what Kotlin and Rust and Go are doing is that in Kotlin and Rust, we always know for sure that the values being generated are being consumed by the equivalent of Go's for/range loop. In Go we do not know that.
I think one of the reasons that I wanted to do this proposal is just because we have multiple iterator patterns in the standard library ( One way that we could help be more clear that a channel-generation goroutine is in fact a channel-generation goroutine could be to use a function like |
I thought for a moment about somehow fusing the goroutine to the channel at creation time, but then I realized that “a channel fused to a goroutine” is structurally and semantically different from an ordinary channel, and “a goroutine fused to a channel” is structurally and semantically different from an ordinary goroutine. So I think what we're describing is, really, a new type — perhaps a |
@deanveloper, one thing missing from this description is the interaction between If the iterator is not drained, do its deferred function calls ever execute? If so, when? If not, I think the fact that they do not run would be surprising. Similarly, would the “goroutine” begin running before the caller tries to receive the first value, or only on demand? If it does run before the caller tries to receive the first value, we have the same off-by-one problem as with the |
I think part of the issue with a separate stream type would be how similar it is to a channel semantically. Streams would support the same operations and (essentially) function the same as channels.
I had this typed out in the proposal, but I think I may have accidentally deleted it in a revision of it. Deferred functions should not execute, as garbage-collecting a forever-blocked goroutine should behave as if the goroutine is still blocked. This does have some implications, for instance, iterators would not be able to close file io operations, and the caller would have to open/close the file themselves. Note that this could be solved with the
My initial thought was that it would begin running before the caller tries to receive the first value. This would, however, result in the off-by-one issue. For most iterators this is not a big deal. But if the iterator has side effects, or if the data structure that the iterator is based on is modified, this may result in some strange behavior. Iterators (in general) shouldn't have side effects though, as those should be saved for when the iterator is consumed. But I guess we can't guarantee that.
I was thinking of an alternative proposal for this, where we introduce a new builtin, I'm not really sure what to name it (for now I guess we can call it (
iter.a<-struct{}{}
for v := range iter.b {
...
iter.a<-struct{}{}
}

Just like channels, iterators support the read-only (<-iterator T) and write-only (iterator<- T) type modifiers. An iterator would then look like:

func IntRangeIter(from, to int) <-iterator int {
iter := make(iterator int)
go func() {
defer close(iter)
for i := from; i < to; i++ {
iter <- i
}
}()
return iter
}
func main() {
for i := range IntRangeIter(0, 5) {
fmt.Printf("%d, ", i)
}
fmt.Println()
}
// Output: 0, 1, 2, 3, 4,

This would be a complex (and non-orthogonal) solution to a relatively minor problem, though. I don't really like it, and it defeats the main purpose of this proposal, which is to introduce iterators without language changes.
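The iterator type above does not exist in Go, so the snippet cannot be compiled. As a rough sketch of the handshake it describes, the same behavior can be emulated today with two unbuffered channels. The struct name iterator, the reading of a as a "request" channel and b as a "value" channel, and the final request being answered by closing b are all assumptions made for illustration:

```go
package main

import "fmt"

// iterator emulates the hypothetical built-in with two channels
// (assumption): a carries "send me the next value" requests from the
// consumer, b carries the values from the producer.
type iterator struct {
	a chan struct{}
	b chan int
}

// IntRangeIter produces from, from+1, ..., to-1, but only computes a
// value after the consumer has asked for it.
func IntRangeIter(from, to int) *iterator {
	iter := &iterator{a: make(chan struct{}), b: make(chan int)}
	go func() {
		defer close(iter.b)
		for i := from; i < to; i++ {
			<-iter.a // wait until the consumer requests a value
			iter.b <- i
		}
		<-iter.a // the final request is answered by closing b
	}()
	return iter
}

func main() {
	iter := IntRangeIter(0, 5)
	iter.a <- struct{}{}
	for v := range iter.b {
		fmt.Printf("%d, ", v)
		iter.a <- struct{}{}
	}
	fmt.Println()
}
```

Because the producer only advances after a request, nothing runs ahead of the consumer, which avoids the off-by-one behavior discussed earlier in the thread. An abandoned producer still blocks forever on the request channel.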
Note that slices, arrays, and maps also all support the same operations. The ways that a stream differs from a channel seem at least as important to me as the ways in which an array differs from a slice.
I don't think we are going to reconsider #19702 (comment). See that comment for details, which have not changed. If there are other optimizations that can be applied, that's always fine. But we're not going to GC blocked goroutines, and therefore we're not going to use channels as iterators in the standard library. Declining as infeasible.
This proposal has been declined as infeasible.
Establishing a standard for iteration
Related: #43557 and #47707
Problem
With the proposals for the slices and maps packages, as well as other proposals talking about generic data structures (i.e. container/list and container/set), it has become clear that we need some pattern to iterate over structures.
Common iteration patterns
There are two common iterator patterns from what I have seen.
The first is a pretty standard pattern, aptly named the "Iterator Pattern". This is a pattern such that you return an iterator which is repeatedly called over time, and the iterator will return the next value each time the function is called. This is the iteration pattern that #43557 has recommended for Go. One of the great benefits of this pattern is that it is standard among many languages. However, these iterators tend to be quite cumbersome because the writer of the iterator needs to manage state over time, which can be extremely difficult, especially when dealing with complex data structures like maps, or recursive data structures like trees.
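To make the state-management burden concrete, here is a minimal sketch of an in-order binary tree iterator written in this style. The names Node, TreeIter, NewTreeIter, and Next are hypothetical and are not taken from #43557; the point is only that the iterator must carry its own explicit stack between calls:

```go
package main

import "fmt"

// Node is a hypothetical binary tree node used only for this illustration.
type Node struct {
	Left, Right *Node
	Val         int
}

// TreeIter walks a tree in order. It keeps an explicit stack because it
// cannot rely on the call stack surviving between calls to Next.
type TreeIter struct {
	stack []*Node
}

// NewTreeIter positions the iterator at the leftmost node.
func NewTreeIter(root *Node) *TreeIter {
	it := &TreeIter{}
	it.pushLeft(root)
	return it
}

// pushLeft pushes n and all of its left descendants onto the stack.
func (it *TreeIter) pushLeft(n *Node) {
	for ; n != nil; n = n.Left {
		it.stack = append(it.stack, n)
	}
}

// Next returns the next value, or ok == false once the walk is finished.
func (it *TreeIter) Next() (val int, ok bool) {
	if len(it.stack) == 0 {
		return 0, false
	}
	n := it.stack[len(it.stack)-1]
	it.stack = it.stack[:len(it.stack)-1]
	it.pushLeft(n.Right)
	return n.Val, true
}

func main() {
	root := &Node{Left: &Node{Val: 1}, Val: 2, Right: &Node{Val: 3}}
	it := NewTreeIter(root)
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		fmt.Println(v)
	}
}
```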
The second pattern is a bit more modern and is typically named the "Generator Pattern". In this pattern, the iterator is a single function which is passed a "yield" function (or in many languages, a yield keyword which may only be used in Generator Functions). Each time yield is called, control goes to the caller of the iterator, and then control is given back to the iterator again, until yield is called again, or until the iterator ends. The amazing thing about the generator pattern is that the iterator is extremely easy to write, and it ends up looking almost like a solution where one would simply append to a slice (and return the slice in the end). This is the pattern which #47707 recommends.
Using channels as iterators
In Go, we actually already have a form of iterating using the generator pattern: goroutines and channels. This can be done by making a function which returns a read-only channel. This function creates a goroutine, and sends values on the channel. It would look something like this, which is valid Go code today:
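A minimal sketch of such a function, modeled on the IntRangeIter example that appears in the comments of this thread. The name IntRange is chosen here for illustration:

```go
package main

import "fmt"

// IntRange returns a receive-only channel that yields
// from, from+1, ..., to-1 and is then closed.
func IntRange(from, to int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch) // lets the consumer's range loop terminate
		for i := from; i < to; i++ {
			ch <- i // blocks until the consumer receives
		}
	}()
	return ch
}

func main() {
	for i := range IntRange(0, 5) {
		fmt.Printf("%d, ", i)
	}
	fmt.Println()
}
```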
There are a few problems with this pattern in current Go. The main issue is that we must exhaust the iterator channel in order for the spawned goroutine to be destroyed. Otherwise, the goroutine in the iterator will simply hang on ch <- i forever.

The second issue is performance. Channels in Go unfortunately have a lot of overhead. In my own testing, iterating over a binary tree with a recursive channel-based iterator took ~500x longer than calling a function on each element. Iterating over a slice instead of a tree, channels took ~100x longer.
Optimizations can definitely be made, though: JavaScript's generators (which also use coroutines in some form) iterate over the binary tree only ~10x slower than calling a function on each element in Go.
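The first problem above, the stranded goroutine, can be observed directly with runtime.NumGoroutine. This is a small sketch, assuming a channel iterator named IntRange (a name chosen for illustration) and a short sleep to let the producer reach its blocking send:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// IntRange returns a receive-only channel yielding from, ..., to-1.
func IntRange(from, to int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := from; i < to; i++ {
			ch <- i
		}
	}()
	return ch
}

// LeakedAfterEarlyBreak reports how many extra goroutines are still
// alive after a consumer abandons the iterator before draining it.
func LeakedAfterEarlyBreak() int {
	before := runtime.NumGoroutine()
	for i := range IntRange(0, 1000) {
		if i == 2 {
			break // the producer is now stuck on its next send
		}
	}
	time.Sleep(10 * time.Millisecond) // give the producer time to block
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines left behind:", LeakedAfterEarlyBreak())
}
```

The producer never reaches its deferred close(ch), and because it is blocked on a send that nobody will receive, it stays alive for the life of the program.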
Proposal
This proposal has three parts:
For part 1, this would mean we take actions such as adding Iter() <-chan T for container/list and container/set.

For part 2, we reconsider #19702 (such that cleanup is not done - the goroutine vanishes when it is GC'd). This allows us to spawn these goroutines, and the caller of the iterator does not need to communicate to the iterator that we are done iterating.
Part 3 is likely the least important of the three, but it is still extremely important. Currently, this pattern is two orders of magnitude slower than calling a function on each element. Other languages (Kotlin, Rust) are already adopting this pattern with a coroutine implementation, and they do not see the same immense performance hit that Go does.
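For readers who want to reproduce the gap on their own machine, here is a rough timing sketch comparing the two approaches over a slice. The helper names EachFunc and EachChan are hypothetical, and wall-clock timing with time.Since is only a crude measure; a testing.B benchmark would be more rigorous:

```go
package main

import (
	"fmt"
	"time"
)

// EachFunc calls f on every element of s.
func EachFunc(s []int, f func(int)) {
	for _, v := range s {
		f(v)
	}
}

// EachChan sends every element of s on a channel from a new goroutine.
func EachChan(s []int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, v := range s {
			ch <- v
		}
	}()
	return ch
}

func main() {
	s := make([]int, 1_000_000)
	for i := range s {
		s[i] = i
	}

	start := time.Now()
	sumFunc := 0
	EachFunc(s, func(v int) { sumFunc += v })
	funcTime := time.Since(start)

	start = time.Now()
	sumChan := 0
	for v := range EachChan(s) {
		sumChan += v
	}
	chanTime := time.Since(start)

	fmt.Println("same result:", sumFunc == sumChan)
	fmt.Println("function call:", funcTime, "channel:", chanTime)
}
```

The exact ratio depends on the machine, the Go version, and GOMAXPROCS, so no expected timings are given here.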
The wonderful thing about this update is that nothing about the language itself needs to change. Tooling does not need to be updated, as channels are already range-able.
Example
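As an illustrative sketch (the type Tree and its methods are hypothetical, not an API from the proposal), a recursive data structure could expose a channel iterator like this. The recursive walk reads almost like code that appends to a slice:

```go
package main

import "fmt"

// Tree is a hypothetical binary tree used only for this illustration.
type Tree struct {
	Left, Right *Tree
	Val         int
}

// Iter returns a receive-only channel that yields the values in order.
func (t *Tree) Iter() <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		t.walk(ch)
	}()
	return ch
}

// walk sends the values of t in order. Calling it on a nil *Tree is
// safe because the method checks its receiver first.
func (t *Tree) walk(ch chan<- int) {
	if t == nil {
		return
	}
	t.Left.walk(ch)
	ch <- t.Val
	t.Right.walk(ch)
}

func main() {
	root := &Tree{Left: &Tree{Val: 1}, Val: 2, Right: &Tree{Val: 3}}
	for v := range root.Iter() {
		fmt.Println(v)
	}
}
```

In Go as it exists today, breaking out of this loop early strands the walking goroutine; part 2 of the proposal is what would make an early break safe.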
Other solutions
Many solutions were discussed in #43557, and I highly recommend checking them out. Here are a couple of other approaches that would also allow channels to be used as iterators:
runtime.Deadlocked() to tell if the current goroutine is deadlocked.

This is my second favorite solution behind allowing goroutines to be GC'd.

Alternatively, this could be unexported, and imported (via //go:linkname) in something like chans.Generator. This would give the generator function a yield function, which would select between runtime.deadlocked() and sending on the iterator channel. chans.Generator would look something like this:

Finalizers are a bit hacky, but they somewhat solve our problem. There is a cool example that @ianlancetaylor shows in the generics draft about Rangers. Unfortunately, these are not actually rangeable, but it does show that we could possibly use a finalizer to clean up a goroutine.