Refactor iteration logic in the `Flatten` and `FlatMap` iterators #99541
Conversation
r? @m-ou-se (rust-highfive has picked a reviewer for you, use r? to override)

Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of

r? @the8472
The general DRYing seems fine, but do the `last()` and `count()` impls provide any benefit? The default impls already use `fold`.
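For reference, the default `Iterator::count` and `last` implementations are indeed fold-based; a simplified sketch of the idea (not the exact standard-library code, the free-function names here are made up) looks like this:

```rust
// `count` folds an increment over every item, ignoring the values.
fn count_via_fold<I: Iterator>(iter: I) -> usize {
    iter.fold(0, |count, _| count + 1)
}

// `last` folds by replacing the accumulator with each new item,
// so the final accumulator is the last item (or None if empty).
fn last_via_fold<I: Iterator>(iter: I) -> Option<I::Item> {
    iter.fold(None, |_, item| Some(item))
}
```

Because these defaults already route through `fold`, a `Flatten`/`FlatMap` with a fast `fold` would speed them up even without dedicated `count()`/`last()` overrides.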
bench_flat_map_ref_sum: 1,942,139 ns/iter → 2,216,643 ns/iter

This seems to be a slight regression. Since by-ref iterators don't forward internal iteration methods and rely on `next()` instead, does that mean `next()` is now a bit less efficient?
Implementing `next` by folding over an inner iterator is unusual, and it potentially means more codegen, since `try_fold` on adapters has a deeper call graph than `next` implementations. Maybe it'd help in cases where the outer iterator has to walk over tons of empty inner elements before it arrives at a non-empty one, but that seems like an uncommon scenario when stepping with `next()`. [Edit: I see that's what `bench_flat_map_chain_option_ref_sum` demonstrates]

Anyway, let's do a perf run, since the compiler does use `flatten` and `flat_map` quite a bit.
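The "tons of empty inner elements" scenario can be illustrated with a small sketch (hypothetical example; the function name and sizes here are made up, not part of the benchmarks in this PR):

```rust
// An outer iterator yielding `n_empty` empty inner iterators before the
// first non-empty one. A purely `next()`-driven Flatten pays one outer
// `next` call plus one inner `next` returning None per empty element;
// internal iteration of the outer iterator can skip the empties with
// less per-element overhead.
fn first_flattened(n_empty: usize) -> Option<u32> {
    let outer = (0..=n_empty).map(|i| {
        if i < n_empty { Vec::new() } else { vec![42u32] }
    });
    outer.flatten().next()
}
```

Whether this shape of iterator is common enough in real code to justify optimizing `next` for it is exactly the question raised above.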
@bors try @rust-timer queue

Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf

⌛ Trying commit f269998e5d0b44dd9fd78d4573c4386f7d07ad93 with merge 06aa8c36743913f0893953065baf3bb61c5f0d29...

☀️ Try build successful - checks-actions

Queued 06aa8c36743913f0893953065baf3bb61c5f0d29 with parent 47ba935, future comparison URL.
The default impls use

In this particular benchmark, yes. Not in the other two by-ref benchmarks. It's entirely possible that this one is more representative of real-world code, though.

This is primarily true when only a single call to the inner
Finished benchmarking commit (06aa8c36743913f0893953065baf3bb61c5f0d29): comparison url.

Instruction count
Max RSS (memory usage): Results
Cycles: Results

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so, since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
So the perf numbers seem more negative than positive, and it spends more time in codegen (even in debug) and in LLVM opts. The artifact sizes are larger too. I think it'd make sense to look at the llvm-lines output or the generated assembly.

I think there's not much we can do here to improve this. If these perf results aren't deemed worth it for speeding up external iteration of flattened iterators where many of them are empty, then I think we should revert the changes to

Sure, we can measure just the `fold()` changes without the `next()` ones.

The microbenchmarks look worse for dense outer iterators, and the perf results look worse on several metrics, so I think we'd at least need some evidence that these kinds of iterators are common and that the speedups are worth it. And I think that is exactly the kind of scenario where
Force-pushed from 871975a to cce733e (Compare)
@bors try @rust-timer queue

Insufficient permissions to issue commands to rust-timer.

@timvermeulen: 🔑 Insufficient privileges: not in try users

@bors try @rust-timer queue

Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf

⌛ Trying commit cce733e6886c7bbac2e6d41add17a6645d4cf189 with merge ca6848a65861be4083fc65253d34ecf5f794338f...
It makes more sense for
☀️ Try build successful - checks-actions

Queued ca6848a65861be4083fc65253d34ecf5f794338f with parent c9e134e, future comparison URL.

Finished benchmarking commit (ca6848a65861be4083fc65253d34ecf5f794338f): comparison url.

Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): Results
Cycles: Results

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so, since this PR may lead to changes in compiler perf. @bors rollup=never
I'm happy to have this land as is. |
Force-pushed from cce733e to 38bb0b1 (Compare)
@bors r+ rollup=never

☀️ Test successful - checks-actions

Finished benchmarking commit (6c943ba): comparison url.

Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): Results
Cycles: This benchmark run did not return any relevant results for this metric.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
The `Flatten` and `FlatMap` iterators both delegate to `FlattenCompat`.

Every individual iterator method that `FlattenCompat` implements needs to carefully manage this state, checking whether the `frontiter` and `backiter` are present, and storing the current iterator appropriately if iteration is aborted. This has led to methods such as `next`, `advance_by`, and `try_fold` all having similar code for managing the iterator's state.

I have extracted this common logic of iterating the inner iterators with the option to exit early into an `iter_try_fold` method. It passes each of the inner iterators to the given function as long as it keeps succeeding, and it takes care of managing `FlattenCompat`'s state so that the actual `Iterator` methods don't need to. The resulting code that makes use of this abstraction is much more straightforward.

Note that despite being implemented in terms of `iter_try_fold`, `next` is still able to benefit from `U`'s `next` method. It therefore does not take the performance hit that implementing `next` directly in terms of `Self::try_fold` causes (in some benchmarks).

This PR also adds `iter_try_rfold`, which captures the shared logic of `try_rfold` and `advance_back_by`, as well as `iter_fold` and `iter_rfold` for folding without early exits (used by `fold`, `rfold`, `count`, and `last`).

Benchmark results: I added the last two benchmarks specifically to demonstrate an extreme case where `FlatMap::next` can benefit from custom internal iteration of the outer iterator, so take it with a grain of salt. We should probably do a perf run to see if the changes to `next` are worth it in practice.
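The `iter_try_fold` idea described above can be sketched roughly as follows. This is a simplified model, not the actual `FlattenCompat` code from the PR; the struct, its fields, and the exact signature are assumptions made for illustration:

```rust
use std::ops::ControlFlow;

// Simplified stand-in for FlattenCompat: a front iterator, the outer
// iterator of iterators, and a back iterator (for double-ended use).
struct FlattenState<I, U> {
    frontiter: Option<U>,
    iter: I,
    backiter: Option<U>,
}

impl<I, U> FlattenState<I, U>
where
    I: Iterator<Item = U>,
    U: Iterator,
{
    // Passes each inner iterator to `f` for as long as `f` keeps
    // returning Continue. On Break, the current inner iterator is
    // stored back so that iteration can resume later.
    fn iter_try_fold<B, F>(&mut self, mut acc: B, mut f: F) -> ControlFlow<B, B>
    where
        F: FnMut(B, &mut U) -> ControlFlow<B, B>,
    {
        // Front iterator first; on Break, `?` returns early and the
        // partially consumed iterator stays in `frontiter`.
        if let Some(front) = self.frontiter.as_mut() {
            acc = f(acc, front)?;
        }
        self.frontiter = None;

        while let Some(mut inner) = self.iter.next() {
            acc = match f(acc, &mut inner) {
                ControlFlow::Continue(a) => a,
                ControlFlow::Break(b) => {
                    // Store the partially consumed inner iterator.
                    self.frontiter = Some(inner);
                    return ControlFlow::Break(b);
                }
            };
        }

        if let Some(back) = self.backiter.as_mut() {
            acc = f(acc, back)?;
        }
        self.backiter = None;
        ControlFlow::Continue(acc)
    }
}
```

With this shape, `try_fold`, `advance_by`, and friends each reduce to choosing a closure: all of the `frontiter`/`backiter` bookkeeping lives in one place, which is the deduplication this PR is after.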