Refactor iteration logic in the `Flatten` and `FlatMap` iterators #99541

timvermeulen · 2022-07-21T01:39:26Z

The Flatten and FlatMap iterators both delegate to FlattenCompat:

struct FlattenCompat<I, U> {
    iter: Fuse<I>,
    frontiter: Option<U>,
    backiter: Option<U>,
}

Every individual iterator method that FlattenCompat implements needs to carefully manage this state, checking whether the frontiter and backiter are present, and storing the current iterator appropriately if iteration is aborted. This has led to methods such as next, advance_by, and try_fold all having similar code for managing the iterator's state.

I have extracted this common logic of iterating the inner iterators, with the option to exit early, into an iter_try_fold method:

impl<I, U> FlattenCompat<I, U>
where
    I: Iterator<Item: IntoIterator<IntoIter = U>>,
{
    fn iter_try_fold<Acc, Fold, R>(&mut self, acc: Acc, fold: Fold) -> R
    where
        Fold: FnMut(Acc, &mut U) -> R,
        R: Try<Output = Acc>,
    { ... }
}

It passes each of the inner iterators to the given function for as long as that function keeps succeeding. It takes care of managing FlattenCompat's state, so that the actual Iterator methods don't need to. The resulting code that makes use of this abstraction is much more straightforward:

fn next(&mut self) -> Option<U::Item> {
    #[inline]
    fn next<U: Iterator>((): (), iter: &mut U) -> ControlFlow<U::Item> {
        match iter.next() {
            None => ControlFlow::CONTINUE,
            Some(x) => ControlFlow::Break(x),
        }
    }

    self.iter_try_fold((), next).break_value()
}

Note that despite being implemented in terms of iter_try_fold, next is still able to benefit from U's next method. It therefore does not take the performance hit that implementing next directly in terms of Self::try_fold causes (in some benchmarks).
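
For illustration only, here is a minimal sketch of how such a helper can manage the state on stable Rust. It is not the PR's code: `MiniFlatten` is a hypothetical, forward-only stand-in for FlattenCompat (no `Fuse`, no `backiter`), and it returns `ControlFlow` directly because the `Try` trait used in the real signature is unstable.

```rust
use std::ops::ControlFlow;

/// Hypothetical, forward-only stand-in for `FlattenCompat` (no `Fuse`, no `backiter`).
struct MiniFlatten<I, U> {
    iter: I,
    frontiter: Option<U>,
}

impl<I, U> MiniFlatten<I, U>
where
    I: Iterator,
    I::Item: IntoIterator<IntoIter = U>,
    U: Iterator,
{
    fn new(iter: I) -> Self {
        MiniFlatten { iter, frontiter: None }
    }

    /// Hands each inner iterator to `fold` until `fold` breaks.
    /// On a break, the inner iterator that was being folded stays in `frontiter`.
    fn iter_try_fold<Acc, B>(
        &mut self,
        mut acc: Acc,
        mut fold: impl FnMut(Acc, &mut U) -> ControlFlow<B, Acc>,
    ) -> ControlFlow<B, Acc> {
        // Resume a partially consumed inner iterator first.
        if let Some(front) = self.frontiter.as_mut() {
            acc = fold(acc, front)?;
        }
        self.frontiter = None;

        while let Some(item) = self.iter.next() {
            // Store the inner iterator before folding, so an early exit keeps it.
            let inner = self.frontiter.insert(item.into_iter());
            acc = fold(acc, inner)?;
        }
        self.frontiter = None;

        ControlFlow::Continue(acc)
    }
}

impl<I, U> Iterator for MiniFlatten<I, U>
where
    I: Iterator,
    I::Item: IntoIterator<IntoIter = U>,
    U: Iterator,
{
    type Item = U::Item;

    fn next(&mut self) -> Option<U::Item> {
        // Break with the first element any inner iterator yields.
        let flow = self.iter_try_fold((), |(), inner| match inner.next() {
            None => ControlFlow::Continue(()),
            Some(x) => ControlFlow::Break(x),
        });
        match flow {
            ControlFlow::Break(x) => Some(x),
            ControlFlow::Continue(()) => None,
        }
    }
}
```

Because the early return happens while `frontiter` still holds the current inner iterator, the following call to `next` resumes from it without any extra bookkeeping in `next` itself.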

This PR also adds iter_try_rfold which captures the shared logic of try_rfold and advance_back_by, as well as iter_fold and iter_rfold for folding without early exits (used by fold, rfold, count, and last).
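
As a rough sketch of the non-short-circuiting variant (again not the PR's code), the free function `iter_fold` below is a hypothetical stand-in that hands each inner iterator by value to a closure. The helpers `flat_count` and `flat_last` are illustrative names; they show how count and last can then delegate to the inner iterator's own `count` and `last`.

```rust
/// Hypothetical stand-in for an `iter_fold` helper: folds over whole inner
/// iterators instead of over their individual elements.
fn iter_fold<I, Acc>(
    outer: I,
    acc: Acc,
    mut fold: impl FnMut(Acc, <I::Item as IntoIterator>::IntoIter) -> Acc,
) -> Acc
where
    I: Iterator,
    I::Item: IntoIterator,
{
    outer.fold(acc, |acc, item| fold(acc, item.into_iter()))
}

/// Counts the flattened elements by summing each inner iterator's `count()`.
fn flat_count<I>(outer: I) -> usize
where
    I: Iterator,
    I::Item: IntoIterator,
{
    iter_fold(outer, 0usize, |n, inner| n + inner.count())
}

/// Finds the last flattened element via each inner iterator's `last()`.
/// An empty inner iterator keeps the previously seen last element.
fn flat_last<I>(outer: I) -> Option<<I::Item as IntoIterator>::Item>
where
    I: Iterator,
    I::Item: IntoIterator,
{
    iter_fold(outer, None, |last, inner| inner.last().or(last))
}
```

The point of this shape is that an inner iterator with a cheap `count` or `last` (a slice iterator, for example) gets to use it, rather than being drained element by element.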

Benchmark results:

                                             before                after
bench_flat_map_sum                       423,255 ns/iter      414,338 ns/iter
bench_flat_map_ref_sum                 1,942,139 ns/iter    2,216,643 ns/iter
bench_flat_map_chain_sum               1,616,840 ns/iter    1,246,445 ns/iter
bench_flat_map_chain_ref_sum           4,348,110 ns/iter    3,574,775 ns/iter
bench_flat_map_chain_option_sum          780,037 ns/iter      780,679 ns/iter
bench_flat_map_chain_option_ref_sum    2,056,458 ns/iter      834,932 ns/iter

I added the last two benchmarks specifically to demonstrate an extreme case where FlatMap::next can benefit from custom internal iteration of the outer iterator, so take them with a grain of salt. We should probably do a perf run to see if the changes to next are worth it in practice.

rust-highfive · 2022-07-21T01:39:29Z

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

rustbot · 2022-07-21T01:39:29Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

the8472 · 2022-07-23T10:21:49Z

r? @the8472

the8472

The general DRYing seems fine, but do the last() and count() impls provide any benefit? The default impls already use fold.

> bench_flat_map_ref_sum 1,942,139 ns/iter 2,216,643 ns/iter

This seems to be a slight regression. Since by-ref iterators don't forward internal iteration methods and rely on next() instead, does that mean next() is now a bit less efficient?

Implementing next by folding over an inner iterator is unusual and potentially means more codegen, as try_fold on adapters has a deeper callgraph than the next implementations.

Maybe it'd help in cases where the outer iterator has to walk over tons of empty inner elements before it arrives at a non-empty one, but that seems like an uncommon scenario when stepping with next(). [Edit: I see that's what bench_flat_map_chain_option_ref_sum demonstrates.]

Anyway, let's do a perf run since the compiler does use flatten and flat_map quite a bit


the8472 · 2022-07-23T11:07:52Z

@bors try @rust-timer queue

rust-timer · 2022-07-23T11:07:53Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-23T11:08:03Z

⌛ Trying commit f269998e5d0b44dd9fd78d4573c4386f7d07ad93 with merge 06aa8c36743913f0893953065baf3bb61c5f0d29...

bors · 2022-07-23T13:02:10Z

☀️ Try build successful - checks-actions
Build commit: 06aa8c36743913f0893953065baf3bb61c5f0d29

rust-timer · 2022-07-23T13:02:12Z

Queued 06aa8c36743913f0893953065baf3bb61c5f0d29 with parent 47ba935, future comparison URL.

timvermeulen · 2022-07-23T13:04:22Z

> The general DRYing seems fine, but do the last() and count() impls provide any benefit? The default impls already use fold.

The default impls use Self::fold, which in turn uses I::fold and U::fold. This custom last can benefit from a custom last implementation on U. Calling flatten().last() on something like a Once<Chunks<'_, T>> demonstrates this well, since the optimizer seems unable to figure out what the last chunk is based on the default implementation of Iterator::last alone.
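
A small usage sketch of that case, for reference. The helper name `last_chunk_via_flatten` is hypothetical, and the snippet only shows the observable result; whether the optimizer can see through the call depends on which `last` implementation ends up being used.

```rust
use std::iter;

/// Returns the last chunk of `slice` by flattening a `Once<Chunks<'_, T>>`.
/// Like `slice::chunks`, this panics if `size` is 0.
fn last_chunk_via_flatten<T>(slice: &[T], size: usize) -> Option<&[T]> {
    iter::once(slice.chunks(size)).flatten().last()
}
```

With a `last` that forwards to the inner iterator, this can use the `last` of `Chunks` directly instead of stepping through every chunk.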

> bench_flat_map_ref_sum 1,942,139 ns/iter 2,216,643 ns/iter
>
> This seems to be a slight regression. Since by-ref iterators don't forward internal iteration methods and rely on next() instead does that mean next() is now a bit less efficient?

In this particular benchmark, yes. Not in the other two by-ref benchmarks. It's entirely possible that this one is more representative of real world code though.

> Implementing next by using folding over an inner iterator is unusual and potentially means more codegen as the try_fold on adapters have a deeper callgraph compared to next implementations.

This is primarily true when only a single call to the inner next is needed. An iterator like FilterMap does (indirectly) call try_fold on the inner iterator from its next implementation. You're right about this only being beneficial when FlattenCompat::next has to skip over empty Us and therefore advance I by more than one.
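
To make the "advance I by more than one" case concrete, the sketch below counts how many outer items a single call to next() pulls before it can return. The function name `first_item_and_outer_steps` is hypothetical and the snippet uses only the public `flatten` adapter.

```rust
use std::cell::Cell;

/// Returns the first flattened item together with the number of outer items
/// that had to be pulled to produce it.
fn first_item_and_outer_steps(groups: Vec<Vec<u32>>) -> (Option<u32>, usize) {
    let steps = Cell::new(0usize);
    let first = groups
        .into_iter()
        // Count every item the outer iterator hands to `flatten`.
        .inspect(|_| steps.set(steps.get() + 1))
        .flatten()
        .next();
    (first, steps.get())
}
```

For a dense input such as `vec![vec![1], vec![2]]` a single outer step is enough, while leading empty groups each cost one extra outer step inside the same next() call.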


rust-timer · 2022-07-23T14:17:13Z

Finished benchmarking commit (06aa8c36743913f0893953065baf3bb61c5f0d29): comparison url.

Instruction count

Primary benchmarks: mixed results
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	1.0%	1.9%	11
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-0.5%	-0.7%	8
Improvements 🎉 (secondary)	-0.5%	-1.0%	8
All 😿🎉 (primary)	0.4%	1.9%	19

Max RSS (memory usage)

Results

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	2.6%	3.1%	5
Improvements 🎉 (primary)	-4.8%	-7.0%	3
Improvements 🎉 (secondary)	-2.0%	-2.0%	1
All 😿🎉 (primary)	-4.8%	-7.0%	3

Cycles

Results

Primary benchmarks: mixed results
Secondary benchmarks: no relevant changes found

	mean¹	max	count²
Regressions 😿 (primary)	2.6%	2.8%	2
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-2.4%	-2.4%	1
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	0.9%	2.8%	3

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

¹ the arithmetic mean of the percent change
² number of relevant changes

the8472 · 2022-07-26T17:16:39Z

So the perf numbers seem more negative than positive and it spends more time in codegen (even in debug) and in llvm opts. And the artifact sizes are larger too. I think it'd make sense to look at the llvm-lines output or the generated assembly.

timvermeulen · 2022-08-01T13:33:14Z

I think there's not much we can do here to improve this. If these perf results aren't deemed worth it for speeding up external iteration of flattened iterators where many of them are empty, then I think we should revert the changes to FlattenCompat::next.

the8472 · 2022-08-01T18:56:19Z

Sure, we can measure just the fold()-changes without the next() ones.

> If these perf results aren't deemed worth it for speeding up external iteration of flattened iterators where many of them are empty

The microbenchmarks look worse for dense outer iterators and the perf results look worse on several metrics, so I think we'd at least need some evidence that these kinds of iterators are common and that the speedups are worth it. And I think that is exactly the kind of scenario where advance_by(0) as a kind of optimization hint would be useful.

timvermeulen · 2022-08-01T20:00:45Z

@bors try @rust-timer queue

rust-timer · 2022-08-01T20:00:46Z

Insufficient permissions to issue commands to rust-timer.

bors · 2022-08-01T20:00:47Z

@timvermeulen: 🔑 Insufficient privileges: not in try users

the8472 · 2022-08-01T20:09:22Z

@bors try @rust-timer queue

rust-timer · 2022-08-01T20:09:24Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-08-01T20:09:33Z

⌛ Trying commit cce733e6886c7bbac2e6d41add17a6645d4cf189 with merge ca6848a65861be4083fc65253d34ecf5f794338f...

timvermeulen · 2022-08-01T20:29:26Z

> And I think that is exactly the kind of scenario where advance_by(0) as a kind of optimization hint would be useful.

It makes more sense for Skip/Take than for Flatten. A flattened iterator can have empty iterators throughout, not just at the start/end. For this reason, this kind of optimization makes more sense inside next (if it were worthwhile).

bors · 2022-08-01T21:37:36Z

☀️ Try build successful - checks-actions
Build commit: ca6848a65861be4083fc65253d34ecf5f794338f

rust-timer · 2022-08-01T21:37:38Z

Queued ca6848a65861be4083fc65253d34ecf5f794338f with parent c9e134e, future comparison URL.

rust-timer · 2022-08-02T01:10:28Z

Finished benchmarking commit (ca6848a65861be4083fc65253d34ecf5f794338f): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: 🎉 relevant improvement found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-2.2%	-2.7%	3
Improvements 🎉 (secondary)	-2.0%	-2.0%	1
All 😿🎉 (primary)	-2.2%	-2.7%	3

Cycles

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: 😿 relevant regression found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	2.4%	2.4%	1
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	N/A	N/A	0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

¹ the arithmetic mean of the percent change
² number of relevant changes

timvermeulen · 2022-08-02T07:18:22Z

I'm happy to have this land as is.


the8472 · 2022-08-17T22:36:37Z

@bors r+ rollup=never

bors · 2022-08-17T22:36:39Z

📌 Commit 38bb0b1 has been approved by the8472

It is now in the queue for this repository.

bors · 2022-08-19T02:34:34Z

⌛ Testing commit 38bb0b1 with merge 6c943ba...

bors · 2022-08-19T05:15:37Z

☀️ Test successful - checks-actions
Approved by: the8472
Pushing 6c943ba to master...

rust-timer · 2022-08-19T06:32:18Z

Finished benchmarking commit (6c943ba): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

Primary benchmarks: ❌ relevant regression found
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions ❌ (primary)	2.1%	2.1%	1
Regressions ❌ (secondary)	1.6%	1.8%	4
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.4%	-2.9%	3
All ❌✅ (primary)	2.1%	2.1%	1

Cycles

This benchmark run did not return any relevant results for this metric.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

¹ the arithmetic mean of the percent change
² number of relevant changes

rust-highfive assigned m-ou-se Jul 21, 2022

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 21, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 21, 2022

the8472 assigned the8472 and unassigned m-ou-se Jul 23, 2022

the8472 reviewed Jul 23, 2022


rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 23, 2022

the8472 reviewed Jul 23, 2022


rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 23, 2022

timvermeulen force-pushed the flatten_cleanup branch from 871975a to cce733e Compare August 1, 2022 20:00

rustbot removed the perf-regression Performance regression. label Aug 2, 2022

the8472 requested changes Aug 4, 2022


timvermeulen added 4 commits August 5, 2022 03:43

Move shared logic of try_fold and advance_by into iter_try_fold

8ff8d05

Move shared logic of try_rfold and advance_back_by into `iter_try_rfold`

cbc5f62

Move fold logic to iter_fold method and reuse it in count and `last`

3f70049

Move rfold logic into iter_rfold

38bb0b1

timvermeulen force-pushed the flatten_cleanup branch from cce733e to 38bb0b1 Compare August 5, 2022 01:44

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 17, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 19, 2022

bors merged commit 6c943ba into rust-lang:master Aug 19, 2022

rustbot added this to the 1.65.0 milestone Aug 19, 2022
