
Ironing out StepBy<Range>'s performance issues #51557

Open
Emerentius opened this issue Jun 14, 2018 · 7 comments
Labels
C-enhancement: An issue proposing an enhancement or a PR with one.
I-slow: Problems and improvements with respect to performance of generated code.
T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Emerentius
Contributor

The behaviour of `<Range<_> as Iterator>::nth` has a slight mismatch with `StepBy` (or `Step`, depending on your viewpoint), as @scottmcm has found out, resulting in sub-optimal performance.
On every iteration, the range first has to step forward `n - 1` times to get to the next element and then advance once more by 1.
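
For reference, the current code path looks roughly like this (a simplified sketch, not the exact libcore source; `range_nth` is a stand-in for `Range`'s own `Iterator::nth`):

// Simplified sketch of the generic adapter: after the first call, every
// next() becomes an nth() call on the inner iterator.
impl<I: Iterator> Iterator for StepBy<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        if self.first_take {
            self.first_take = false;
            self.iter.next()
        } else {
            // `self.step` holds `step - 1`: skip `step - 1` elements, take the next.
            self.iter.nth(self.step)
        }
    }
}

// Stand-in for Range's Iterator::nth: it computes `start + n` to find the
// element, then advances `start` past it by one more, i.e. two additions
// per StepBy::next() call.
fn range_nth<A: Step>(range: &mut Range<A>, n: usize) -> Option<A> {
    if let Some(plus_n) = range.start.add_usize(n) {
        if plus_n < range.end {
            range.start = plus_n.add_one();
            return Some(plus_n);
        }
    }
    range.start = range.end.clone();
    None
}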

I'm hoping we can improve step_by into a 100% zero-cost abstraction.

It seems like the performance issue is specific to `StepBy<Range>`. I'm therefore thinking we could specialize `Iterator` for `StepBy<Range<I>>` so that it uses @scottmcm's suggested semantics. Like this:

impl<I> Iterator for StepBy<Range<I>>
where
    I: Step,
{
    type Item = I;

    fn next(&mut self) -> Option<I> {
        self.first_take = false;
        if self.iter.start < self.iter.end {
            // `self.step` stores `step - 1`, so add 1 back to get the real stride.
            if let Some(mut n) = self.iter.start.add_usize(self.step + 1) {
                // Yield the current element and advance the range in one go.
                std::mem::swap(&mut self.iter.start, &mut n);
                return Some(n);
            }
            // The stride overflowed: yield the final element and exhaust the range.
            let last = self.iter.start.clone();
            self.iter.start = self.iter.end.clone();
            return Some(last);
        }
        None
    }
}

That also avoids the `first_take` branch on a regular `next()`. I haven't looked at the other methods yet, but that boolean in `StepBy` could possibly become superfluous. During construction of the `StepBy` adapter, the size passed to `.step_by(size)` is decremented, and this specialization has to add 1 back every time, but that should be optimized away when inlined.
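
(For reference, the constructor looks roughly like this; `make_step_by` is a stand-in for `Iterator::step_by`, not the exact libcore source:)

// `step` is stored as `step - 1` because the generic path feeds it
// straight into `nth()`, which is zero-indexed.
fn make_step_by<I: Iterator>(iter: I, step: usize) -> StepBy<I> {
    assert!(step != 0);
    StepBy { iter, step: step - 1, first_take: true }
}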

If someone were to depend on side-effects in Step::add_usize (when the trait is stabilized), this pre-stepping would become weird. Same thing with a hypothetical next_and_skip_ahead().

@scottmcm what do you think of this?

@kennytm added the C-enhancement and T-libs-api labels on Jun 14, 2018
@scottmcm
Member

Overall yes; someone should make a PR 🙂 Probably best to use specialization for now to hide the detail. (I suppose a doc-hidden unstable method on Iterator could work too, but that feels wrong.)

A few minor things:

that we could specialize Iterator for StepBy<Range<I>>

I thought that normally public traits didn't get specialized, instead delegating to an internal one and specializing that, but apparently Iterator for Fuse is specialized, so I'm misremembering.

but that boolean in StepBy could possibly become superfluous

It would still be needed for non-Ranges, of course, so I don't think it can go away in the specialization approach, since Fuse still has its boolean even when the specialization never uses it.

Like this:

It's hard to guess at exactly how it should be written without looking at how LLVM handles it. Something like that seems plausible, but I could imagine seemingly-unimportant things like the branch orderings affecting the loop passes. Maybe it would be possible to have a codegen test to make sure?

@Emerentius
Contributor Author

I've copied the std implementations onto godbolt and added the specialization. As expected, it improves the generated code a lot.
You're right, @scottmcm: with the branch order switched it produces slightly different, seemingly better, code. With that change and `usize` ranges it compiles down to nothing; `u8` still produces extra code. That is with constant start, end and step size; I haven't checked the rest yet.

code on godbolt
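
(The functions compared on godbolt aren't reproduced here; they are of roughly this shape, shown with hypothetical constants: a loop over a range with constant start, end and step, with each element fed through a black box so the loop isn't removed entirely.)

// Hypothetical stand-in for the kind of function compared on godbolt.
pub fn step_by_loop() {
    for n in (0usize..1000).step_by(7) {
        std::hint::black_box(n);
    }
}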

Adding this specialization slightly changes what we expect from the `Step` trait, in that any implementor must be fine with one more `add_usize()` call than strictly necessary (side effects). The trait is unstable for now anyway, but in the future it may matter.
I'm going to send a PR for the specialization.

Is there any way we can get rid of the TryFrom baggage?
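
(For context: for the primitive integers, `Step::add_usize` first converts the `usize` to the element type and then does a checked add, roughly like this sketch of the macro-generated impl for `i32`, written here as a free function. That conversion is the `TryFrom` baggage in question.)

use std::convert::TryFrom;

// Rough shape of Step::add_usize for a signed integer type: the usize step
// has to be converted to the element type before the checked add.
fn add_usize_i32(x: i32, n: usize) -> Option<i32> {
    match i32::try_from(n) {
        Ok(n) => x.checked_add(n),
        Err(_) => None,
    }
}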

@scottmcm
Member

I haven't looked at it in detail yet, but a reminder that this isn't exactly what we want to reach:

// what we want to reach
pub fn manual_while() {
    let mut n = 0;
    while n < UPPER {
        test::black_box(n);
        n += STEP;
    }
}

Because that doesn't have overflow detection, and is thus an infinite loop for something like `(0..255u8).step_by(2)`: it will get to 254 and wrap back to 0, and both of those are still below 255.
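
(To actually match the iterator's semantics, the manual loop needs overflow detection, e.g. via a checked add; a sketch reusing the `UPPER` and `STEP` constants from above:)

// Sketch of the loop we actually need to reach: same shape, but the step
// is checked so the loop terminates instead of wrapping around.
pub fn manual_while_checked() {
    let mut n = 0;
    while n < UPPER {
        test::black_box(n);
        match n.checked_add(STEP) {
            Some(next) => n = next,
            // The next step would overflow: iteration is finished.
            None => break,
        }
    }
}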

@Emerentius
Contributor Author

Good catch. That's also what caused the difference between manual and specialized for u8.

@Emerentius
Contributor Author

Emerentius commented Jun 16, 2018

Looks like specialization is still too broken for this: it trips up type inference. That is fixed by going through a second, private trait; see the sketch below.
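
One shape that workaround can take (a sketch with hypothetical names, not the code from the eventual PR): the public `Iterator` impl stays fully generic and only forwards to a private helper trait, and only that helper is specialized, so type inference never has to look at the specialized impls. `u32` stands in for "any `Step` type" so the sketch compiles outside core.

#![feature(specialization)]

use std::ops::Range;

pub struct StepBy<I> {
    iter: I,
    step: usize,
    first_take: bool,
}

// The private helper trait; only this gets specialized.
trait StepByImpl {
    type Item;
    fn spec_next(&mut self) -> Option<Self::Item>;
}

// Generic fallback: same behaviour as the unspecialized adapter.
impl<I: Iterator> StepByImpl for StepBy<I> {
    type Item = I::Item;

    default fn spec_next(&mut self) -> Option<I::Item> {
        if self.first_take {
            self.first_take = false;
            self.iter.next()
        } else {
            self.iter.nth(self.step)
        }
    }
}

// Specialized fast path for ranges: yield the current element, then jump
// ahead by the full stride in one go.
impl StepByImpl for StepBy<Range<u32>> {
    fn spec_next(&mut self) -> Option<u32> {
        self.first_take = false;
        if self.iter.start >= self.iter.end {
            return None;
        }
        let current = self.iter.start;
        self.iter.start = match current.checked_add(self.step as u32 + 1) {
            Some(next) => next,
            None => self.iter.end, // the stride overflowed: exhaust the range
        };
        Some(current)
    }
}

// The public impl never changes; it only forwards to the helper trait.
impl<I: Iterator> Iterator for StepBy<I> {
    type Item = <Self as StepByImpl>::Item;

    fn next(&mut self) -> Option<Self::Item> {
        self.spec_next()
    }
}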

Another issue I've found: with `RangeFrom` I can't specialize generically in a way that inherits the debug/release overflow-check behaviour. Furthermore, the overflow, and therefore the panic in debug mode, would happen one iteration too early, i.e. during the call that yields the last representable element instead of on the call after it. We may still be able to specialize `RangeFrom` for each type separately while preserving semantics.

Step::add_usize for signed integers is currently impeding optimizations.

bors added a commit that referenced this issue Jun 21, 2018
Specialize StepBy<Range(Inclusive)>

Part of #51557, related to #43064, #31155

As discussed in the above issues, `step_by` optimizes very badly on ranges, which is related to
1. the special-casing of the first `StepBy::next()` call
2. the need to do two additions of `n - 1` and `1` inside the range's `next()`

This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical to the manual loop in the case of a `Range` with constant `start` and `end` where `start + step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm.
Unsigned integers optimize better than signed ones for some reason.

See the following two links for a comparison.

[godbolt: specialization for ..](https://godbolt.org/g/haHLJr)
[godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6)

`RangeFrom`, the only other range with an `Iterator` implementation, can't be specialized like this without changing behaviour, due to overflow; there is no way to save "finished-ness".

The approach cannot be used in general because it would trigger the underlying iterator's side effects too early.

May obsolete #51435, haven't checked.
@the8472 added the I-slow label on May 20, 2023
@the8472
Member

the8472 commented May 20, 2023

For the record, the issue is still open because the specialization PR (#51601) was reverted in #56049.

@the8472
Member

the8472 commented Sep 3, 2023

#111850 solved this for unsigned integers. Is that sufficient?
