Conditional mutation in Iterator::next doesn't optimise well #24660

huonw · 2015-04-21T14:01:12Z

#![crate_type = "lib"]

struct R {
    start: i32,
    end: i32
}

impl Iterator for R
{
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.start < self.end {
            let old = self.start;
            self.start += 1;
            Some(old)
        } else {
            None
        }
    }
}


pub fn test() -> usize {
    R { start: 0, end: 100000}.count()
}

Compiles to:

_ZN4test20h08046f2c32b9b146kaaE:
    .cfi_startproc
    xorl    %ecx, %ecx
    movq    $-1, %rax
    .align  16, 0x90
.LBB0_1:
    cmpl    $100000, %ecx
    setl    %dl
    movzbl  %dl, %edx
    addl    %ecx, %edx
    incq    %rax
    cmpl    $100000, %ecx
    movl    %edx, %ecx
    jl  .LBB0_1
    retq

It should at least look something like:

    xorl    %eax, %eax
    movl    $100000, %ecx
.LBB0_1: 
    incl    %eax
    decl    %ecx
    jne .LBB0_1

But really that case should be constant folded (clang folds the equivalent C for loop).

(NB. count is currently implemented as self.fold(0, |c, _| c + 1), and the bad codegen occurs if that is definition is used directly, and also if one uses the mutating for loop: let mut c = 0; for _ in ... { c += 1 }.)

This previously affected std::ops::Range (i.e. x..y), but #24705 implemented a work-around.

The text was updated successfully, but these errors were encountered:

huonw · 2015-04-21T22:40:46Z

Oh, this is possible due to our handling of "small" aggregates, e.g. Option<i32> (as used in the code above) is packed into a single i64, and LLVM seemingly doesn't like that. Changing 0 to 0i64 makes the code fold properly.

The implementation of Range::next is

        if self.start < self.end {
            let mut n = &self.start + &A::one();
            mem::swap(&mut n, &mut self.start);
            Some(n)
        } else {
            None
        }

which appears in the given code: the first cmpl is the first line, the setl/movzbl are conditionally adding one to self.start forcibly making this non-conditional by moving the contents out of the if also folds correctly, i.e. the compiler is being too smart for its own good and trying to avoid a branch overly eagerly.

        let mut n = &self.start + &A::one();
        mem::swap(&mut n, &mut self.start);
        if self.start < self.end {
            Some(n)
        } else {
            None
        }

Unfortunately, this transformation will mean that calling next yielding None enough times will cause the iterator to restart at 0 (if debug assertions are off).

cc @dotdash

Thiez · 2015-04-22T20:06:02Z

The rust iterator trait states:

The Iterator protocol states that an iterator yields a (potentially-empty, potentially-infinite) sequence of values, and returns None to signal that it's finished. The Iterator protocol does not define behavior after None is returned. A concrete Iterator implementation may choose to behave however it wishes, either by returning None infinitely, or by doing something else.

So I don't think that is unfortunate at all.

dotdash · 2015-04-22T20:15:29Z

FWIW, changing next() in the generated IR to return { i32, i32 } instead of i64 (like we did in the past) does not allow the optimization to happen with the current pass setup. Only after another run of opt -O2 we get the optimal result. So given that the iterator protocol allows for the implementation that optimizes well, I think we should use that. You want to compare against n in the if condition though, otherwise you get an off-by-one error.

huonw · 2015-04-22T21:17:54Z

So I don't think that is unfortunate at all.

I think the std iterators should endevour to be as well-behaved as possible, especially something as fundamental as Range.

reem · 2015-04-22T21:19:56Z

@huonw users can always use fuse to get back the old behavior if it is important to their use case. In this case, I think it's more important to be fast, specifically because this is something as fundamental as Range.

huonw · 2015-04-22T21:31:16Z

Yes, I understand that, but this is the sort of thing which subtly results in broken programs (similar to overflow assertions): most situations won't think to call .fuse().

reem · 2015-04-22T21:40:23Z

You're right, it's a tradeoff, I just think that the vast majority of iterators are used in for loops or other standard constructs which will just quit after the first None. In my experience very few uses of iterators ever call next after None has been returned once.

The conditional mutation of the previous implementation resulted in poor code, making it unconditional makes `Range` less well behaved as an Iterator (but still legal) but also makes it fast. The intention is that this change will be reverted when rustc/LLVM handle the best-behaved implementation better. cc rust-lang#24660

huonw · 2015-04-22T22:08:30Z

I've submitted #24705 with the intention that it's a temporary work-around.

Thiez · 2015-04-22T22:26:19Z

If you really want well behaved iterators in the standard library I would suggest having them panic!() when next() is called after None. This way people will not start to rely on None being returned forever, which is exactly as it should be, according to the protocol.

bluss · 2015-04-22T23:03:37Z

You need to differentiate the (non-)guarantees of the trait and of specific implementations. I (and probably others) happily rely on the sanity of the standard slice iterator implementation, because I know I can.

The conditional mutation of the previous implementation resulted in poor code, making it unconditional makes `Range` less well behaved as an Iterator (but still legal) but also makes it fast. The intention is that this change will be reverted when rustc/LLVM handle the best-behaved implementation better. cc rust-lang#24660

dotdash · 2015-06-18T22:16:01Z

This optimizes properly now. Maybe due to the LLVM update?

dotdash · 2015-06-18T22:17:10Z

For completeness sake:

    .section    .text._ZN4test20hb4b7c0c6e08ff614VaaE,"ax",@progbits
    .globl  _ZN4test20hb4b7c0c6e08ff614VaaE
    .align  16, 0x90
    .type   _ZN4test20hb4b7c0c6e08ff614VaaE,@function
_ZN4test20hb4b7c0c6e08ff614VaaE:
    .cfi_startproc
    movl    $100000, %eax
    retq
.Lfunc_end1:
    .size   _ZN4test20hb4b7c0c6e08ff614VaaE, .Lfunc_end1-_ZN4test20hb4b7c0c6e08ff614VaaE
    .cfi_endproc

huonw added I-slow Issue: Problems and improvements with respect to performance of generated code. A-iterators labels Apr 21, 2015

huonw changed the title ~~Range iterator doesn't optimise well~~ Conditional mutation in Iterator::next doesn't optimise well Apr 22, 2015

huonw mentioned this issue Apr 22, 2015

Work-around optimiser deficiencies in Range iterator. #24705

Closed

dotdash closed this as completed Jun 18, 2015

huonw mentioned this issue Jan 24, 2016

StepBy<_, Range<_>> optimises poorly #31155

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Conditional mutation in Iterator::next doesn't optimise well #24660

Conditional mutation in Iterator::next doesn't optimise well #24660

huonw commented Apr 21, 2015

huonw commented Apr 21, 2015

Thiez commented Apr 22, 2015

dotdash commented Apr 22, 2015

huonw commented Apr 22, 2015

reem commented Apr 22, 2015

huonw commented Apr 22, 2015

reem commented Apr 22, 2015

huonw commented Apr 22, 2015

Thiez commented Apr 22, 2015

bluss commented Apr 22, 2015

dotdash commented Jun 18, 2015

dotdash commented Jun 18, 2015

Conditional mutation in Iterator::next doesn't optimise well #24660

Conditional mutation in Iterator::next doesn't optimise well #24660

Comments

huonw commented Apr 21, 2015

huonw commented Apr 21, 2015

Thiez commented Apr 22, 2015

dotdash commented Apr 22, 2015

huonw commented Apr 22, 2015

reem commented Apr 22, 2015

huonw commented Apr 22, 2015

reem commented Apr 22, 2015

huonw commented Apr 22, 2015

Thiez commented Apr 22, 2015

bluss commented Apr 22, 2015

dotdash commented Jun 18, 2015

dotdash commented Jun 18, 2015