Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional mutation in Iterator::next doesn't optimise well #24660

Closed
huonw opened this issue Apr 21, 2015 · 12 comments
Closed

Conditional mutation in Iterator::next doesn't optimise well #24660

huonw opened this issue Apr 21, 2015 · 12 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@huonw
Copy link
Member

huonw commented Apr 21, 2015

#![crate_type = "lib"]

struct R {
    start: i32,
    end: i32
}

impl Iterator for R
{
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.start < self.end {
            let old = self.start;
            self.start += 1;
            Some(old)
        } else {
            None
        }
    }
}


pub fn test() -> usize {
    R { start: 0, end: 100000}.count()
}

Compiles to:

_ZN4test20h08046f2c32b9b146kaaE:
    .cfi_startproc
    xorl    %ecx, %ecx
    movq    $-1, %rax
    .align  16, 0x90
.LBB0_1:
    cmpl    $100000, %ecx
    setl    %dl
    movzbl  %dl, %edx
    addl    %ecx, %edx
    incq    %rax
    cmpl    $100000, %ecx
    movl    %edx, %ecx
    jl  .LBB0_1
    retq

It should at least look something like:

    xorl    %eax, %eax
    movl    $100000, %ecx
.LBB0_1: 
    incl    %eax
    decl    %ecx
    jne .LBB0_1

But really that case should be constant folded (clang folds the equivalent C for loop).

(NB. count is currently implemented as self.fold(0, |c, _| c + 1), and the bad codegen occurs if that is definition is used directly, and also if one uses the mutating for loop: let mut c = 0; for _ in ... { c += 1 }.)

This previously affected std::ops::Range (i.e. x..y), but #24705 implemented a work-around.

@huonw huonw added I-slow Issue: Problems and improvements with respect to performance of generated code. A-iterators labels Apr 21, 2015
@huonw
Copy link
Member Author

huonw commented Apr 21, 2015

Oh, this is possible due to our handling of "small" aggregates, e.g. Option<i32> (as used in the code above) is packed into a single i64, and LLVM seemingly doesn't like that. Changing 0 to 0i64 makes the code fold properly.

The implementation of Range::next is

        if self.start < self.end {
            let mut n = &self.start + &A::one();
            mem::swap(&mut n, &mut self.start);
            Some(n)
        } else {
            None
        }

which appears in the given code: the first cmpl is the first line, the setl/movzbl are conditionally adding one to self.start forcibly making this non-conditional by moving the contents out of the if also folds correctly, i.e. the compiler is being too smart for its own good and trying to avoid a branch overly eagerly.

        let mut n = &self.start + &A::one();
        mem::swap(&mut n, &mut self.start);
        if self.start < self.end {
            Some(n)
        } else {
            None
        }

Unfortunately, this transformation will mean that calling next yielding None enough times will cause the iterator to restart at 0 (if debug assertions are off).

cc @dotdash

@Thiez
Copy link
Contributor

Thiez commented Apr 22, 2015

The rust iterator trait states:

The Iterator protocol states that an iterator yields a (potentially-empty, potentially-infinite) sequence of values, and returns None to signal that it's finished. The Iterator protocol does not define behavior after None is returned. A concrete Iterator implementation may choose to behave however it wishes, either by returning None infinitely, or by doing something else.

So I don't think that is unfortunate at all.

@dotdash
Copy link
Contributor

dotdash commented Apr 22, 2015

FWIW, changing next() in the generated IR to return { i32, i32 } instead of i64 (like we did in the past) does not allow the optimization to happen with the current pass setup. Only after another run of opt -O2 we get the optimal result. So given that the iterator protocol allows for the implementation that optimizes well, I think we should use that. You want to compare against n in the if condition though, otherwise you get an off-by-one error.

@huonw
Copy link
Member Author

huonw commented Apr 22, 2015

So I don't think that is unfortunate at all.

I think the std iterators should endevour to be as well-behaved as possible, especially something as fundamental as Range.

@reem
Copy link
Contributor

reem commented Apr 22, 2015

@huonw users can always use fuse to get back the old behavior if it is important to their use case. In this case, I think it's more important to be fast, specifically because this is something as fundamental as Range.

@huonw
Copy link
Member Author

huonw commented Apr 22, 2015

Yes, I understand that, but this is the sort of thing which subtly results in broken programs (similar to overflow assertions): most situations won't think to call .fuse().

@reem
Copy link
Contributor

reem commented Apr 22, 2015

You're right, it's a tradeoff, I just think that the vast majority of iterators are used in for loops or other standard constructs which will just quit after the first None. In my experience very few uses of iterators ever call next after None has been returned once.

@huonw huonw changed the title Range iterator doesn't optimise well Conditional mutation in Iterator::next doesn't optimise well Apr 22, 2015
huonw added a commit to huonw/rust that referenced this issue Apr 22, 2015
The conditional mutation of the previous implementation resulted in poor
code, making it unconditional makes `Range` less well behaved as an
Iterator (but still legal) but also makes it fast.

The intention is that this change will be reverted when rustc/LLVM
handle the best-behaved implementation better.

cc rust-lang#24660
@huonw
Copy link
Member Author

huonw commented Apr 22, 2015

I've submitted #24705 with the intention that it's a temporary work-around.

@Thiez
Copy link
Contributor

Thiez commented Apr 22, 2015

If you really want well behaved iterators in the standard library I would suggest having them panic!() when next() is called after None. This way people will not start to rely on None being returned forever, which is exactly as it should be, according to the protocol.

@bluss
Copy link
Member

bluss commented Apr 22, 2015

You need to differentiate the (non-)guarantees of the trait and of specific implementations. I (and probably others) happily rely on the sanity of the standard slice iterator implementation, because I know I can.

oli-obk pushed a commit to oli-obk/rust that referenced this issue Jun 18, 2015
The conditional mutation of the previous implementation resulted in poor
code, making it unconditional makes `Range` less well behaved as an
Iterator (but still legal) but also makes it fast.

The intention is that this change will be reverted when rustc/LLVM
handle the best-behaved implementation better.

cc rust-lang#24660
@dotdash
Copy link
Contributor

dotdash commented Jun 18, 2015

This optimizes properly now. Maybe due to the LLVM update?

@dotdash dotdash closed this as completed Jun 18, 2015
@dotdash
Copy link
Contributor

dotdash commented Jun 18, 2015

For completeness sake:

    .section    .text._ZN4test20hb4b7c0c6e08ff614VaaE,"ax",@progbits
    .globl  _ZN4test20hb4b7c0c6e08ff614VaaE
    .align  16, 0x90
    .type   _ZN4test20hb4b7c0c6e08ff614VaaE,@function
_ZN4test20hb4b7c0c6e08ff614VaaE:
    .cfi_startproc
    movl    $100000, %eax
    retq
.Lfunc_end1:
    .size   _ZN4test20hb4b7c0c6e08ff614VaaE, .Lfunc_end1-_ZN4test20hb4b7c0c6e08ff614VaaE
    .cfi_endproc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

5 participants