Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate excessive null-checks from slice iterators #21886

Merged
merged 2 commits into from
Feb 19, 2015

Conversation

dotdash
Copy link
Contributor

@dotdash dotdash commented Feb 3, 2015

The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from #18193

running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured

Fixes #18193

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@huonw
Copy link
Member

huonw commented Feb 3, 2015

From the description I take it this doesn't suffer from the compile time problem mentioned in #21418 (comment)? (Out of interest what changed in LLVM? Just general optimisations?)

Also, I found that the compiler aborted compiling libcore (can't remember if it was stage1 or stage2) when I did a similar change (#21448); this doesn't suffer from that?

@dotdash
Copy link
Contributor Author

dotdash commented Feb 3, 2015

From the description I take it this doesn't suffer from the compile time problem mentioned in #21418 (comment)?

Right, I compared compile times for (IIRC) rustc_trans and rustc_typeck and there was no measureable difference in compile times with the assume in stage 2 (or maybe it was stage 1 that I tested, shouldn't matter).

I also just bootstrapped a new version of #21418 and that took ~24 minutes from finishing stage0 libcore to finishing stage2 librustc. Considering that just rustc_trans took 13 minutes when the slowdown was there, it looks good to me ;-)

(Out of interest what changed in LLVM? Just general optimisations?)

I don't know.

Also, I found that the compiler aborted compiling libcore (can't remember if it was stage1 or stage2) when I did a similar change (#21448); this doesn't suffer from that?

The benchmark was taken with the stage2 rustc, so no, it didn't abort.

@huonw
Copy link
Member

huonw commented Feb 3, 2015

Could you add the assume to next_back as well?

@dotdash
Copy link
Contributor Author

dotdash commented Feb 3, 2015

Why next_back? Wouldn't as_mut_slice and into_iter be the places that correspond to as_slice?

I originally had it in next (which is why I forgot to add it to as_mut_slice, because next is generated in a macro that is used for both iter types), and that gave worse results, because the assumption got lost at some point when SROA (I think) replaced the load in next() with the one from as_slice.

@huonw
Copy link
Member

huonw commented Feb 3, 2015

Oh, huh. I totally misread this PR. I was thinking this was in <slice::Iter<T> as Iterator>::next.

@huonw
Copy link
Member

huonw commented Feb 3, 2015

I suspect that means that this doesn't solve a case like http://is.gd/pteFUM,

.LBB0_2:
    testq   %rdx, %rdx
    je  .LBB0_4
    addl    (%rdx), %eax
    addq    $4, %rdx
    addq    $-4, %rcx
    jne .LBB0_2

since there's no Vec involved. I wonder if that would be addressed by placing assumptions in the iter/iter_mut constructor?

@dotdash
Copy link
Contributor Author

dotdash commented Feb 3, 2015

Will try to see if that helps

@@ -427,8 +428,12 @@ impl<T> Vec<T> {
#[stable(feature = "rust1", since = "1.0.0")]
pub fn as_mut_slice<'a>(&'a mut self) -> &'a mut [T] {
unsafe {
let ptr = *self.ptr;
if cfg!(not(stage0)) { // NOTE remove cfg after next snapshot
assume(ptr != 0 as *mut T);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptr::null_mut() here and all other uses of 0 as *mut T

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, using is_null() works now, that didn't work the first time I tried, either because of an error on my side or because of the ptrtoint instruction it was generated with the old is_null() implementation.
Thanks!

@dotdash
Copy link
Contributor Author

dotdash commented Feb 3, 2015

I wonder if that would be addressed by placing assumptions in the iter/iter_mut constructor?

At least in this case, the assume() gets canonicalized to !nonnull metadata on a load, and that gets lost in the SROA pass. I've added the call anyway, hoping for it to be useful in the future.

@huonw
Copy link
Member

huonw commented Feb 3, 2015

@bors r+ e49f

@nikomatsakis nikomatsakis assigned huonw and unassigned nikomatsakis Feb 3, 2015
@alexcrichton
Copy link
Member

@bors: rollup

@alexcrichton
Copy link
Member

@bors: rollup-

@bors
Copy link
Contributor

bors commented Feb 4, 2015

⌛ Testing commit e49f6d6 with merge cd0d956...

@bors
Copy link
Contributor

bors commented Feb 4, 2015

💔 Test failed - auto-win-32-nopt-t

@bluss
Copy link
Member

bluss commented Feb 5, 2015

@huonw, that fold testcase seems to have been resolved now, it vectorizes! (rustc 1.0.0-nightly (ba2f13ef0 2015-02-04 20:03:55 +0000) Probably due to the other awesome nonnull changes?

@huonw
Copy link
Member

huonw commented Feb 7, 2015

The failure seems legitimate; although, possibly a LLVM bug?

@dotdash
Copy link
Contributor Author

dotdash commented Feb 9, 2015

Yeah, I can reproduce that in my Windows VM.

@dotdash
Copy link
Contributor Author

dotdash commented Feb 10, 2015

Yup, LLVM bug: http://reviews.llvm.org/D7533

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 18, 2015
@nagisa
Copy link
Member

nagisa commented Feb 18, 2015

LLVM update landed. Needs rebase.

Casting the pointer to an integer requires a ptrtoint, while casting 0
to a pointer is directly folded to a `null` value.
The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from rust-lang#18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes rust-lang#18193
@dotdash
Copy link
Contributor Author

dotdash commented Feb 18, 2015

@bors r=huonw 7412d1b

@bors
Copy link
Contributor

bors commented Feb 18, 2015

⌛ Testing commit 7412d1b with merge c103604...

@bors
Copy link
Contributor

bors commented Feb 18, 2015

💔 Test failed - auto-win-32-nopt-t

@alexcrichton
Copy link
Member

@bors: retry

@bors
Copy link
Contributor

bors commented Feb 18, 2015

⌛ Testing commit 7412d1b with merge a1cfc62...

bors added a commit that referenced this pull request Feb 18, 2015
The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from #18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes #18193
@bors
Copy link
Contributor

bors commented Feb 18, 2015

💔 Test failed - auto-win-32-nopt-t

@aturon
Copy link
Member

aturon commented Feb 18, 2015

@bors: retry

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 19, 2015
The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from rust-lang#18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes rust-lang#18193
@bors bors merged commit 7412d1b into rust-lang:master Feb 19, 2015
@dotdash dotdash deleted the fast_slice_iter branch February 22, 2015 13:26
dotdash added a commit to dotdash/rust that referenced this pull request Feb 22, 2015
This adds the assume() calls back that got lost when rebasing rust-lang#21886.
bors added a commit that referenced this pull request Feb 28, 2015
This adds the assume() calls back that got lost when rebasing #21886.
@frol
Copy link
Contributor

frol commented May 9, 2018

Could someone, please, point me to a tracking issue or a documentation about the excessive null-checks?

I want to learn more about the cause of extra testq %rsi, %rsi (test rsi, rsi) blocks when I compile the following snippets:

pub fn foo(s: &[u32]) -> u32 {
    s.iter().sum()
}
pub fn foo(s: &[u32]) -> u32 {
    s.iter().sum()
}
pub fn foo(s: Vec<i32>) -> usize {
    s.len()
}

and even

pub fn foo(_s: Vec<i32>) -> usize {
    0
}

produces assembly code which might have been avoided:

playground::foo:
	movq	8(%rdi), %rsi
	testq	%rsi, %rsi
	je	.LBB0_2
	pushq	%rax
	movq	(%rdi), %rdi
	shlq	$2, %rsi
	movl	$4, %edx
	callq	__rust_dealloc@PLT
	addq	$8, %rsp

.LBB0_2:
	xorl	%eax, %eax
	retq

@nagisa
Copy link
Member

nagisa commented May 9, 2018

In both these cases the length, rather than pointer is being checked. For the slice case as early exit before a vectorised solution is used, for Vec drop case because Vec may actually not have backing storage (Vec::new does not allocate).

@frol
Copy link
Contributor

frol commented May 9, 2018

for Vec drop case because Vec may actually not have backing storage (Vec::new does not allocate).

Indeed, I completely forgot that the owned object must be deallocated here.

For the slice case as early exit before a vectorised solution is used

Right...

@nagisa I am sorry for the trouble! Rust is awesome! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Iterator methods produce slow code