Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduced performance when using question mark operator instead of try! #37939

Closed
birkenfeld opened this issue Nov 22, 2016 · 23 comments
Closed

Reduced performance when using question mark operator instead of try! #37939

birkenfeld opened this issue Nov 22, 2016 · 23 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. WG-llvm Working group: LLVM backend code generation

Comments

@birkenfeld
Copy link
Contributor

birkenfeld commented Nov 22, 2016

This was reported on the users forum , and I don't want it to get lost. Basically, replacing try! by ? resulted in ~20% performance loss in benchmarks:

ratel-rust/ratel-core#48 (comment)

I've reproduced, but not further investigated, these findings. Is that expected right now? It's not a good argument for adopting the question mark :)

@arthurprs
Copy link
Contributor

I didn't read the implementation until now, I though they expanded to the exact same thing as try!()

@sfackler
Copy link
Member

Might just need to stick a couple of #[inline]s on Carrier methods.

@StefanoD
Copy link

I'm curious why this issue doesn't even get a label like I-Slow.

@birkenfeld
Copy link
Contributor Author

birkenfeld commented Dec 3, 2016

@sfackler I tried this (#[inline] on the trait method definitions and the impl methods), but didn't seem to get any different bench numbers.

@dotdash dotdash added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 8, 2016
@ghost
Copy link

ghost commented Dec 8, 2016

I've found out why this is slow. Suppose you have an expression res? of type Result<U, V>.
In librustc/hir/lowering.rs it will be desugared to:

match Carrier::translate(res) {
    Ok(val) => val,
    Err(err) => return Carrier::from_error(From::from(err)),
}

The real culprit is Carrier::translate. It contains a match expression itself, which doesn't get optimized away, even though it's implemented for Result<U, V> as the identity function (in src/libcore/ops.rs):

impl<U, V> Carrier for Result<U, V> {
    type Success = U;
    type Error = V;

    fn from_success(u: U) -> Result<U, V> {
        Ok(u)
    }

    fn from_error(e: V) -> Result<U, V> {
        Err(e)
    }

    fn translate<T>(self) -> T
        where T: Carrier<Success=U, Error=V>
    {
        match self {
            Ok(u) => T::from_success(u),
            Err(e) => T::from_error(e),
        }
    }
}

In lowering.rs we have this piece of desugaring process:

                    // Carrier::translate(<expr>)
                    let discr = {
                        // expand <expr>
                        let sub_expr = self.lower_expr(sub_expr);

                        let path = &["ops", "Carrier", "translate"];
                        let path = P(self.expr_std_path(unstable_span, path, ThinVec::new()));
                        P(self.expr_call(e.span, path, hir_vec![sub_expr]))
                    };

Suppose we remove translation and implement the piece simply like this:

                    // Carrier::translate(<expr>)
                    let discr = {
                        // expand <expr>
                        P(self.lower_expr(sub_expr))
                    };

If we do that, there will be no slowdown in ratel-core due to question mark operators (I benchmarked it).
Of course, this is a non-solution and the problem is that rustc and LLVM don't optimize away the redundant match statement.

I created a simple example that demonstrates the issue: https://is.gd/Q7PGzm
Compile into ASM in Release mode. Function identity compiles to:

identity:
	.cfi_startproc
	xorl	%eax, %eax
	cmpq	$0, (%rsi)
	movq	8(%rsi), %rcx
	setne	%al
	movq	%rax, (%rdi)
	movq	%rcx, 8(%rdi)
	movq	%rdi, %rax
	retq

You can see there are unncesseary cmpq and setne instructions. The function should ideally (in order to be fast) simply return the input argument using a few move instructions, that's all.

I'm a total beginner at rustc, so no idea how to proceed further. Should we perhaps detect identity functions within a MIR optimization pass?

@bluss
Copy link
Member

bluss commented Dec 11, 2016

@arielb1 I've implemented a test version of the "variant copy prop" by hacking into CopyPropagation. However @eddyb has unfortunately 😉 said that it's much better to make a more general version of that that can propagate copies of aggregates (somehow).

scottlamb added a commit to scottlamb/moonfire-nvr that referenced this issue Feb 27, 2017
I just switched a couple inner loop ?s back to try!(...) to work around
rust-lang/rust#37939
frewsxcv added a commit to frewsxcv/rust that referenced this issue Jun 1, 2017
Lower `?` to `Try` instead of `Carrier`

The easy parts of rust-lang/rfcs#1859, whose FCP completed without further comments.

Just the trait and the lowering -- neither the error message improvements nor the insta-stable impl for Option nor exhaustive docs.

Based on a [github search](https://github.com/search?l=rust&p=1&q=question_mark_carrier&type=Code&utf8=%E2%9C%93), this will break the following:

- https://github.com/pfpacket/rust-9p/blob/00206e34c680198a0ac7c2f066cc2954187d4fac/src/serialize.rs#L38
- https://github.com/peterdelevoryas/bufparse/blob/b1325898f4fc2c67658049196c12da82548af350/src/result.rs#L50

The other results appear to be files from libcore or its tests.  I could also leave Carrier around after stage0 and `impl<T:Carrier> Try for T` if that would be better.

r? @nikomatsakis

Edit: Oh, and it might accidentally improve perf, based on rust-lang#37939 (comment), since `Try::into_result` for `Result` is an obvious no-op, unlike `Carrier::translate`.
frewsxcv added a commit to frewsxcv/rust that referenced this issue Jun 1, 2017
Lower `?` to `Try` instead of `Carrier`

The easy parts of rust-lang/rfcs#1859, whose FCP completed without further comments.

Just the trait and the lowering -- neither the error message improvements nor the insta-stable impl for Option nor exhaustive docs.

Based on a [github search](https://github.com/search?l=rust&p=1&q=question_mark_carrier&type=Code&utf8=%E2%9C%93), this will break the following:

- https://github.com/pfpacket/rust-9p/blob/00206e34c680198a0ac7c2f066cc2954187d4fac/src/serialize.rs#L38
- https://github.com/peterdelevoryas/bufparse/blob/b1325898f4fc2c67658049196c12da82548af350/src/result.rs#L50

The other results appear to be files from libcore or its tests.  I could also leave Carrier around after stage0 and `impl<T:Carrier> Try for T` if that would be better.

r? @nikomatsakis

Edit: Oh, and it might accidentally improve perf, based on rust-lang#37939 (comment), since `Try::into_result` for `Result` is an obvious no-op, unlike `Carrier::translate`.
@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
@kdy1
Copy link
Contributor

kdy1 commented Sep 28, 2017

I think this issue is fixed.

Generated asm is same for try! and ?.

https://play.rust-lang.org/?gist=625d88df305ace951a088e1cee2ec13a

Edit: Typo

@StefanoD
Copy link

Even if this is fixed: Is there a unit test which prevents regression?

@kennytm
Copy link
Member

kennytm commented Sep 28, 2017

We could have a codegen test for this.

Someone tag this as E-needstest please...

@bluss
Copy link
Member

bluss commented Sep 28, 2017

It looks like they both have the same bad code generation now, needs looking into.

@bluss
Copy link
Member

bluss commented Sep 28, 2017

Example where it is compared with the identity function. Returning Ok(x?) is the same as the identity for the result. playground link

But it seems this example is not a good condensation of the issue, because I can't find any previous Rust version (in rust.godbolt) that has the desired identity function code gen even for the try!() macro.

It doesn't show the difference between try and ?, but it shows something we can fix to improve them both.


Edited: Update to another example code link (configurable Result type).

@arielb1
Copy link
Contributor

arielb1 commented Sep 28, 2017

@bluss

I think https://reviews.llvm.org/D37216 should fix this, but it's a little stuck in the LLVM review queue.

@Mark-Simulacrum Mark-Simulacrum added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Sep 29, 2017
@mati865
Copy link
Contributor

mati865 commented Feb 1, 2018

@arielb1 it was reverted llvm-mirror/llvm@c87c1c0.

@kornelski
Copy link
Contributor

kornelski commented Apr 24, 2018

Still not fixed (tested nightly on playpen)

https://play.rust-lang.org/?version=nightly&mode=release&edition=2015&gist=b3858efebcadbffcbe2155e3f2750d07

@kennytm kennytm added the WG-llvm Working group: LLVM backend code generation label Apr 24, 2018
@mehcode
Copy link
Contributor

mehcode commented Aug 31, 2018

This seems to be fixed (in stable). Tried the play link above.

; Function Attrs: noinline norecurse nounwind readnone uwtable
define { i64, i64 } @try_op(i64, i64) unnamed_addr #2 {
  %3 = tail call { i64, i64 } @try_macro(i64 %0, i64 %1) #2
  ret { i64, i64 } %3
}

try_op:
	jmp	try_macro

try_macro:
	xorl	%eax, %eax
	testq	%rdi, %rdi
	setne	%al
	movq	%rsi, %rdx
	retq

@kennytm kennytm added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Aug 31, 2018
@bluss
Copy link
Member

bluss commented Dec 1, 2018

You can still provoke it to make a difference and introduce conditionals or more copies for the ? operator than for the try!() macro if you change the payload types for the Result, for example using (i32, i32) or String.

type T = (i32, i32);
type E = T;
type R = Result<T, E>;

#[no_mangle]
pub fn try_op(a: R) -> R {
    Ok(a?)
}

#[no_mangle]
pub fn try_macro(a: R) -> R {
    Ok(try!(a))
}

@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Dec 1, 2018
@mati865
Copy link
Contributor

mati865 commented Apr 3, 2019

Looks like it regressed: https://godbolt.org/z/awKD_U

@mati865
Copy link
Contributor

mati865 commented Aug 17, 2022

We are fast again (at least for the examples given above) since Rust 1.52 which contains LLVM 12 upgrade!

Once somebody adds test (or confirms that similar test already exists) this issue could be closed.

@Kobzol
Copy link
Contributor

Kobzol commented Aug 17, 2022

There's this codegen test, but I'm not sure if it's enough to check for this.

@StefanoD
Copy link

According to this blog post, Rust has to this date (Rust 1.62.1) performance problems with the question mark operator, resulting in a 4% performance loss.

@scottmcm
Copy link
Member

scottmcm commented Aug 17, 2022

but I'm not sure if it's enough to check for this.

Definitely not -- you'll notice it's both -Zunsound-mir-opts and FIXME: broken.

But there's good news! LLVM 15 merged a few days ago, bringing the fix mentioned in #85133 (comment).

With that, both of these are nops now:

#![feature(try_blocks)]

pub fn result_nop_match(x: Result<i32, u32>) -> Result<i32, u32> {
    match x {
        Ok(x) => Ok(x),
        Err(x) => Err(x),
    }
}

pub fn result_nop_traits(x: Result<i32, u32>) -> Result<i32, u32> {
    try {
        x?
    }
}

https://rust.godbolt.org/z/71dYnrMf6

example::result_nop_match:
        mov     rax, rdi
        ret

example::result_nop_traits:
        mov     rax, rdi
        ret

For comparison, on 1.63 even the match version needs a bunch of code: https://rust.godbolt.org/z/oarec37sG

example::result_nop_match:
        xor     ecx, ecx
        test    edi, edi
        setne   cl
        movabs  rax, -4294967296
        and     rax, rdi
        or      rax, rcx
        ret

EDIT: I opened #100693 to have a codegen test here.

@JohnTitor
Copy link
Member

JohnTitor commented Dec 21, 2022

Triage: #100693 added the test for this case (thank you @scottmcm!), closing as fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. WG-llvm Working group: LLVM backend code generation
Projects
None yet
Development

No branches or pull requests