40x Performance Regression in v0.6 for simple polynomial function #23751

Closed
mbrookhart opened this issue Sep 18, 2017 · 2 comments

Comments

@mbrookhart

While tracking down a performance regression in a piece of code, I simplified it to this:

function main()
    f1(x) = x + 2*x^2 + 4*x^4
    f2(x) = x + 2*x^2 + 4*x^2*x^2

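    # Inspect the IR for a Float64 argument, the element type of A below
    @code_llvm f1(1.0)
    @code_llvm f2(1.0)
    # Warm up on a small array so compilation is excluded from the timings
    A = rand(10, 10)
    f1.(A)
    f2.(A)
    A = rand(10000, 10000)
    @time f1.(A)
    @time f2.(A)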
end

main()

Fresh compile of tag v0.5.2:

define double @julia_f1_72267(double) #0 {
top:
  %1 = call double @llvm.powi.f64(double %0, i32 4)
  %2 = fmul double %0, %0
  %3 = fmul double %2, 2.000000e+00
  %4 = fadd double %3, %0
  %5 = fmul double %1, 4.000000e+00
  %6 = fadd double %4, %5
  ret double %6
}

define double @julia_f2_72271(double) #0 {
top:
  %1 = fmul double %0, %0
  %2 = fmul double %1, 2.000000e+00
  %3 = fadd double %2, %0
  %4 = fmul double %1, 4.000000e+00
  %5 = fmul double %1, %4
  %6 = fadd double %3, %5
  ret double %6
}
  0.118596 seconds (3 allocations: 762.940 MB, 24.19% gc time)
  0.120200 seconds (3 allocations: 762.940 MB, 24.94% gc time)

Fresh compile of tag v0.6.0:

define double @julia_f1_60711(double) #0 !dbg !5 {
top:
  %1 = call double @llvm.pow.f64(double %0, double 4.000000e+00)
  %2 = fadd double %0, 4.000000e+00
  %notlhs = fcmp ord double %1, 0.000000e+00
  %notrhs = fcmp uno double %2, 0.000000e+00
  %3 = or i1 %notrhs, %notlhs
  br i1 %3, label %L12, label %if

if:                                               ; preds = %top
  call void @jl_throw(i8** inttoptr (i64 140123921213312 to i8**))
  unreachable

L12:                                              ; preds = %top
  %4 = fmul double %0, %0
  %5 = fmul double %4, 2.000000e+00
  %6 = fadd double %5, %0
  %7 = fmul double %1, 4.000000e+00
  %8 = fadd double %6, %7
  ret double %8
}

define double @julia_f2_60732(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, %0
  %2 = fmul double %1, 2.000000e+00
  %3 = fadd double %2, %0
  %4 = fmul double %1, 4.000000e+00
  %5 = fmul double %1, %4
  %6 = fadd double %3, %5
  ret double %6
}
  5.082136 seconds (2 allocations: 762.940 MiB, 0.59% gc time)
  0.122529 seconds (2 allocations: 762.940 MiB, 26.56% gc time)
@yuyichao
Contributor

This is intentional: the old version produced inaccurate results. Ref #19872

@mbrookhart
Author

mbrookhart commented Sep 18, 2017

Ha, okay. I tried the same idea in C++ and I see a similar regression between pow(x,2)*pow(x,2) and pow(x,4). This must be one place where generic programming is biting me.

Thanks!
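
A minimal sketch for readers hitting the same slowdown; it is not part of the original exchange. Besides spelling x^4 as x^2*x^2 (f2 in the report), the polynomial can be written in Horner form with Base's @evalpoly macro, which evaluates it with multiplications and additions (possibly fused) rather than a call to pow. The name f3 is hypothetical, and its results may differ from f1 in the last bits because the operations are ordered differently.

```julia
# f1 and f2 are the two definitions from the report.
f1(x) = x + 2*x^2 + 4*x^4
f2(x) = x + 2*x^2 + 4*x^2*x^2

# Hypothetical third variant: Horner form via @evalpoly.
# Coefficients are listed from the constant term upward:
# 0 + 1*x + 2*x^2 + 0*x^3 + 4*x^4
f3(x) = @evalpoly(x, 0, 1, 2, 0, 4)

A = rand(1000, 1000)

# Elementwise check that all three variants agree up to rounding.
println(all(isapprox.(f1.(A), f2.(A))))
println(all(isapprox.(f1.(A), f3.(A))))
```

Timing f3.(A) with @time, as in the report, would show whether it matches f2 on a given Julia version.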
