40x Performance Regression in v0.6 for simple polynomial function #23751

mbrookhart · 2017-09-18T03:55:41Z

While tracking down a performance regression in a piece of code, I simplified it to this:

function main()
    f1(x) = x + 2*x^2 + 4*x^4
    f2(x) = x + 2*x^2 + 4*x^2*x^2

    @code_llvm(f1(1))
    @code_llvm(f2(1))
    A=rand(10,10)
    f1.(A)
    f2.(A)
    A=rand(10000,10000)
    @time f1.(A)
    @time f2.(A)
end

main()

Fresh compile of tag v0.5.2:

define double @julia_f1_72267(double) #0 {
top:
  %1 = call double @llvm.powi.f64(double %0, i32 4)
  %2 = fmul double %0, %0
  %3 = fmul double %2, 2.000000e+00
  %4 = fadd double %3, %0
  %5 = fmul double %1, 4.000000e+00
  %6 = fadd double %4, %5
  ret double %6
}

define double @julia_f2_72271(double) #0 {
top:
  %1 = fmul double %0, %0
  %2 = fmul double %1, 2.000000e+00
  %3 = fadd double %2, %0
  %4 = fmul double %1, 4.000000e+00
  %5 = fmul double %1, %4
  %6 = fadd double %3, %5
  ret double %6
}
  0.118596 seconds (3 allocations: 762.940 MB, 24.19% gc time)
  0.120200 seconds (3 allocations: 762.940 MB, 24.94% gc time)

Fresh compile of tag v0.6.0:

define double @julia_f1_60711(double) #0 !dbg !5 {
top:
  %1 = call double @llvm.pow.f64(double %0, double 4.000000e+00)
  %2 = fadd double %0, 4.000000e+00
  %notlhs = fcmp ord double %1, 0.000000e+00
  %notrhs = fcmp uno double %2, 0.000000e+00
  %3 = or i1 %notrhs, %notlhs
  br i1 %3, label %L12, label %if

if:                                               ; preds = %top
  call void @jl_throw(i8** inttoptr (i64 140123921213312 to i8**))
  unreachable

L12:                                              ; preds = %top
  %4 = fmul double %0, %0
  %5 = fmul double %4, 2.000000e+00
  %6 = fadd double %5, %0
  %7 = fmul double %1, 4.000000e+00
  %8 = fadd double %6, %7
  ret double %8
}

define double @julia_f2_60732(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, %0
  %2 = fmul double %1, 2.000000e+00
  %3 = fadd double %2, %0
  %4 = fmul double %1, 4.000000e+00
  %5 = fmul double %1, %4
  %6 = fadd double %3, %5
  ret double %6
}
  5.082136 seconds (2 allocations: 762.940 MiB, 0.59% gc time)
  0.122529 seconds (2 allocations: 762.940 MiB, 26.56% gc time)

yuyichao · 2017-09-18T04:06:11Z

This is intentional: the old version produced inaccurate results. Ref #19872
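
A rough way to see the accuracy trade-off for yourself is sketched below, assuming a recent Julia release. The helpers `pow4_mul` and `pow4_ref` are hypothetical names that do not appear in this thread. The multiply-only form is only an approximation of what the old `powi` lowering did for an exponent of 4, and the size of the differences will depend on the input and the Julia version.

```julia
# Hypothetical helper: x^4 via two multiplications, rounding after each product.
# This is the same arithmetic as x^2*x^2 in f2 from the report.
pow4_mul(x) = (x2 = x * x; x2 * x2)

# Hypothetical reference: compute in BigFloat, then round once to Float64.
pow4_ref(x) = Float64(big(x)^4)

x = 1.1
println(pow4_mul(x) - pow4_ref(x))  # rounding error of the multiply-only form
println(x^4 - pow4_ref(x))          # rounding error of the built-in power
```

Running this over a range of inputs and exponents, rather than the single value shown here, gives a better picture of where the two approaches diverge.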

mbrookhart · 2017-09-18T04:52:05Z

Ha, okay. I tried the same idea in C++ and see a similar regression between pow(x,2)*pow(x,2) and pow(x,4). This must be one place where generic programming is biting me.

Thanks!
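
For polynomials like the one in the report, one possible way to avoid the general power call is Horner-style evaluation. The sketch below assumes Julia 1.4 or later, where `evalpoly` is available in Base; it was not an option on v0.6 and is not suggested anywhere in this thread. The name `f3` is hypothetical.

```julia
# Original form from the report: x^4 goes through the general power routine.
f1(x) = x + 2*x^2 + 4*x^4

# Horner-style evaluation; coefficients are listed from the constant term upward:
# 0 + 1*x + 2*x^2 + 0*x^3 + 4*x^4
f3(x) = evalpoly(x, (0, 1, 2, 0, 4))

A = rand(1000, 1000)
@assert f3.(A) ≈ f1.(A)   # same values up to floating-point rounding
```

Like the hand-written `f2`, this uses only multiplications and additions, so its results may differ from `f1` in the last bits.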

yuyichao closed this as completed Sep 18, 2017

KristofferC mentioned this issue Sep 21, 2017

Surprising performance for ^(::Float64, ::Int) vs ^(::Complex128, ::Int) #23804

Closed
