Regression in generated code #16709

nalimilan · 2016-06-01T21:16:41Z

I've noticed a regression in generated code between 0.4.5 and a recent git master. I've been able to reduce the problem to this:

function f(x::Nullable, y::Nullable)
    Nullable(x.value + y.value)
end

@code_native f(Nullable(1), Nullable(2))

On 0.4.5 (system LLVM 3.3):

Source line: 2
    pushq   %rbp
    movq    %rsp, %rbp
Source line: 2
    movq    8(%rdx), %rax
Source line: 2
    addq    8(%rsi), %rax
    movq    %rax, 8(%rdi)
    movb    $0, (%rdi)
    movq    %rdi, %rax
    popq    %rbp
    ret

On master (in-tree LLVM 3.7.1):

Source line: 0
    pushq   %rbp
    movq    %rsp, %rbp
Source line: 2
    movq    8(%rsi), %rax
    addq    8(%rdx), %rax
    movb    $0, (%rdi)
    movl    -7(%rbp), %ecx
    movl    %ecx, 1(%rdi)
    movw    -3(%rbp), %cx
    movw    %cx, 5(%rdi)
    movb    -1(%rbp), %cl
    movb    %cl, 7(%rdi)
    movq    %rax, 8(%rdi)
    movq    %rdi, %rax
    popq    %rbp
    retq
    nopl    (%rax)

(Found when working on JuliaStats/NullableArrays.jl#111.)

carnaval · 2016-06-02T14:27:54Z

This is #16460. LLVM is following orders here and emits extra code to copy the 7 useless padding bytes between the first byte and the integer payload.

In theory this is what http://llvm.org/docs/LangRef.html#tbaa-struct-metadata is meant for but I just tried locally and it doesn't seem to have an effect. Let me try a little more and if it does not work we can revert to plain load/stores for structures with a small enough number of fields.
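To make the padding point concrete, here is a minimal C++ sketch of a comparable layout. The struct `NullableInt64` and the helper names are hypothetical stand-ins, not Julia's actual runtime representation; the sketch only assumes a one-byte flag followed by an 8-byte-aligned integer, which matches the offsets 0 and 8 visible in the assembly above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the layout seen in the assembly:
// a one-byte flag at offset 0 and an integer payload at offset 8.
struct NullableInt64 {
    bool isnull;         // offset 0
    std::int64_t value;  // placed after alignment padding
};

// Number of padding bytes between the flag and the payload
// (7 on a typical x86-64 ABI, where int64_t is 8-byte aligned).
constexpr std::size_t padding_bytes() {
    return offsetof(NullableInt64, value) - sizeof(bool);
}

// Whole-object copy: moves every byte of the struct, padding included.
void copy_whole(NullableInt64 &dst, const NullableInt64 &src) {
    std::memcpy(&dst, &src, sizeof(NullableInt64));
}

// Field-wise copy: only the flag and the payload are written,
// so the padding bytes never need to be read or stored.
void copy_fields(NullableInt64 &dst, const NullableInt64 &src) {
    dst.isnull = src.isnull;
    dst.value = src.value;
}
```

Both functions leave `dst` with the same observable fields. The difference is that a whole-object copy obliges the compiler to move the padding unless it can prove those bytes are dead, which is roughly the situation the 0.5 listing shows with its extra `movl`/`movw`/`movb` pairs.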

carnaval · 2016-06-02T14:53:20Z

OK, it seems that in this case it's sufficient to turn on the memcpy opt pass. I did not realize it was not part of our pipeline. Could you check whether it is also enough for your more complex example?

diff --git a/src/jitlayers.cpp b/src/jitlayers.cpp
index 5f16160..d2dc83c 100644
--- a/src/jitlayers.cpp
+++ b/src/jitlayers.cpp
@@ -99,7 +99,7 @@ static void addOptimizationPasses(T *PM)
     PM->add(createInstructionCombiningPass()); // Clean up after the unroller
 #endif
     PM->add(createGVNPass());                  // Remove redundancies
-    //PM->add(createMemCpyOptPass());            // Remove memcpy / form memset
+    PM->add(createMemCpyOptPass());            // Remove memcpy / form memset
     PM->add(createSCCPPass());                 // Constant prop with SCCP

     // Run instcombine after redundancy elimination to exploit opportunities

nalimilan · 2016-06-02T20:19:10Z

Thanks! Unfortunately, this doesn't seem to make a real difference on the complex code. The benchmark I'm using isn't necessarily very representative of real workflows, but at least it should illustrate a possible one. See testf1, testf2 and testf3 at https://gist.github.com/nalimilan/94a3dc790bc592e8b2b561d9dcca9a9b

In particular, testf3 is twice as slow as the other methods on 0.5, while the penalty was less marked on 0.4.

vtjnash · 2016-06-13T05:15:38Z

I see no performance difference on current master between testf1, testf2, and testf3.

nalimilan · 2016-06-13T12:23:42Z

Indeed, now I get this:

julia> mean((@benchmark testf1(x, y)).times)
4.935441460784314e7

julia> mean((@benchmark testf2(x, y)).times)
5.076511718181818e7

julia> mean((@benchmark testf3(x, y)).times)
4.8260402625e7

So that's very similar to 0.4.5.

kshyatt added the compiler:codegen Generation of LLVM IR and native code label Jun 1, 2016

nalimilan added the regression Regression in behavior compared to a previous version label Jun 2, 2016

nalimilan mentioned this issue Jun 2, 2016

Arithmetic operators on Nullable JuliaStats/NullableArrays.jl#111

Closed

vtjnash closed this as completed in 757f06a Jun 13, 2016
