Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in generated code #16709

Closed
nalimilan opened this issue Jun 1, 2016 · 5 comments
Closed

Regression in generated code #16709

nalimilan opened this issue Jun 1, 2016 · 5 comments
Labels
compiler:codegen Generation of LLVM IR and native code regression Regression in behavior compared to a previous version

Comments

@nalimilan
Copy link
Member

I've noticed an regression in generated code between 0.4.5 and a recent git master. I've been able to simplify the problem to this:

function f(x::Nullable, y::Nullable)
    Nullable(x.value + y.value)
end

@code_native f(Nullable(1), Nullable(2))

On 0.4.5 (system LLVM 3.3):

Source line: 2
    pushq   %rbp
    movq    %rsp, %rbp
Source line: 2
    movq    8(%rdx), %rax
Source line: 2
    addq    8(%rsi), %rax
    movq    %rax, 8(%rdi)
    movb    $0, (%rdi)
    movq    %rdi, %rax
    popq    %rbp
    ret

On master (in-tree LLVM 3.7.1):

Source line: 0
    pushq   %rbp
    movq    %rsp, %rbp
Source line: 2
    movq    8(%rsi), %rax
    addq    8(%rdx), %rax
    movb    $0, (%rdi)
    movl    -7(%rbp), %ecx
    movl    %ecx, 1(%rdi)
    movw    -3(%rbp), %cx
    movw    %cx, 5(%rdi)
    movb    -1(%rbp), %cl
    movb    %cl, 7(%rdi)
    movq    %rax, 8(%rdi)
    movq    %rdi, %rax
    popq    %rbp
    retq
    nopl    (%rax)

(Found when working on JuliaStats/NullableArrays.jl#111.)

@kshyatt kshyatt added the compiler:codegen Generation of LLVM IR and native code label Jun 1, 2016
@carnaval
Copy link
Contributor

carnaval commented Jun 2, 2016

This is #16460. LLVM is following the orders here and is emitting extra code to copy the 7 useless padding bytes between the first byte and the integer payload.

In theory this is what http://llvm.org/docs/LangRef.html#tbaa-struct-metadata is meant for but I just tried locally and it doesn't seem to have an effect. Let me try a little more and if it does not work we can revert to plain load/stores for structures with a small enough number of fields.

@carnaval
Copy link
Contributor

carnaval commented Jun 2, 2016

Ok it seems that in that case it's sufficient to turn on the memcpy opt pass. I did not realize this was not part of our pipeline. Could you try on your more complex example that it is also enough ?

diff --git a/src/jitlayers.cpp b/src/jitlayers.cpp
index 5f16160..d2dc83c 100644
--- a/src/jitlayers.cpp
+++ b/src/jitlayers.cpp
@@ -99,7 +99,7 @@ static void addOptimizationPasses(T *PM)
     PM->add(createInstructionCombiningPass()); // Clean up after the unroller
 #endif
     PM->add(createGVNPass());                  // Remove redundancies
-    //PM->add(createMemCpyOptPass());            // Remove memcpy / form memset
+    PM->add(createMemCpyOptPass());            // Remove memcpy / form memset
     PM->add(createSCCPPass());                 // Constant prop with SCCP

     // Run instcombine after redundancy elimination to exploit opportunities

@nalimilan
Copy link
Member Author

Thanks! Unfortunately, this doesn't seem to make a real difference on the complex code. The benchmark I'm using isn't necessarily very representative of real workflows, but at least it should illustrate a possible one. See testf1, testf2 and testf3 at https://gist.github.com/nalimilan/94a3dc790bc592e8b2b561d9dcca9a9b

In particular, testf3 is twice slower than other methods on 0.5, while the penalty was less marked on 0.4.

@nalimilan nalimilan added the regression Regression in behavior compared to a previous version label Jun 2, 2016
@vtjnash
Copy link
Member

vtjnash commented Jun 13, 2016

I see no performance difference on current master between f1, f2, and f3

@nalimilan
Copy link
Member Author

Indeed, now I get this:

julia> mean((@benchmark testf1(x, y)).times)
4.935441460784314e7

julia> mean((@benchmark testf2(x, y)).times)
5.076511718181818e7

julia> mean((@benchmark testf3(x, y)).times)
4.8260402625e7

So that's very similar to 0.4.5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

4 participants