-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression in generated code #16709
Comments
This is #16460. LLVM is following the orders here and is emitting extra code to copy the 7 useless padding bytes between the first byte and the integer payload. In theory this is what http://llvm.org/docs/LangRef.html#tbaa-struct-metadata is meant for but I just tried locally and it doesn't seem to have an effect. Let me try a little more and if it does not work we can revert to plain load/stores for structures with a small enough number of fields. |
Ok it seems that in that case it's sufficient to turn on the memcpy opt pass. I did not realize this was not part of our pipeline. Could you try on your more complex example that it is also enough ? diff --git a/src/jitlayers.cpp b/src/jitlayers.cpp
index 5f16160..d2dc83c 100644
--- a/src/jitlayers.cpp
+++ b/src/jitlayers.cpp
@@ -99,7 +99,7 @@ static void addOptimizationPasses(T *PM)
PM->add(createInstructionCombiningPass()); // Clean up after the unroller
#endif
PM->add(createGVNPass()); // Remove redundancies
- //PM->add(createMemCpyOptPass()); // Remove memcpy / form memset
+ PM->add(createMemCpyOptPass()); // Remove memcpy / form memset
PM->add(createSCCPPass()); // Constant prop with SCCP
// Run instcombine after redundancy elimination to exploit opportunities |
Thanks! Unfortunately, this doesn't seem to make a real difference on the complex code. The benchmark I'm using isn't necessarily very representative of real workflows, but at least it should illustrate a possible one. See testf1, testf2 and testf3 at https://gist.github.com/nalimilan/94a3dc790bc592e8b2b561d9dcca9a9b In particular, testf3 is twice slower than other methods on 0.5, while the penalty was less marked on 0.4. |
I see no performance difference on current master between f1, f2, and f3 |
Indeed, now I get this: julia> mean((@benchmark testf1(x, y)).times)
4.935441460784314e7
julia> mean((@benchmark testf2(x, y)).times)
5.076511718181818e7
julia> mean((@benchmark testf3(x, y)).times)
4.8260402625e7 So that's very similar to 0.4.5. |
I've noticed an regression in generated code between 0.4.5 and a recent git master. I've been able to simplify the problem to this:
On 0.4.5 (system LLVM 3.3):
On master (in-tree LLVM 3.7.1):
(Found when working on JuliaStats/NullableArrays.jl#111.)
The text was updated successfully, but these errors were encountered: