-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add ThinLTO for LLVM >= 4.0 #4367
Conversation
src/llvm/ext/llvm_ext.cc
Outdated
@@ -6,6 +6,10 @@ | |||
#include <llvm/IR/DebugLoc.h> | |||
#include <llvm/IR/LLVMContext.h> | |||
#include <llvm/IR/Metadata.h> | |||
#include "llvm/Analysis/ModuleSummaryAnalysis.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't added #include
paths be enclosed in <...>
instead of "..."
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same. And it is already inconsistent some lines above 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's not the same, since #include <...>
form uses system header files, while #include "..."
starts from current directory. See http://stackoverflow.com/questions/21593/what-is-the-difference-between-include-filename-and-include-filename for reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same (in this context). I need to add some ifdef too for older llvm, I will change it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive results!
src/compiler/crystal/compiler.cr
Outdated
if thin_lto | ||
lto_cache_dir = "#{output_dir}/lto.cache" | ||
Dir.mkdir_p(lto_cache_dir) | ||
cc = ENV["CC"]? || "clang -flto=thin -Wl,-mllvm,-threads=#{n_threads},-cache_path_lto,#{lto_cache_dir},#{@release ? "-mllvm,-O2" : "-mllvm,-O0"}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If set is set by the system or to clang-4.0
for example, then thinlto will be silently ignored.
Is it possible to detect that CC is clang >= 4.0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the LLD homepage, it says that lld is embeddable, perhaps it would be better to just link against lld and bundle it with crystal since it's probably rather rare in the wild.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
embedding lld might be a good way to go in the long term. I wanted to change as little as possible.
@ysbaddaden yes, I wanted to be able to override the linker as you can do using CC. Since the flags are sensible to the linker to be used it basically boils down to the user to set the rest of the flags as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but we can specify link flags with --link-flags
, we already force flags and libraries on CC
anyway, and if we're going this way, there is --cross-compile
to only generate objects.
The thing is that Debian and Freebsd systems (and possibly others) don't provide a generic clang
binary or link, only clang-3.9
, clang-4.0
and so on, so I can't use thin lto as the following call, or when I simply export CC
in my ~/.profile
, which would result in non optimized binaries (many modules, no link-time optimizations):
$ CC=clang-4.0 shards build --release --lto=thin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that with ThinLTO the .bc stored in .o files with augmented metadata. This is different than --cross-compile
.
src/compiler/crystal/command.cr
Outdated
@@ -292,6 +292,12 @@ class Crystal::Command | |||
opts.on("", "--no-debug", "Skip any symbolic debug info") do | |||
compiler.debug = Crystal::Debug::None | |||
end | |||
{% if LibLLVM::IS_40 %} | |||
opts.on("--lto FLAG", "Use ThinLTO -lto=thin") do |flag| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option description states to use =
but the actual is a space?
I've added the default options for llvm-4.0 in linux. I tested in on xenial using I also change to use clang-4.0 in osx for consistency. Some parts of the code already work only if compiled with llvm-4.0. The lto cache was filled on linux so I guess it's working. I didn't need to add |
Hello @bcardiff, will be possible for you to rebase this work on current master? Now that LLVM < 3.8 has been dropped and LLVM 4.0 has been added, I would like to give this a test with some some complex projects and report back. Thank you! ❤️ ❤️ ❤️ |
Should we revive this PR? Am i correct in saying that this PR is a huge win for release mode (we can compile in parallel) but in non-release mode the current method is better. if so, I think we should merge this but only enabled by default for |
@luislavena rebased and squashed Remember that
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM5 and clang5 have been released. Let's support them :-)
src/compiler/crystal/compiler.cr
Outdated
lto_cache_dir = "#{output_dir}/lto.cache" | ||
Dir.mkdir_p(lto_cache_dir) | ||
{% if flag?(:darwin) %} | ||
cc = ENV["CC"]? || "clang-4.0 -flto=thin -Wl,-mllvm,-threads=#{n_threads},-cache_path_lto,#{lto_cache_dir},#{@release ? "-mllvm,-O2" : "-mllvm,-O0"}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about clang-5.0
or clang-6.0
(nightly)?
Maybe just expecting clang
, as it was initialy, and document that a clang
binary must exist (e.g. as a link to clang-4.0
or clang-5.0
)? Or maybe accept an ENV["CLANG"]
.
src/compiler/crystal/compiler.cr
Outdated
@@ -546,6 +561,15 @@ module Crystal | |||
bc_name = self.bc_name | |||
object_name = self.object_name | |||
|
|||
{% if LibLLVM::IS_40 %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again.
src/llvm/lib_llvm_ext.cr
Outdated
@@ -134,4 +133,8 @@ lib LibLLVMExt | |||
then : LibLLVM::BasicBlockRef, catch : LibLLVM::BasicBlockRef, | |||
bundle : LibLLVMExt::OperandBundleDefRef, | |||
name : LibC::Char*) : LibLLVM::ValueRef | |||
|
|||
{% if LibLLVM::IS_40 %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again.
src/llvm/module.cr
Outdated
@@ -57,6 +57,12 @@ class LLVM::Module | |||
LibLLVM.write_bitcode_to_file self, filename | |||
end | |||
|
|||
{% if LibLLVM::IS_40 %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again.
src/compiler/crystal/command.cr
Outdated
@@ -310,6 +310,12 @@ class Crystal::Command | |||
opts.on("--no-debug", "Skip any symbolic debug info") do | |||
compiler.debug = Crystal::Debug::None | |||
end | |||
{% if LibLLVM::IS_40 %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be future (and present) proof:
{% unless LibLLVM::IS_38 || LibLLVM::IS_39 %}
I really think the |
You mean something like: def search_clang(*args)
args.each do |cc|
next unless cc
next unless `#{cc} --version 2>/dev/null` =~ /clang version (\d+\.\d+)/
return cc if $1.to_f >= 4.0
end
end
if clang = search_clang(ENV["CC"]?, "clang", "clang-4.0", "clang-5.0", "clang-6.0")
# ...
end Bonus: it supports |
Yeah, except we should error if CC is set and isn't a clang with LTO, instead of ignoring CC silently. I still think that not depending on the system linker at all (by default, probably a good idea to be able to use CC in case of bugs) is the optimal solution. But that's better as another PR. |
Add --lto=thin as compiler option. Emits .bc with summary metadata as .o Replace linker with clang with lto_cache_dir Forward n_threads compiler option to ThinLTO -mllvm,-threads=N Turn off implicit --single-module on release build with ThinLTO $ crystal build --lto=thin main.cr NB: --no-debug is sometimes needed
* Allow clang override via environment CLANG variable
I changed back to just So |
For anyone interested to hear about some real world ThinLTO improvements https://internals.rust-lang.org/t/help-test-out-thinlto/6017/17 (please be aware this is for Rust) |
@sdogruyol they seem to be in the same position as us, insofar that they generate modules in parallel for debug mode already, without thinlto, and they're trying out thinlto for release mode. |
I would love anything which makes --release mode compile faster |
src/compiler/crystal/compiler.cr
Outdated
@llvm_mod.write_bitcode_with_summary_to_file(object_name) | ||
@reused_previous_compilation = false | ||
llvm_mod.print_to_file ll_name if compiler.dump_ll? | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been wondering why we dumped the LLVM IR in this block, then understood this is because we are skipping the rest of the method. Maybe extract methods to keep a normal flow, instead of adding a duplication?
if compiler.thin_lto
emit_bc_with_thin_lto_summary
else
emit_bc_and_compile_object
end
dump_llvm_ir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: there is a mixed use of @llvm_mod
instance variable and llvm_mod
getter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ysbaddaden I updated it it something like that. The (private) compile_to_thin_lto
is emitted always, but raises if the llvm version does not support lto.
Yet, the invocation will never occur since the option is never toggled to true
in an non lto llvm version.
src/compiler/crystal/compiler.cr
Outdated
end | ||
dump_llvm_ir | ||
{% else %} | ||
raise {{ "ThinLTO is not available in LLVM #{LibLLVM::VERSION}".stringify }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be easy to raise instantly in the option parser in addition to this, not after several (possibly expensive) compiler passes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh never mind, we can't get to this statement if we don't have the correct LLVM version, as we simply compile out the --lto
option.
Any update? Perhaps we could get this in for |
I think this just needs a rebase for circleci and then it's 👍. I did the rebase locally but it doesn't seem I have permission to push to this PR. |
TL;DR This is one attempt to reduce compiling time. It's effect seems to be only useful in
--release
builds so far.Adds
--lto=thin
as compiler option.Currently it's only tested in osx. In linux the gold plugin or lld seems to be required.
Changes for ThinLTO mode
-mllvm,-threads=N
Basic usage
$ ./bin/crystal build --lto=thin main.cr
--release
can be used.--no-debug
is sometimes needed to due to some llvm metadata unsupported scenarios.Results
In release mode, the compilation of the compiler is reduced from ~7mins to ~2mins. ⚡️ The runtime performance of the seems to be same.
In non release mode, the overall first compile time is not shorter. It seems to double. 🐢 .
But for successive builds (3rd rebuild) the cache reduces the compile/linking from 1:33 to 0:04 (compiler_spec). Yet, there is no noticeable time reduction comparing compilation with/without LTO in the last rebuild stages.
The tests were compiling the compiler, the compiler specs and std specs.
The tests were un on a macOS 10.12.3, 2.7 GHz Intel Core i7, 16 GB 1600 MHz DDR3.
Note
The compiler stats are a bit misleading in ThinLTO since the "Codegen (bc+obj)" is just "Codegen (bc)". And the cache inform no reuse of .o files just because the cache is managed by llvm and not crystal. Maybe the compiler stats should be adapted to ThinLTO.
--lto=thin
is used so in the future--lto=full
could be supported if needed and we keep the same configuration option as llvm tools.ThinLTO is still under improvements/development. Specially regarding debug information.
When adding ThinLTO to D some nice scenarios appears from the benefit that C and D can be compiled to .bc. It would requiere some changes but it could be something good to keep in the radar.
References