Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add ThinLTO for LLVM >= 4.0 #4367

Merged
merged 4 commits into from
Oct 30, 2017
Merged

Conversation

bcardiff
Copy link
Member

@bcardiff bcardiff commented May 2, 2017

TL;DR This is one attempt to reduce compiling time. It's effect seems to be only useful in --release builds so far.


Adds --lto=thin as compiler option.
Currently it's only tested in osx. In linux the gold plugin or lld seems to be required.

Changes for ThinLTO mode

  • Emits .bc with summary metadata as .o as result of the codegen phase.
  • Replace linker with clang with lto_cache_dir enabling ThinLTO default cache for incremental builds.
  • Forward n_threads compiler option to ThinLTO -mllvm,-threads=N
  • Turn off implicit --single-module on release build with ThinLTO so the proper compilation steps can be run in parallel.

Basic usage

$ ./bin/crystal build --lto=thin main.cr

  • The --release can be used.
  • The --no-debug is sometimes needed to due to some llvm metadata unsupported scenarios.

Results

In release mode, the compilation of the compiler is reduced from ~7mins to ~2mins. ⚡️ The runtime performance of the seems to be same.

In non release mode, the overall first compile time is not shorter. It seems to double. 🐢 .

But for successive builds (3rd rebuild) the cache reduces the compile/linking from 1:33 to 0:04 (compiler_spec). Yet, there is no noticeable time reduction comparing compilation with/without LTO in the last rebuild stages.

The tests were compiling the compiler, the compiler specs and std specs.

The tests were un on a macOS 10.12.3, 2.7 GHz Intel Core i7, 16 GB 1600 MHz DDR3.

Note

The compiler stats are a bit misleading in ThinLTO since the "Codegen (bc+obj)" is just "Codegen (bc)". And the cache inform no reuse of .o files just because the cache is managed by llvm and not crystal. Maybe the compiler stats should be adapted to ThinLTO.

--lto=thin is used so in the future --lto=full could be supported if needed and we keep the same configuration option as llvm tools.

ThinLTO is still under improvements/development. Specially regarding debug information.

When adding ThinLTO to D some nice scenarios appears from the benefit that C and D can be compiled to .bc. It would requiere some changes but it could be something good to keep in the radar.

References

@@ -6,6 +6,10 @@
#include <llvm/IR/DebugLoc.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Metadata.h>
#include "llvm/Analysis/ModuleSummaryAnalysis.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't added #include paths be enclosed in <...> instead of "..."?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same. And it is already inconsistent some lines above 🤷‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not the same, since #include <...> form uses system header files, while #include "..." starts from current directory. See http://stackoverflow.com/questions/21593/what-is-the-difference-between-include-filename-and-include-filename for reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same (in this context). I need to add some ifdef too for older llvm, I will change it then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive results!

if thin_lto
lto_cache_dir = "#{output_dir}/lto.cache"
Dir.mkdir_p(lto_cache_dir)
cc = ENV["CC"]? || "clang -flto=thin -Wl,-mllvm,-threads=#{n_threads},-cache_path_lto,#{lto_cache_dir},#{@release ? "-mllvm,-O2" : "-mllvm,-O0"}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If set is set by the system or to clang-4.0 for example, then thinlto will be silently ignored.

Is it possible to detect that CC is clang >= 4.0 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the LLD homepage, it says that lld is embeddable, perhaps it would be better to just link against lld and bundle it with crystal since it's probably rather rare in the wild.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

embedding lld might be a good way to go in the long term. I wanted to change as little as possible.

@ysbaddaden yes, I wanted to be able to override the linker as you can do using CC. Since the flags are sensible to the linker to be used it basically boils down to the user to set the rest of the flags as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but we can specify link flags with --link-flags, we already force flags and libraries on CC anyway, and if we're going this way, there is --cross-compile to only generate objects.

The thing is that Debian and Freebsd systems (and possibly others) don't provide a generic clang binary or link, only clang-3.9, clang-4.0 and so on, so I can't use thin lto as the following call, or when I simply export CC in my ~/.profile, which would result in non optimized binaries (many modules, no link-time optimizations):

$ CC=clang-4.0 shards build --release --lto=thin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that with ThinLTO the .bc stored in .o files with augmented metadata. This is different than --cross-compile.

@@ -292,6 +292,12 @@ class Crystal::Command
opts.on("", "--no-debug", "Skip any symbolic debug info") do
compiler.debug = Crystal::Debug::None
end
{% if LibLLVM::IS_40 %}
opts.on("--lto FLAG", "Use ThinLTO -lto=thin") do |flag|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option description states to use = but the actual is a space?

@bcardiff
Copy link
Member Author

bcardiff commented May 6, 2017

@ysbaddaden

I've added the default options for llvm-4.0 in linux. I tested in on xenial using ppa:xorg-edgers/ppa which provides clang-4.0 (but not plain clang).

I also change to use clang-4.0 in osx for consistency. Some parts of the code already work only if compiled with llvm-4.0.

The lto cache was filled on linux so I guess it's working. I didn't need to add -fuse-ld=gold as an additional compiler flag. I don't now if it could be required on certain setups.

@luislavena
Copy link
Contributor

Hello @bcardiff, will be possible for you to rebase this work on current master?

Now that LLVM < 3.8 has been dropped and LLVM 4.0 has been added, I would like to give this a test with some some complex projects and report back.

Thank you! ❤️ ❤️ ❤️

@RX14
Copy link
Contributor

RX14 commented Oct 9, 2017

Should we revive this PR?

Am i correct in saying that this PR is a huge win for release mode (we can compile in parallel) but in non-release mode the current method is better. if so, I think we should merge this but only enabled by default for --release.

@bcardiff
Copy link
Member Author

bcardiff commented Oct 10, 2017

@luislavena rebased and squashed

Remember that --no-debug is sometimes needed. Specially when you get something like

ThinLTO: S-tring.o2: error: Invalid function metadata: outgoing forward refs (Producer: 'LLVM4.0.0' Reader: 'LLVM 4.0.0')
LLVM ERROR: importFunctions failed
clang-4.0: error: linker command failed with exit code 1 (use -v to see invocation)
Error: execution of command failed with code: 1: `clang-4.0 -flto=thin -Wl,-mllvm,-threads=8,-cache_path_lto,/Users/bcardiff/.crystal/...-main.cr/lto.cache,-mllvm,-O2 "${@}" -o '..../src/ext/libcrystal.a -levent -liconv -ldl -L/usr/lib -L/usr/local/lib`

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM5 and clang5 have been released. Let's support them :-)

lto_cache_dir = "#{output_dir}/lto.cache"
Dir.mkdir_p(lto_cache_dir)
{% if flag?(:darwin) %}
cc = ENV["CC"]? || "clang-4.0 -flto=thin -Wl,-mllvm,-threads=#{n_threads},-cache_path_lto,#{lto_cache_dir},#{@release ? "-mllvm,-O2" : "-mllvm,-O0"}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about clang-5.0 or clang-6.0 (nightly)?

Maybe just expecting clang, as it was initialy, and document that a clang binary must exist (e.g. as a link to clang-4.0 or clang-5.0)? Or maybe accept an ENV["CLANG"].

@@ -546,6 +561,15 @@ module Crystal
bc_name = self.bc_name
object_name = self.object_name

{% if LibLLVM::IS_40 %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again.

@@ -134,4 +133,8 @@ lib LibLLVMExt
then : LibLLVM::BasicBlockRef, catch : LibLLVM::BasicBlockRef,
bundle : LibLLVMExt::OperandBundleDefRef,
name : LibC::Char*) : LibLLVM::ValueRef

{% if LibLLVM::IS_40 %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again.

@@ -57,6 +57,12 @@ class LLVM::Module
LibLLVM.write_bitcode_to_file self, filename
end

{% if LibLLVM::IS_40 %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again.

@@ -310,6 +310,12 @@ class Crystal::Command
opts.on("--no-debug", "Skip any symbolic debug info") do
compiler.debug = Crystal::Debug::None
end
{% if LibLLVM::IS_40 %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be future (and present) proof:

{% unless LibLLVM::IS_38 || LibLLVM::IS_39 %}

@RX14
Copy link
Contributor

RX14 commented Oct 10, 2017

I really think the cc override for thinlto is a massive hack that we can't really merge. We should search PATH for a clang version with the correct version.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Oct 10, 2017

You mean something like:

def search_clang(*args)
  args.each do |cc|
    next unless cc
    next unless `#{cc} --version 2>/dev/null` =~ /clang version (\d+\.\d+)/
    return cc if $1.to_f >= 4.0
  end
end

if clang = search_clang(ENV["CC"]?, "clang", "clang-4.0", "clang-5.0", "clang-6.0")
  # ...
end

Bonus: it supports CC=clang-5.0 to select a specific version.

@RX14
Copy link
Contributor

RX14 commented Oct 10, 2017

Yeah, except we should error if CC is set and isn't a clang with LTO, instead of ignoring CC silently.

I still think that not depending on the system linker at all (by default, probably a good idea to be able to use CC in case of bugs) is the optimal solution. But that's better as another PR.

Brian J. Cardiff and others added 3 commits October 10, 2017 09:24
Add --lto=thin as compiler option.
Emits .bc with summary metadata as .o
Replace linker with clang with lto_cache_dir
Forward n_threads compiler option to ThinLTO -mllvm,-threads=N
Turn off implicit --single-module on release build with ThinLTO

$ crystal build --lto=thin main.cr

NB: --no-debug is sometimes needed
* Allow clang override via environment CLANG variable
@bcardiff
Copy link
Member Author

I changed back to just clang with an option to override using CLANG env var. My intention was to leave CC as an override of the full linkage command but the the object file input/output so the user is able to tweak the thin lto option by using that alternative.

So CLANG can be used if the user just cares about the clang version leaving crystal compiler fill the thin lto options.

@sdogruyol
Copy link
Member

For anyone interested to hear about some real world ThinLTO improvements https://internals.rust-lang.org/t/help-test-out-thinlto/6017/17 (please be aware this is for Rust)

@RX14
Copy link
Contributor

RX14 commented Oct 10, 2017

@sdogruyol they seem to be in the same position as us, insofar that they generate modules in parallel for debug mode already, without thinlto, and they're trying out thinlto for release mode.

@bararchy
Copy link
Contributor

I would love anything which makes --release mode compile faster

@llvm_mod.write_bitcode_with_summary_to_file(object_name)
@reused_previous_compilation = false
llvm_mod.print_to_file ll_name if compiler.dump_ll?
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wondering why we dumped the LLVM IR in this block, then understood this is because we are skipping the rest of the method. Maybe extract methods to keep a normal flow, instead of adding a duplication?

if compiler.thin_lto
  emit_bc_with_thin_lto_summary
else
  emit_bc_and_compile_object
end

dump_llvm_ir

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: there is a mixed use of @llvm_mod instance variable and llvm_mod getter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ysbaddaden I updated it it something like that. The (private) compile_to_thin_lto is emitted always, but raises if the llvm version does not support lto.

Yet, the invocation will never occur since the option is never toggled to true in an non lto llvm version.

end
dump_llvm_ir
{% else %}
raise {{ "ThinLTO is not available in LLVM #{LibLLVM::VERSION}".stringify }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easy to raise instantly in the option parser in addition to this, not after several (possibly expensive) compiler passes?

Copy link
Contributor

@RX14 RX14 Oct 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh never mind, we can't get to this statement if we don't have the correct LLVM version, as we simply compile out the --lto option.

@RX14
Copy link
Contributor

RX14 commented Oct 27, 2017

Any update? Perhaps we could get this in for 0.24.0?

@RX14
Copy link
Contributor

RX14 commented Oct 28, 2017

I think this just needs a rebase for circleci and then it's 👍. I did the rebase locally but it doesn't seem I have permission to push to this PR.

@RX14 RX14 added this to the Next milestone Oct 30, 2017
@RX14 RX14 merged commit e8aca96 into crystal-lang:master Oct 30, 2017
@oprypin oprypin mentioned this pull request Dec 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants