Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross-module inlining: Enable emission into multiple object files #3442

Merged
merged 2 commits into from
Jun 1, 2020

Conversation

kinke
Copy link
Member

@kinke kinke commented May 22, 2020

Previously, the logic gave up early if semantic3 was run for the function to be inlined, either because the function is going to be codegen'd in some root module, or because sema3 was already run for an
available_externally 'copy' in a previous compilation unit. This restricted a function to being defined in at most one compilation unit per ldc2 invocation.

So e.g. a little pragma(inline, true) wrapper in druntime wasn't inlined into other druntime modules, because the whole lib is built in a single cmdline by default, and sema3 was obviously run for the actual emission in the corresponding object file.

When later compiling Phobos in a single cmdline, only the first object file referencing the wrapper got lucky, running sema3 manually and getting an available_externally 'copy'.

@kinke kinke requested a review from JohanEngelen May 22, 2020 19:44
fd->ir->setDefined();
return;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been aware of this additional 'culling'.

@kinke
Copy link
Member Author

kinke commented May 22, 2020

Duplicate function literals without culling, now missing ones with culling...

@kinke kinke force-pushed the fix3126 branch 2 times, most recently from c071d88 to a8cf528 Compare May 23, 2020 01:02
// files.

// Generate unoptimized IR for 2 source files as separate compilation units.
// RUN: %ldc -c -output-ll %s %S/inputs/inlinables.d -od=%t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add testing of the reversed order of the files too? (IIRC we codegen in reverse order to also codegen stuff that has been sema3'ed while doing codegen).

// some non-inlined call.
// Prevent undefined symbol errors by defining literals in
// available_externally parents as regular non-available_externally
// functions (with linkonce_odr linkage).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a function is available_externally then why is a literal referenced by that externally available function not available from that external object file either? Is it because we define literals as private symbols of that module?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this either. Emitting any symbols as non-avilable_externally risks accidentally duplicating symbols with slightly different mangles, etc. in case of frontend bugs.

Copy link
Member Author

@kinke kinke May 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problematic thing for the Phobos unittests was https://github.com/ldc-developers/phobos/blob/3371f469af84adf5b4cff8b74066b60b0460454d/std/experimental/logger/core.d#L327-L340. The delegate literals in there are for each lazy arg; they are apparently treated as literals inside that function, while they are referenced from the call sites (to populate the lazy arg delegates, putting the function pointer into those).

My 1st approach was to make sure all literals (in available_externally parents) are emitted as available_externally too and to apply pragma(inline, true) to those as well. Didn't work, and looking at the .ll revealed the indirect usage via function pointer for the delegates.

As to why the log template parent is treated as available_externally, I'm not sure. I tried returning false for each function inside some template instance where needsCodegen() returns false, but that didn't help (something like if (auto ti = DtoIsTemplateInstance(fdecl)) { if (!ti.needsCodegen()) /* not available_externally */ }).

Copy link
Member

@JohanEngelen JohanEngelen May 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the old problem of __FILE__ as default template argument? Depending on compilation command line, __FILE__ will be different and thus the mangling will be different. Then, what can happen is:

  1. phobos source somewhere uses the log template with specific template args A1 in function f --> log(A1)() is emitted into libphobos
  2. in a separate compilation, the user's code is compiled that calls log indirectly through function f in phobos (user code --> f --> log).
  3. Now what the compiler sees is that f should already be codegenned in libphobos, thus f --> log does not need codegen. All is good.
  4. With cross module inlining enabled, f is codegenned as an inlining candidate, and log aswell, both marked as available_externally.
  5. But now the __FILE__ bites us: in step 1, the commandline was such that __FILE__ ended up as relative path, but in step 4. (with the user's code and commandline) it ends up as absolute path, or some different relative path. So actually the call is log(A2)() and the call will be to that symbol (mangling includes A2 instead of A1): the compiler sees that that call should already have been codegenned (hence available_externally), but because __FILE__ was different, that is not true. ---> Missing symbols during linking when LLVM decides to inline f but not to inline the available_externally log (or literals in it, in this case).

Ideas for solution:

  1. When crossmodule codegenning, always_inline when __FILE__ is a template parameter.
  2. Never crossmodule codegen and never apply externally_available when __FILE__ is template parameter
  3. Make __FILE__ much more predictable. For example by forcing it to one/two/three.d when module one.two.three; is at the top of the file (and not two/three.d or zero/one/two/three.d when that would also work with the set of -I=... cmdline options passed).

Ideas 1 and 2 could not be implemented at the time because the D frontend discards that information about string template arguments. This is very bad anyway, and needs to be fixed in the frontend.
Idea 3, I didn't think of at the time. Don't know how many corner cases there will be.

Copy link
Member

@JohanEngelen JohanEngelen May 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(so we went we option 1, with explicit pragma(inline,true) needed everywhere where __FILE__ is used :(. Clearly, that cannot be done with function literals. Static local variables will probably have the same problem.)

@kinke kinke marked this pull request as ready for review May 24, 2020 20:08
@kinke
Copy link
Member Author

kinke commented May 24, 2020

Some printf debugging showed the actual issue, see last commit. The test now covers both orders, includes a little pragma(inline, true) chain involving a template, and also tests the case where functions from a non-compiled module are to be inlined into 2 compilation units (i.e., the Phobos use case for little druntime wrappers).

I'm sure there are still more than enough issues with -enable-cross-module-inlining, but my primary aim is clearly to fix pragma(inline, true), i.e., to resolve #3126 - IMO a prerequisite for splitting up std.math into a package without losing performance (without LTO).

kinke added 2 commits June 1, 2020 02:28
Previously, the logic gave up early if semantic3 was run for the
function to be inlined, either because the function is going to be
codegen'd in some root module, or because sema3 was already run for an
available_externally 'copy' in a previous compilation unit.
This restricted a function to being defined in at most one compilation
unit per ldc2 invocation.

So e.g. a little `pragma(inline, true)` wrapper in druntime wasn't
inlined into other druntime modules, because the whole lib is built in a
single cmdline by default, and sema3 was obviously run for the actual
emission in the corresponding object file.

When later compiling Phobos in a single cmdline, only the first object
file referencing the wrapper got lucky, running sema3 manually and
getting an available_externally 'copy'.
…ernally

This is exactly what happened for the 4 problematic delegate literals
for the Phobos unittests - DtoDefineFunction(fd) could result in that
definition ending up as available_externally - all done as part of
DtoResolveFunction(), which declared it, and DtoDeclareFunction()
defined it as available_externally, and the outer DtoDefineFunction()
returned early in that case without fixing up the linkage.
@kinke kinke merged commit 565451c into ldc-developers:master Jun 1, 2020
@kinke kinke deleted the fix3126 branch June 1, 2020 15:31
JohanEngelen added a commit to weka/ldc that referenced this pull request Oct 18, 2020
This reverts commit 565451c, reversing
changes made to 3e859c9.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants