2.067 branch: Enormous compile time increase after dtor rework #1010
Comments
We seem to generate almost 300k landing pads (for dtors) in a single function. I find it rather impressive that the compilation finishes at all. |
The function in question is |
Small sample: (line numbers in a separate file)
postinvoke649: ; preds = %temporariesLandingPad646
call void @_d_eh_resume_unwind(i8* %125), !dbg !5629 ; [debug line = 6:5]
unreachable, !dbg !5629 ; [debug line = 6:5]
eh.collision641: ; preds = %temporariesLandingPad646
%landing_pad642 = landingpad { i8*, i32 } personality i32 (i32, i32, i64, i8*, i8*)* @_d_eh_personality
cleanup, !dbg !5627 ; [#uses = 1 type = { i8*, i32 }] [debug line = 15:5]
%124 = extractvalue { i8*, i32 } %landing_pad642, 0, !dbg !5627 ; [#uses = 1 type = i8*] [debug line = 15:5]
invoke void @_d_eh_handle_collision(i8* %124, i8* %135)
to label %postinvoke644 unwind label %temporariesLandingPad643, !dbg !5627 ; [debug line = 15:5]
postinvoke644: ; preds = %eh.collision641
unreachable, !dbg !5627 ; [debug line = 15:5]
postinvoke647: ; preds = %postinvoke640
call void @_d_eh_resume_unwind(i8* %135), !dbg !5629 ; [debug line = 6:5]
unreachable, !dbg !5629 ; [debug line = 6:5]
temporariesLandingPad646: ; preds = %postinvoke640
%landing_pad648 = landingpad { i8*, i32 } personality i32 (i32, i32, i64, i8*, i8*)* @_d_eh_personality
cleanup, !dbg !5629 ; [#uses = 2 type = { i8*, i32 }] [debug line = 6:5]
%125 = extractvalue { i8*, i32 } %landing_pad648, 0, !dbg !5629 ; [#uses = 1 type = i8*] [debug line = 6:5]
%126 = extractvalue { i8*, i32 } %landing_pad648, 1, !dbg !5629 ; [#uses = 0 type = i32] [debug line = 6:5]
invoke void @_D3std3uni38__T13InversionListTS3std3uni8GcPolicyZ13InversionList11__fieldDtorMFNaNbNiNeZv(%"std.uni.InversionList!(GcPolicy).InversionList"* %__tmpfordtor1983)
to label %postinvoke649 unwind label %eh.collision641, !dbg !5629 ; [debug line = 6:5] [display name = std.uni.InversionList!(GcPolicy).InversionList.~this]
temporariesLandingPad643: ; preds = %eh.collision641
%landing_pad645 = landingpad { i8*, i32 } personality i32 (i32, i32, i64, i8*, i8*)* @_d_eh_personality
cleanup, !dbg !5627 ; [#uses = 2 type = { i8*, i32 }] [debug line = 15:5]
%127 = extractvalue { i8*, i32 } %landing_pad645, 0, !dbg !5627 ; [#uses = 1 type = i8*] [debug line = 15:5]
%128 = extractvalue { i8*, i32 } %landing_pad645, 1, !dbg !5627 ; [#uses = 0 type = i32] [debug line = 15:5]
call void @_D3std3uni38__T13InversionListTS3std3uni8GcPolicyZ13InversionList11__fieldDtorMFNaNbNiNeZv(%"std.uni.InversionList!(GcPolicy).InversionList"* %__tmpfordtor1983), !dbg !5627 ; [debug line = 15:5] [display name = std.uni.InversionList!(GcPolicy).InversionList.~this]
call void @_d_eh_resume_unwind(i8* %127), !dbg !5627 ; [debug line = 15:5]
unreachable, !dbg !5627 ; [debug line = 15:5]
eh.collision622: ; preds = %temporariesLandingPad637
%landing_pad623 = landingpad { i8*, i32 } personality i32 (i32, i32, i64, i8*, i8*)* @_d_eh_personality
cleanup, !dbg !5625 ; [#uses = 1 type = { i8*, i32 }] [debug line = 18:5]
%129 = extractvalue { i8*, i32 } %landing_pad623, 0, !dbg !5625 ; [#uses = 1 type = i8*] [debug line = 18:5]
invoke void @_d_eh_handle_collision(i8* %129, i8* %155)
to label %postinvoke625 unwind label %temporariesLandingPad624, !dbg !5625 ; [debug line = 18:5]
postinvoke625: ; preds = %eh.collision622
unreachable, !dbg !5631 ; [debug line = 6:5]
postinvoke638: ; preds = %postinvoke621
invoke void @_D3std3uni38__T13InversionListTS3std3uni8GcPolicyZ13InversionList11__fieldDtorMFNaNbNiNeZv(%"std.uni.InversionList!(GcPolicy).InversionList"* %set)
to label %postinvoke656 unwind label %temporariesLandingPad655, !dbg !5633 ; [debug line = 6:5] [display name = std.uni.InversionList!(GcPolicy).InversionList.~this]
postinvoke658: ; preds = %temporariesLandingPad655
call void @_d_eh_resume_unwind(i8* %131), !dbg !5633 ; [debug line = 6:5]
unreachable, !dbg !5633 ; [debug line = 6:5]
eh.collision650: ; preds = %temporariesLandingPad655
%landing_pad651 = landingpad { i8*, i32 } personality i32 (i32, i32, i64, i8*, i8*)* @_d_eh_personality
cleanup, !dbg !5629 ; [#uses = 1 type = { i8*, i32 }] [debug line = 6:5]
%130 = extractvalue { i8*, i32 } %landing_pad651, 0, !dbg !5629 ; [#uses = 1 type = i8*] [debug line = 6:5]
invoke void @_d_eh_handle_collision(i8* %130, i8* %155)
to label %postinvoke653 unwind label %temporariesLandingPad652, !dbg !5629 ; [debug line = 6:5]
postinvoke653: ; preds = %eh.collision650
unreachable, !dbg !5629 |
@kinke: Could you have a quick look at this? Not quite minimal test case:
void main() {
import std.uni;
import std.typecons : tuple;
import std.algorithm : equal;
auto set = CodepointSet('A', 'D'+1, 'a', 'd'+1);
set = unicode.ASCII;
set = CodepointSet('a', 'z'+1, 'а', 'я'+1);
foreach(v; 'a'..'z'+1)
assert(set[v]);
// Cyrillic lowercase interval
foreach(v; 'а'..'я'+1)
assert(set[v]);
// specific order is not required, intervals may intersect
auto set2 = CodepointSet('а', 'я'+1, 'a', 'd', 'b', 'z'+1);
assert(set2.byInterval.equal(set.byInterval));
auto gothic = unicode.Gothic;
// Gothic letter ahsa
assert(gothic['\U00010330']);
// no ascii in Gothic obviously
assert(!gothic['$']);
CodepointSet emptySet;
assert(emptySet.length == 0);
assert(emptySet.empty);
set = unicode.ASCII;
// union with the inverse gets all of code points in the Unicode
assert((set | set.inverted).length == 0x110000);
// no intersection with inverse
assert((set & set.inverted).empty);
} |
Unfortunately, I don't have time these days, there's a bachelor party this weekend. What surely leads to loads of landing pads is the fact that each temporary is destructed safely, i.e., for a potentially throwing dtor, there's a corresponding landing pad destructing all older temporaries (safely again) - see #914 (comment). |
Okay. Guess I need to figure out why we generate thousands of landing pads for a single function then and didn't before. |
In other words, what I guess I'm asking is what the difference to the old code is. Didn't we handle throwing constructors correctly before? |
See #914 (comment). Previously, |
If I remember correctly, the way DMD does it is by emitting boolean local "gates" to track which arguments have been constructed yet, and then having a single landing pad to destruct all fully constructed objects. I need to look into the source to verify this, though. |
Hmm, I wonder whether they actually also do throwing dtors that way (looping back to the same landing pad). Seems a bit unlikely, given that the argprefix code is rather simple. |
Afaik the argprefix code uses a single gate (bool) for all dtor expressions, but that's because of implicit moving of the temporaries for the function call, i.e., eliding postblit and dtor if there was no throw. But it doesn't keep track of what has been constructed yet - makes sense, as that needs to be done for any expression tree anyway, not just the special argprefix one. |
DMD's implementation seems hopelessly broken anyway, unless I'm missing something:
import core.stdc.stdio;
struct ThrowingCtor {
const char* name;
this(const char* name) {
printf("Constructing %s\n", name);
this.name = name;
throw new Exception("somebody set up us the bomb");
}
~this() {
printf("Destroying %s\n", name);
}
}
struct ThrowingDtor {
const char* name;
this(const char* name) {
printf("Constructing %s\n", name);
this.name = name;
}
~this() {
printf("Destroying %s\n", name);
throw new Exception("for great justice");
}
}
void foo(ThrowingDtor a, ThrowingDtor b, ThrowingCtor c) {}
void main() {
auto z = ThrowingDtor("z");
foo(ThrowingDtor("a"), ThrowingDtor("b"), ThrowingCtor("c"));
}
prints
i.e. |
We, in addition to that, seem not to implement exception chaining correctly in this example (the "for great justice" is printed instead of being chained to the first one). |
Confirmed on Win64 with DMD 2.067.1:
Even better, when not throwing in ThrowingDtor::~this():
|
Simpler example that does not throw in destructors (which should probably work due to exception chaining, although I couldn't find any confirmation of that in the spec):
import core.stdc.stdio;
struct Foo {
const char* name;
this(const char* name, bool fail) {
printf("Constructing %s\n", name);
this.name = name;
if (fail) {
throw new Exception("somebody set up us the bomb");
}
}
~this() {
printf("Destroying %s\n", name);
}
}
void foo(Foo a, Foo b) {}
void main() {
foo(Foo("a", false), Foo("b", true));
} |
Oh wow, this one's cool:
import core.stdc.stdio;
import core.stdc.string;
struct Foo {
const char* name;
bool throwInDtor;
this(const char* name, bool throwInDtor) {
printf("Constructing %s\n", name);
this.name = name;
this.throwInDtor = throwInDtor;
}
~this() {
printf("Destroying %s\n", name);
if (throwInDtor)
throw new Exception("dtor " ~ name[0]);
}
}
Foo make(const char* name, bool doThrow) {
if (doThrow)
throw new Exception("make()");
return Foo(name, false);
}
void foo(Foo a, Foo b, Foo c, Foo d) { throw new Exception("foo()!"); }
void main() {
foo(Foo("a", true), make("b", true), Foo("c", false), make("d", true));
}
DMD 2.067.1, Win32:
Initially I thought DMD would just support regular function calls and could be missing ctor calls, but there's way more to it.
I suspect cases 2 & 3 to be a front-end bug when generating argprefix. Case 5 is definitely a DMD bug which may have been fixed in the latest DMD 2.068 RC or beta. |
Just tested DMD 2.068-rc1 - same results as DMD 2.067.1. :( |
Seems like DMD 2.068.0 on OS X gets 4) wrong; it does not destruct a. Edit: Sorry, forgot to mention link for my report here: https://issues.dlang.org/show_bug.cgi?id=14903 |
On the plus side, it does not enter an infinite loop on 5. |
Maybe I'm missing something obvious, but I think I just realized something regarding the complexity issue: Instead of the current version with its quadratic (or possibly even worse, didn't think much about it) growth, can't we just generate a linear number of nested EH contexts, i.e. the equivalent of a try-finally block per local variable with a dtor? |
Most likely worse! :D /edit: Well no, the single landing pad wouldn't allow for exception chaining, we'd leave it after the first throwing dtor [I remember now]! That would require the exponential expansion again, in the landing pad, to make sure all dtors are invoked and all exceptions are chained - unless one constructs the exception chain manually, in that case the complexity would only be linear... |
Hmm, I knew I missed something. Even for nested try-finally blocks the number of basic blocks is currently quadratic. I get the feeling that this std.uni unit test behaves much worse than that, though. |
I think the total number of dtor calls in a temporariesLandingPad (incl. all its nested postinvoke-temporariesLandingPad scopes) really is exponential in the number of temporaries with potentially throwing dtors N: |
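As a rough illustration of that growth, here is a minimal sketch under a simplified model. The model is an assumption for illustration, not the figures originally posted: every dtor call gets its own landing pad, and that landing pad in turn safely destructs all older temporaries. The function name `emittedDtorCalls` is hypothetical, and the collision landing pads are ignored.

```d
import std.stdio;

// Hypothetical model: number of dtor calls emitted to "safely" destruct
// n temporaries when each dtor call has its own landing pad that again
// safely destructs all older temporaries.
ulong emittedDtorCalls(uint n)
{
    ulong total = 0;
    foreach (i; 1 .. n + 1)
    {
        // One call for temporary i itself, plus the calls emitted in its
        // landing pad for the i - 1 older temporaries.
        total += 1 + emittedDtorCalls(i - 1);
    }
    return total;
}

void main()
{
    foreach (n; 1 .. 11)
        writefln("N = %2d: %d dtor calls", n, emittedDtorCalls(n));
}
```

Under this model the count works out to 2^N - 1, so each additional temporary with a potentially throwing dtor roughly doubles the emitted code.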
Yep. What DMD seems to do is to branch to the next landing pad at the end of each block (in case no exception is thrown), which looks like it could help things a lot. To be able to implement this, we might have to rework exception chaining support such that we do not need the special collision landing pad "branches". I need to think about this some more, though. (Plus, I think we'd have to split up the landing pad into the header instruction and the actual code content. This would be a comparatively minor tweak, though.) |
The single landing pad with manual exception chaining would be tempting, e.g., for an expression tree
bool isAlive1, isAlive2, isAlive3;
auto temp1 = Struct(), isAlive1 = true;
auto temp2 = Struct(), isAlive2 = true;
auto temp3 = Struct(), isAlive3 = true;
invoke throwingFunction
postInvoke:
call nonThrowingFoo
// possibly more invokes, all landing in the same temporariesLandingPad
// end of expression tree: destruct temporaries normally
initialize exception chain with no exception
goto destructTemporaries
temporariesLandingPad:
extract exception
initialize exception chain with exception
goto destructTemporaries
destructTemporaries:
if (!isAlive3) goto postInvoke3
else invoke temp3.~this()
landingPad3:
extract exception
add to chain
goto postInvoke3
postInvoke3:
if (!isAlive2) goto postInvoke2
else invoke temp2.~this()
landingPad2:
extract exception
add to chain
goto postInvoke2
postInvoke2:
if (!isAlive1) goto postInvoke1
else invoke temp1.~this()
landingPad1:
extract exception
add to chain
goto postInvoke1
postInvoke1:
check for exception (chain) and rethrow
No idea whether that would be feasible; I don't know much about exception collision/chaining. |
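The pseudocode above can be approximated at the source level. The following is only a sketch under assumptions: `Temp`, `runDtor`, `addToChain`, and `destructTemporaries` are hypothetical names, a plain method stands in for the throwing destructor so that the cleanup order can be driven by hand, and the chain is built manually through `Throwable.next` rather than by the runtime's collision handling.

```d
import std.stdio;

struct Temp
{
    string name;
    bool throwInDtor;

    // Stand-in for a potentially throwing destructor.
    void runDtor()
    {
        writeln("Destroying ", name);
        if (throwInDtor)
            throw new Exception("dtor " ~ name);
    }
}

// Append `e` to the end of the chain starting at `head`.
Throwable addToChain(Throwable head, Throwable e)
{
    if (head is null)
        return e;
    Throwable tail = head;
    while (tail.next !is null)
        tail = tail.next;
    tail.next = e;
    return head;
}

// One shared cleanup path: every live temporary is destroyed exactly once,
// in reverse order of construction, and any exception is added to the chain.
Throwable destructTemporaries(Temp[] temps, bool[] isAlive, Throwable chain)
{
    foreach_reverse (i, ref t; temps)
    {
        if (!isAlive[i])
            continue;
        try
        {
            t.runDtor();
        }
        catch (Exception e)
        {
            chain = addToChain(chain, e);
        }
    }
    return chain;
}

void main()
{
    auto temps = [Temp("temp1", true), Temp("temp2", false), Temp("temp3", true)];
    auto isAlive = [true, true, false]; // temp3 was never constructed

    Throwable chain = new Exception("original exception");
    chain = destructTemporaries(temps, isAlive, chain);

    // Walk the chain; a real implementation would rethrow its head instead.
    for (Throwable t = chain; t !is null; t = t.next)
        writeln(t.msg);
}
```

The point of the sketch is that the cleanup code is linear in the number of temporaries: each dtor appears once, and a throwing dtor does not prevent the older temporaries from being destroyed.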
Even not considering exception chaining for the moment, wouldn't we rather have proper landing pads set up for the various "construction stages" that branch off to somewhere in the list of destruction BBs instead of keeping "isAlive" booleans around? |
If we fix this properly, we can finally also get rid of having to be able to emit code more than once for finally handlers. |
Hmm, that's more or less what it amounts to anyway and would obviously be preferable, but eliding the bools would mean that if the 4th temporary is to be destructed, all 3 below it (declared before it) need to be alive too. So we'll probably need to be careful about not destructing any temporary earlier (no more nested toElemDtor() scopes), otherwise the current temporaries stack size doesn't uniquely map to the most recent temporary (=> destruction BB to begin with). I.e., the temporaries stack should constantly grow in declaration order. |
Is there a valid situation where temporaries would not be destructed in reverse order of construction? It seems to me that all such occurrences would likely be bugs anyway. |
Nope, it's just got to do with unique mapping:
Now the problem is that at the end of temp3's destruction BB, we'd need to jump to temp1's destruction BB, not to temp2's, as that one is already taken care of. Well actually now that I've typed this down, I'm thinking that that is actually not a problem, it's just some additional complexity when determining the branch targets, as the parent temporary remains the same throughout the lifetime of a temporary (and always outlives it). |
I have an idea how to avoid the separate collision handling by moving that code back into the personality routine. However, another issue is that we still generate a separate |
Sounds great.
Where's the code responsible for this? The codegen for a The .ll for this really seems excessive: ;)
struct S {
int a;
~this() {}
}
int main(string[] args) {
int foo;
try {
S s1;
if (args.length <= 1) return 1;
S s2;
if (args.length <= 2) return 2;
try {
S s3;
if (args.length <= 3) return 3;
S s4;
}
finally {
foo = 333;
}
}
finally {
if (foo == 0)
foo = 666;
}
return foo;
} |
…e collision landing pads in user code Also fixes a chaining bug noticed in ldc-developers/ldc#1010. Support for HP_LIBUNWIND is removed; it wasn't functional before either. In case somebody wants to port LDC to a platform without an unwinder, they can use the Apple unwinder implementation available as part of the LLVM project (repository at http://llvm.org/git/libunwind).
With my exception chaining fixes, we handle 4) and 5) from your test case set correctly. |
See #1019. Regarding the Are you working on this? |
I just posted a preliminary patch for the frontend argprefix issue to Bugzilla. It fixes all the test cases for us, but I don't have time to properly test it right now. |
Impressive progress! I'm lacking time too atm.. |
Unfortunately, I lack the time to do the actual rewrite of the finally block generation (i.e. fixing the performance regression) anytime soon. Now that exception chaining is out of the picture, it should be fairly simple, though. Having multiple exit paths from a scope might complicate the initial analysis of the problem somewhat, but I'm confident that the solution will be straightforward once you have the right perspective. It might also help to look at what Clang does. It seems to use local integer variables to disambiguate cases where there are multiple exit paths from a "finally" block. However, without having thought about the issue in any detail, it seems that our implementation might end up quite a bit simpler because the rules for goto, etc. are much more restricted in D. |
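The "local integer variable" idea can be sketched in D with explicit gotos. This is only an illustration under assumptions: `run`, `branchSel`, `returnSlot`, and `cleanupRuns` are hypothetical names, and the real codegen would operate on LLVM basic blocks rather than D source. The cleanup code exists once, and a selector stored before branching to it decides where control continues afterwards.

```d
import std.stdio;

int cleanupRuns; // counts how often the shared cleanup code executes

int run(size_t argCount)
{
    int returnSlot = 0;
    int branchSel = 0; // 0: leaving via `return`, 1: normal end of the scope

    if (argCount <= 1)
    {
        // Early return: remember the value and the exit path, then clean up.
        returnSlot = 1;
        branchSel = 0;
        goto cleanup;
    }
    // Normal end of the scope.
    branchSel = 1;
    goto cleanup;

cleanup:
    ++cleanupRuns; // stands in for a dtor call or `finally` body, emitted once
    switch (branchSel)
    {
        case 1:
            goto afterScope;
        default:
            goto doReturn;
    }

afterScope:
    returnSlot = 42; // code following the scope on the normal path
    goto doReturn;

doReturn:
    return returnSlot;
}

void main()
{
    writeln(run(1));
    writeln(run(3));
    writeln(cleanupRuns);
}
```

Both exit paths pass through the same `cleanup` block, so adding more exit paths adds selector values instead of more copies of the cleanup code.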
I'm knee-deep into what amounts to a complete rewrite of all our control flow scoping-related codegen. Hope I can finish it today; otherwise I'll have to post an incomplete version for somebody else to continue. |
Heheh, good news & good luck! 👍 |
Progress! For the example in #1010 (comment) we now generate the following IR (EH disabled temporarily):
define i32 @_Dmain({ i64, { i64, i8* }* } %unnamed) #0 {
%args = alloca { i64, { i64, i8* }* }, align 8 ; [#uses = 4 type = { i64, { i64, i8* }* }*]
%foo = alloca i32, align 4 ; [#uses = 6 type = i32*]
%s1 = alloca %test.S, align 4 ; [#uses = 2 type = %test.S*]
%return.slot = alloca i32 ; [#uses = 5 type = i32*]
%s2 = alloca %test.S, align 4 ; [#uses = 2 type = %test.S*]
%s3 = alloca %test.S, align 4 ; [#uses = 2 type = %test.S*]
%s4 = alloca %test.S, align 4 ; [#uses = 2 type = %test.S*]
%branchsel.finally9 = alloca i32 ; [#uses = 3 type = i32*]
%branchsel.finally8 = alloca i32 ; [#uses = 3 type = i32*]
%branchsel.finally4 = alloca i32 ; [#uses = 4 type = i32*]
%branchsel.finally1 = alloca i32 ; [#uses = 5 type = i32*]
%branchsel.finally = alloca i32 ; [#uses = 5 type = i32*]
store { i64, { i64, i8* }* } %unnamed, { i64, { i64, i8* }* }* %args
store i32 0, i32* %foo
%1 = bitcast %test.S* %s1 to i8* ; [#uses = 1 type = i8*]
call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 4, i32 1, i1 false)
%2 = getelementptr { i64, { i64, i8* }* }* %args, i32 0, i32 0 ; [#uses = 1 type = i64*]
%.len = load i64* %2 ; [#uses = 1 type = i64]
%3 = icmp ule i64 %.len, 1 ; [#uses = 1 type = i1]
br i1 %3, label %if2, label %endif3
finally: ; preds = %try.success15, %finally1
%4 = load i32* %foo ; [#uses = 1 type = i32]
%5 = icmp eq i32 %4, 0 ; [#uses = 1 type = i1]
br i1 %5, label %if, label %endif
if: ; preds = %finally
store i32 666, i32* %foo
br label %endif
endif: ; preds = %if, %finally
%6 = load i32* %branchsel.finally ; [#uses = 1 type = i32]
switch i32 %6, label %return [
i32 1, label %try.success16
]
finally1: ; preds = %try.success14, %finally4, %if2
call void @_D4test1S6__dtorMFZv(%test.S* %s1)
%7 = load i32* %branchsel.finally1 ; [#uses = 1 type = i32]
switch i32 %7, label %finally [
i32 1, label %try.success15
]
if2: ; preds = %0
store i32 1, i32* %return.slot
store i32 0, i32* %branchsel.finally1
store i32 0, i32* %branchsel.finally
br label %finally1
endif3: ; preds = %0
%8 = bitcast %test.S* %s2 to i8* ; [#uses = 1 type = i8*]
call void @llvm.memset.p0i8.i64(i8* %8, i8 0, i64 4, i32 1, i1 false)
%9 = getelementptr { i64, { i64, i8* }* }* %args, i32 0, i32 0 ; [#uses = 1 type = i64*]
%.len5 = load i64* %9 ; [#uses = 1 type = i64]
%10 = icmp ule i64 %.len5, 2 ; [#uses = 1 type = i1]
br i1 %10, label %if6, label %endif7
return: ; preds = %endif
%11 = load i32* %return.slot ; [#uses = 1 type = i32]
ret i32 %11
finally4: ; preds = %try.success13, %finally8, %if6
call void @_D4test1S6__dtorMFZv(%test.S* %s2)
%12 = load i32* %branchsel.finally4 ; [#uses = 1 type = i32]
switch i32 %12, label %finally1 [
i32 1, label %try.success14
]
if6: ; preds = %endif3
store i32 2, i32* %return.slot
store i32 0, i32* %branchsel.finally4
store i32 0, i32* %branchsel.finally1
store i32 0, i32* %branchsel.finally
br label %finally4
endif7: ; preds = %endif3
%13 = bitcast %test.S* %s3 to i8* ; [#uses = 1 type = i8*]
call void @llvm.memset.p0i8.i64(i8* %13, i8 0, i64 4, i32 1, i1 false)
%14 = getelementptr { i64, { i64, i8* }* }* %args, i32 0, i32 0 ; [#uses = 1 type = i64*]
%.len10 = load i64* %14 ; [#uses = 1 type = i64]
%15 = icmp ule i64 %.len10, 3 ; [#uses = 1 type = i1]
br i1 %15, label %if11, label %endif12
finally8: ; preds = %try.success, %finally9
store i32 333, i32* %foo
%16 = load i32* %branchsel.finally8 ; [#uses = 1 type = i32]
switch i32 %16, label %finally4 [
i32 1, label %try.success13
]
finally9: ; preds = %endif12, %if11
call void @_D4test1S6__dtorMFZv(%test.S* %s3)
%17 = load i32* %branchsel.finally9 ; [#uses = 1 type = i32]
switch i32 %17, label %finally8 [
i32 1, label %try.success
]
if11: ; preds = %endif7
store i32 3, i32* %return.slot
store i32 0, i32* %branchsel.finally9
store i32 0, i32* %branchsel.finally8
store i32 0, i32* %branchsel.finally4
store i32 0, i32* %branchsel.finally1
store i32 0, i32* %branchsel.finally
br label %finally9
endif12: ; preds = %endif7
%18 = bitcast %test.S* %s4 to i8* ; [#uses = 1 type = i8*]
call void @llvm.memset.p0i8.i64(i8* %18, i8 0, i64 4, i32 1, i1 false)
call void @_D4test1S6__dtorMFZv(%test.S* %s4)
store i32 1, i32* %branchsel.finally9
br label %finally9
try.success: ; preds = %finally9
store i32 1, i32* %branchsel.finally8
br label %finally8
try.success13: ; preds = %finally8
store i32 1, i32* %branchsel.finally4
br label %finally4
try.success14: ; preds = %finally4
store i32 1, i32* %branchsel.finally1
br label %finally1
try.success15: ; preds = %finally1
store i32 1, i32* %branchsel.finally
br label %finally
try.success16: ; preds = %endif
%19 = load i32* %foo ; [#uses = 0 type = i32]
%20 = load i32* %foo ; [#uses = 0 type = i32]
%21 = load i32* %return.slot ; [#uses = 1 type = i32]
ret i32 %21
}
|
Ah, the poor optimization results are probably because the code contains a number of strange loops in untaken branches. |
Fixed the bug and edited the above IR. Now optimizes nicely to
define i32 @_Dmain({ i64, { i64, i8* }* } %unnamed) #0 {
%unnamed.fca.0.extract = extractvalue { i64, { i64, i8* }* } %unnamed, 0 ; [#uses = 2 type = i64]
%1 = icmp ult i64 %unnamed.fca.0.extract, 2 ; [#uses = 1 type = i1]
br i1 %1, label %finally1, label %endif3
finally1: ; preds = %0
ret i32 1
endif3: ; preds = %0
%2 = icmp ult i64 %unnamed.fca.0.extract, 3 ; [#uses = 1 type = i1]
%.26 = select i1 %2, i32 2, i32 3 ; [#uses = 1 type = i32]
ret i32 %.26
}
On x86, this now gives (
test`_Dmain:
test[0x100000bf0] <+0>: cmp rdi, 0x1
test[0x100000bf4] <+4>: ja 0x100000bfc ; <+12>
test[0x100000bf6] <+6>: mov eax, 0x1
test[0x100000bfb] <+11>: ret
test[0x100000bfc] <+12>: cmp rdi, 0x2
test[0x100000c00] <+16>: seta al
test[0x100000c03] <+19>: movzx eax, al
test[0x100000c06] <+22>: or eax, 0x2
test[0x100000c09] <+25>: ret
while DMD produces (
test`_Dmain:
test[0x100000cf4] <+0>: push rbp
test[0x100000cf5] <+1>: mov rbp, rsp
test[0x100000cf8] <+4>: sub rsp, 0x70
test[0x100000cfc] <+8>: push rbx
test[0x100000cfd] <+9>: push r12
test[0x100000cff] <+11>: push r13
test[0x100000d01] <+13>: push r14
test[0x100000d03] <+15>: push r15
test[0x100000d05] <+17>: mov rbx, rdi
test[0x100000d08] <+20>: mov rcx, rsi
test[0x100000d0b] <+23>: xor eax, eax
test[0x100000d0d] <+25>: mov dword ptr [rbp - 0x8], eax
test[0x100000d10] <+28>: mov dword ptr [rbp - 0x18], eax
test[0x100000d13] <+31>: cmp rbx, 0x1
test[0x100000d17] <+35>: ja 0x100000d56 ; <+98>
test[0x100000d19] <+37>: mov eax, 0x1
test[0x100000d1e] <+42>: mov qword ptr [rbp - 0x28], rax
test[0x100000d22] <+46>: sub rsp, 0x8
test[0x100000d26] <+50>: call 0x100000e8b ; <+407>
test[0x100000d2b] <+55>: add rsp, 0x8
test[0x100000d2f] <+59>: mov rax, qword ptr [rbp - 0x28]
test[0x100000d33] <+63>: mov qword ptr [rbp - 0x20], rax
test[0x100000d37] <+67>: sub rsp, 0x8
test[0x100000d3b] <+71>: call 0x100000e9f ; <+427>
test[0x100000d40] <+76>: add rsp, 0x8
test[0x100000d44] <+80>: mov rax, qword ptr [rbp - 0x20]
test[0x100000d48] <+84>: pop r15
test[0x100000d4a] <+86>: pop r14
test[0x100000d4c] <+88>: pop r13
test[0x100000d4e] <+90>: pop r12
test[0x100000d50] <+92>: pop rbx
test[0x100000d51] <+93>: mov rsp, rbp
test[0x100000d54] <+96>: pop rbp
test[0x100000d55] <+97>: ret
test[0x100000d56] <+98>: mov dword ptr [rbp - 0x14], eax
test[0x100000d59] <+101>: cmp rbx, 0x2
test[0x100000d5d] <+105>: ja 0x100000db1 ; <+189>
test[0x100000d5f] <+107>: mov eax, 0x2
test[0x100000d64] <+112>: mov qword ptr [rbp - 0x40], rax
test[0x100000d68] <+116>: sub rsp, 0x8
test[0x100000d6c] <+120>: call 0x100000e77 ; <+387>
test[0x100000d71] <+125>: add rsp, 0x8
test[0x100000d75] <+129>: mov rax, qword ptr [rbp - 0x40]
test[0x100000d79] <+133>: mov qword ptr [rbp - 0x38], rax
test[0x100000d7d] <+137>: sub rsp, 0x8
test[0x100000d81] <+141>: call 0x100000e8b ; <+407>
test[0x100000d86] <+146>: add rsp, 0x8
test[0x100000d8a] <+150>: mov rax, qword ptr [rbp - 0x38]
test[0x100000d8e] <+154>: mov qword ptr [rbp - 0x30], rax
test[0x100000d92] <+158>: sub rsp, 0x8
test[0x100000d96] <+162>: call 0x100000e9f ; <+427>
test[0x100000d9b] <+167>: add rsp, 0x8
test[0x100000d9f] <+171>: mov rax, qword ptr [rbp - 0x30]
test[0x100000da3] <+175>: pop r15
test[0x100000da5] <+177>: pop r14
test[0x100000da7] <+179>: pop r13
test[0x100000da9] <+181>: pop r12
test[0x100000dab] <+183>: pop rbx
test[0x100000dac] <+184>: mov rsp, rbp
test[0x100000daf] <+187>: pop rbp
test[0x100000db0] <+188>: ret
test[0x100000db1] <+189>: mov dword ptr [rbp - 0x10], eax
test[0x100000db4] <+192>: cmp rbx, 0x3
test[0x100000db8] <+196>: ja 0x100000e36 ; <+322>
test[0x100000dba] <+198>: mov eax, 0x3
test[0x100000dbf] <+203>: mov qword ptr [rbp - 0x68], rax
test[0x100000dc3] <+207>: sub rsp, 0x8
test[0x100000dc7] <+211>: call 0x100000e4c ; <+344>
test[0x100000dcc] <+216>: add rsp, 0x8
test[0x100000dd0] <+220>: mov rax, qword ptr [rbp - 0x68]
test[0x100000dd4] <+224>: mov qword ptr [rbp - 0x60], rax
test[0x100000dd8] <+228>: sub rsp, 0x8
test[0x100000ddc] <+232>: call 0x100000e60 ; <+364>
test[0x100000de1] <+237>: add rsp, 0x8
test[0x100000de5] <+241>: mov rax, qword ptr [rbp - 0x60]
test[0x100000de9] <+245>: mov qword ptr [rbp - 0x58], rax
test[0x100000ded] <+249>: sub rsp, 0x8
test[0x100000df1] <+253>: call 0x100000e77 ; <+387>
test[0x100000df6] <+258>: add rsp, 0x8
test[0x100000dfa] <+262>: mov rax, qword ptr [rbp - 0x58]
test[0x100000dfe] <+266>: mov qword ptr [rbp - 0x50], rax
test[0x100000e02] <+270>: sub rsp, 0x8
test[0x100000e06] <+274>: call 0x100000e8b ; <+407>
test[0x100000e0b] <+279>: add rsp, 0x8
test[0x100000e0f] <+283>: mov rax, qword ptr [rbp - 0x50]
test[0x100000e13] <+287>: mov qword ptr [rbp - 0x48], rax
test[0x100000e17] <+291>: sub rsp, 0x8
test[0x100000e1b] <+295>: call 0x100000e9f ; <+427>
test[0x100000e20] <+300>: add rsp, 0x8
test[0x100000e24] <+304>: mov rax, qword ptr [rbp - 0x48]
test[0x100000e28] <+308>: pop r15
test[0x100000e2a] <+310>: pop r14
test[0x100000e2c] <+312>: pop r13
test[0x100000e2e] <+314>: pop r12
test[0x100000e30] <+316>: pop rbx
test[0x100000e31] <+317>: mov rsp, rbp
test[0x100000e34] <+320>: pop rbp
test[0x100000e35] <+321>: ret
test[0x100000e36] <+322>: mov dword ptr [rbp - 0xc], eax
test[0x100000e39] <+325>: lea rcx, qword ptr [rbp - 0xc]
test[0x100000e3d] <+329>: sub rsp, 0x8
test[0x100000e41] <+333>: call 0x100000e4c ; <+344>
test[0x100000e46] <+338>: add rsp, 0x8
test[0x100000e4a] <+342>: jmp 0x100000e51 ; <+349>
test[0x100000e4c] <+344>: lea rcx, qword ptr [rbp - 0x10]
test[0x100000e50] <+348>: ret
test[0x100000e51] <+349>: sub rsp, 0x8
test[0x100000e55] <+353>: call 0x100000e60 ; <+364>
test[0x100000e5a] <+358>: add rsp, 0x8
test[0x100000e5e] <+362>: jmp 0x100000e68 ; <+372>
test[0x100000e60] <+364>: mov dword ptr [rbp - 0x8], 0x14d
test[0x100000e67] <+371>: ret
test[0x100000e68] <+372>: sub rsp, 0x8
test[0x100000e6c] <+376>: call 0x100000e77 ; <+387>
test[0x100000e71] <+381>: add rsp, 0x8
test[0x100000e75] <+385>: jmp 0x100000e7c ; <+392>
test[0x100000e77] <+387>: lea rdx, qword ptr [rbp - 0x14]
test[0x100000e7b] <+391>: ret
test[0x100000e7c] <+392>: sub rsp, 0x8
test[0x100000e80] <+396>: call 0x100000e8b ; <+407>
test[0x100000e85] <+401>: add rsp, 0x8
test[0x100000e89] <+405>: jmp 0x100000e90 ; <+412>
test[0x100000e8b] <+407>: lea rbx, qword ptr [rbp - 0x18]
test[0x100000e8f] <+411>: ret
test[0x100000e90] <+412>: sub rsp, 0x8
test[0x100000e94] <+416>: call 0x100000e9f ; <+427>
test[0x100000e99] <+421>: add rsp, 0x8
test[0x100000e9d] <+425>: jmp 0x100000ead ; <+441>
test[0x100000e9f] <+427>: cmp dword ptr [rbp - 0x8], 0x0
test[0x100000ea3] <+431>: jne 0x100000eac ; <+440>
test[0x100000ea5] <+433>: mov dword ptr [rbp - 0x8], 0x29a
test[0x100000eac] <+440>: ret
test[0x100000ead] <+441>: mov eax, dword ptr [rbp - 0x8]
test[0x100000eb0] <+444>: pop r15
test[0x100000eb2] <+446>: pop r14
test[0x100000eb4] <+448>: pop r13
test[0x100000eb6] <+450>: pop r12
test[0x100000eb8] <+452>: pop rbx
test[0x100000eb9] <+453>: mov rsp, rbp
test[0x100000ebc] <+456>: pop rbp
test[0x100000ebd] <+457>: ret |
See #1030. It would be awesome if somebody could have a look at the remaining issues (the change is not as huge as it looks; one commit simplifies |
Awesome! Will check it out when I find some time... |
The main problem should be fixed now, by making sure that every piece of catch/finally code (including implicit things like dtor calls) is only emitted once. Please open separate follow-up issues for any specific remaining issues. One thing we might want to do in particular in the future is to fine-tune the cleanup emission such that e.g. trivial dtor calls get inlined directly into the blocks before the branch (mainly to make debug builds faster, LLVM optimizes things quite well in release mode). |
For example, the std.uni debug mode unit tests take several times longer to compile now (probably more than 10x). 70% of the time is spent in llvm::MachineModuleInfo::getOrCreateLandingPadInfo(llvm::MachineBasicBlock*).