-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tailcall regression with compiled F# #40864
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Presume codegen for now, and marking as 5.0 pending investigation. cc @dotnet/jit-contrib |
Going to take a preliminary look while waiting for other intermittent bugs to repro. @dotnet/jit-contrib feel free to jump in and take this one. |
Managed stack bactrace at point of overflow...
|
Something goes wrong in the new tail call helpers and we end up not reusing stack frames. |
I cannot quite see what segment of the stack is looping in your backtrace. However, it looks like the instantiating stub generated and used by the JIT here also needs to do a tailcall. Since it is leaving a normal frame on the stack the tailcalling mechanism is probably not kicking in. We might be missing tests for this scenario, though it is unclear to me, there are many generic tests around here, and some of them should involve instantiating stubs: runtime/src/tests/JIT/Directed/tailcall/more_tailcalls.cs Lines 616 to 626 in f35d747
I'm not sure how this is normally handled, to me it seems like tail-calling a pointer produced by |
The instantiating stubs for trivial signatures (e.g. where all arguments fit into registers) use regular tail calls. The instantiating stubs for more complex signatures do not tailcall. It is what's causing the problem. |
Here's the top of the native stack, frame 0a is one of those complex instantiating stubs
and the code for this stub: 00007ffa`166511b0 55 push rbp
00007ffa`166511b1 4883ec40 sub rsp, 40h
00007ffa`166511b5 488d6c2440 lea rbp, [rsp+40h]
00007ffa`166511ba 488bc2 mov rax, rdx
00007ffa`166511bd 4d8bd0 mov r10, r8
00007ffa`166511c0 4c894c2420 mov qword ptr [rsp+20h], r9
00007ffa`166511c5 488b5530 mov rdx, qword ptr [rbp+30h]
00007ffa`166511c9 4889542428 mov qword ptr [rsp+28h], rdx
00007ffa`166511ce 488b5538 mov rdx, qword ptr [rbp+38h]
00007ffa`166511d2 4889542430 mov qword ptr [rsp+30h], rdx
00007ffa`166511d7 488b5540 mov rdx, qword ptr [rbp+40h]
00007ffa`166511db 4889542438 mov qword ptr [rsp+38h], rdx
00007ffa`166511e0 488bd1 mov rdx, rcx
00007ffa`166511e3 4c8bc0 mov r8, rax
00007ffa`166511e6 4d8bca mov r9, r10
00007ffa`166511e9 48b9c8578e16fa7f0000 mov rcx, 7FFA168E57C8h
00007ffa`166511f3 48b850e56416fa7f0000 mov rax, offset CLRStub[MethodDescPrestub]@7ffa1664e550 (00007ffa`1664e550)
00007ffa`166511fd ffd0 call rax
00007ffa`166511ff 90 nop
00007ffa`16651200 488d6500 lea rsp, [rbp]
00007ffa`16651204 5d pop rbp
00007ffa`16651205 c3 ret |
So is the fix it as simple as emitting |
It would introduce very significant performance regression for scenarios that happen to go through these instantiating stubs and do not care about tailcalls. I think that the best way to solve this is by manually "inlining" the instantiating stub into the tailcall helper: Load address of the actual target + instantiating stub arg in the first tail call helper and match it in the second tail call helper. IIRC, @jakobbotsch tried this, but there were some difficulties with doing that. Another way to solve this would be to create a tailcalling instantiating stubs (ie slow instantiating stubs that have the tail prefix before the call), but that feels too wide-spread. |
Related comment: dotnet/coreclr#26418 (comment) However doing this will still not fix the underlying issue, the symptom being that you cannot robustly tailcall a pointer returned by |
Delegates do not make any guarantees about when or whether they internally perform tailcalls. |
Does the F# example use virtual methods? I think we just need to fix the non-virtual method case for this. |
What about explicit
No, but the same problem could occur with virtual methods I presume. Is it feasible to detect when emitting the instantiating stub whether the target method contains a |
[edit: had posted the backtrace for the version without IL edits; so updated that part] Need to dig through what is happening in F# but I think it is non-virtual. Here's a simple C# repro that overflows... (modified so calls F->G and G->F have a using System;
class X
{
static int F<T>(int a, int r, T c, Span<int> d)
{
int result = r;
for (int i = a; i < d.Length; i++)
{
result += d[i];
}
return G(c, a, r, d, result);
}
static int G<T>(T c, int a, int r, Span<int> d, int result)
{
if (a == d.Length) return result;
else return F(a + 1, result, c, d);
}
static int Main()
{
int[] a = new int[1_000_000];
a[99] = 1;
var s = new Span<int>(a);
int r = F<string>(0, 0, "string", a);
Console.WriteLine($"r = {r}");
return r;
}
} results in
|
Is it possible to create special instantiating stubs with tail prefixes that are only used when the caller dispatches a tail-prefixed call? |
We could, but I think it is better to deal with it locally by "inlining" the instantiating stub into the tailcall helper: #40864 (comment) . It should be local change, with less risk, and it will have better performance compared to introduring new type of instantiating stubs . New type of instantiating stubs would be a new type system entity. Introducing new type system entities tends to be non-trivial because of they typically have to handled in many places. |
Here is this approach: jakobbotsch@b1b668a |
If the target function has explicit tailcalls out of it we must also ensure that we explicitly tailcall into it when generating instantiating stubs so that the instantiating stub does not leave a frame on the stack. Fix dotnet#40864
I agree it will fix this problem. The potential problems with this change:
|
Also, it may have GC hole for collectible types (not 100% about it). Is the hidden argument going to be reported to GC correctly to keep the collectible type alive when we are inside the tailcall dispatcher? |
IMO, if the target method has explicit tail calls out of it, then the user has already declared that tailcalls do matter and the instantiating stubs should conservative about it. Otherwise it seems we won't be able to support higher order code relying on tailcalls. Again, not sure how academic this problem is.
Ok, I see. I don't know how to deal with this.
Hmm, I'm not too familiar with how this works. But I think you are right -- if there is a previous dispatcher then the instantiating stub will return immediately, and since the hidden arg is just a native sized integer in the signature of the |
Would it be feasible for the complex instantiating stubs to detect at runtime if tail calls are pending (say looking at the TLS data) and then have two call sites, one that tail calls (presumably, slowly), the other that does a normal call. Stubs would get bigger even if we never need the tail call part. Might also have some false positives where we tail call when not needed, but perhaps not too frequent? Also seems like we could leverage tail call stress (likely with the change to always use helpers for tail calls) plus gc stress to get more gc coverage on these paths. |
My guess is that we cannot detect that in general. Consider an example where the target function can do a fast tailcall to an instantiating stub for itself (which has one fewer arguments), while the instantiating stub cannot do a fast tailcall to the target function. Then there won't be any indication in the TLS that we are currently doing tailcalls. |
@erozenfeld I take it you may not get around to fixing this. Should we reassign? cc @JulieLeeMSFT |
Yes, at this point I think we should reassign this. |
#41206 implements approach described in #40864 (comment) |
Can I assign this to @jkotas? |
I reassigned this issue to @jkotas since he already has a fix. |
This change skips instantiating stubs for direct tailcalls and instead passes the inst argument directly to the target method. Fixes dotnet#40864
* Fix tailcall regression with compiled F# This change skips instantiating stubs for direct tailcalls and instead passes the inst argument directly to the target method. Fixes #40864 * Add regression test Co-authored-by: Jakob Botsch Nielsen <[email protected]>
This change skips instantiating stubs for direct tailcalls and instead passes the inst argument directly to the target method. Fixes #40864
* Fix tailcall regression with compiled F# This change skips instantiating stubs for direct tailcalls and instead passes the inst argument directly to the target method. Fixes #40864 * Add regression test Co-authored-by: Jan Kotas <[email protected]> Co-authored-by: Jakob Botsch Nielsen <[email protected]>
@jkotas , we are seeing this failure again,the repro is the same as before Machine: Pre-verification: On a clean machine install latest VS with the following workloads:
At this point the test should pass. Bug repro:
N.b., you may need to alter global.json with the following: The stack overflow happens when running against the net5.0 runtime from : 5.0.200-preview.21075.10 |
Could you please open a new issue on this and extract the repro into a small program if possible? I am not able to build using the steps above. I am getting errors like |
Sigh!!! ... I will try |
Machine:
Latest Windows 10 x64
Pre-verification:
https://github.com/dotnet/fsharp
and reset to commit 1c1b5ac7eacbbfd79e7277982e15178cecee20b4.\Build.cmd -c Release
dotnet test tests\fsharp\FSharpSuite.Tests.fsproj --no-restore --no-build --filter LargeRecordDoesNotStackOverflow -f netcoreapp3.1 -c Release -v n
At this point the test should pass.
Bug repro:
C:\Program Files\dotnet\sdk\3.1.401
C:\Program Files\dotnet\shared\Microsoft.NETCore.App\3.1.7
dotnet --list-sdks
anddotnet --list-runtimes
N.b., you may need to alter
global.json
with the following:Result:
Stack overflow
Possibly related to #40581
The text was updated successfully, but these errors were encountered: