Allow tail call optimization for call + nop + ret #58622

Szer · 2021-09-03T12:36:33Z

At the moment JIT could eliminate tail calls with call+ret or callvirt+ret patterns.

There is a known problem in F# compiler where it emits nop instruction in between call and ret instructions even in Release mode
dotnet/fsharp#6026

Unfortunately @dsyme decided not to fix that issue.

My proposal is to add 2 more patterns for JIT to eliminate tail calls: call+nop+ret and callvirt+nop+ret.

Small repro from issue above which still fails with StackOverflow Exception

module Main

type Foo = struct
    end

and [<AbstractClass>] Bar() =    
    abstract bar: foo: byref<Foo> -> unit

let bar = 
    { new Bar() with
        override x.bar (foo: byref<Foo>) =
            x.bar(&foo) } // tail call here

[<EntryPoint>]
let main _ =    
    let mutable a = new Foo()
    bar.bar(&a)
    0

Workaround

ildasm
string.Replace(nop, "")
ilasm

The text was updated successfully, but these errors were encountered:

ghost · 2021-09-03T12:36:38Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

At the moment JIT could optimize tail calls with call+ret or callvirt+ret patterns.

There is a known problem in F# compiler where it emits nop instruction in between call and ret instructions even in Release mode
dotnet/fsharp#6026

Unfortunately @dsyme decided not to fix that issue.

My proposal is to add 2 more patterns for JIT to eliminate tail calls: call+nop+ret and callvirt+nop+ret.

Small repro from issue above which still fails with StackOverflow Exception

module Main

type Foo = struct
    end

and [<AbstractClass>] Bar() =    
    abstract bar: foo: byref<Foo> -> unit

let bar = 
    { new Bar() with
        override x.bar (foo: byref<Foo>) =
            x.bar(&foo) } // tail call here

[<EntryPoint>]
let main _ =    
    let mutable a = new Foo()
    bar.bar(&a)
    0

Workaround

ildasm
string.Replace(nop, "")
ilasm

Author:	Szer
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

EgorBo · 2021-09-03T12:43:01Z

@jakobbotsch I'll take a look if you don't mind since I'm currently inspecting tailcall related sources

jakobbotsch · 2021-09-03T12:51:26Z

@EgorBo Sure, but I'm not really sure I understand the issue. @Szer is F# generating tail./call/nop/ret? From my reading it sounds like F# is generating call/nop/ret (without tail. prefix), but only in debug in which case we are not going to do any tailcall optimization in any case.

EgorBo · 2021-09-03T12:52:24Z

@jakobbotsch I'm seeing an actual nop here for Release as well (in ILDasm) sharplab doesn't render it

Szer · 2021-09-03T12:54:48Z

@jakobbotsch F# generates nop between call and ret even in Release (you should check ildasm output or just run the example I provided, it will fail with SO) and that blocks JIT to eliminate that tailcall.

If you remove that nop example above should run without SO

jakobbotsch · 2021-09-03T12:58:30Z

If you remove that nop example above should run without SO

It's still only guaranteed if it is tail. prefixed. If the nop is removed it might work on x64 and ARM64, but it will still fail on x86 or ARM32.
It sounds fine to me to do opportunistic tailcalls even with nops, but I don't think it reliably solves the issue.

Szer · 2021-09-03T13:12:16Z

@jakobbotsch I do realize it is UB, but if I'm forced to do ildasm+ilasm fixes, Im totally ok with that :)

jakobbotsch · 2021-09-03T13:20:59Z

It seems to me that the underlying issue is that F#'s analysis of when to insert tail. prefix is too conservative. It does not insert it here because of the byref parameter, even though one can statically do the analysis to determine that the byref never points locally (the JIT is doing an approximation of this analysis, which is why it is able to do the tailcall as an optimization). Ideally the F# compiler should do a similar analysis that Roslyn does to determine whether byrefs are allowed to escape or not. cc @dotnet/fsharp

I'm ok with the fix in #58626, but just note that you are depending on an optimization that the JIT might or might not do depending on any number of factors (platform/architecture, passing large structs as other arguments to the call, using structs fields in the caller function etc.).

dsyme · 2021-09-03T13:22:09Z

Some notes:

The F# compiler doesn't emit a tail. in the linked example because a byref is being passed - and it never emits tail. in such situations.
So from the point of view of the F# compiler this is according to spec. For example, if we had "non-tailcall warnings" in F#, we'd always emit a warning at this point, regardless of what the JIT does on a specific platform.
Programmers should never rely on tailcalls being takenwhen it is not guaranteed across platforms. I would advise against a slippery slope where tailcalls become platform-specific. As @jakobbotsch describes this is not a reliable guarantee across platforms.

@sver I personally think it's better to take this request through the F# language design process rather than making adhoc changes to the JIT, though the changes aren't wrong in themselves (the JIT can take extra tailcalls if it wants).

For example, now byref analysis is stronger in the F# compiler due to the Spans work, I can see the argument for guaranteeing a tailcall in the situation above. However that is a change to the F# language specification and should go through https://github.com/fsharp/fslang-suggestions and https://github.com/fsharp/fslang-design RFC, because we will need a precise specification at the language level.

dsyme · 2021-09-03T13:23:04Z

@jakobbotsch You read my mind

EgorBo · 2021-09-03T13:44:25Z

I would advise against a slippery slope where tailcalls become platform-specific.

I think it's inevitable due to ABI differences and happens even today, e.g. #58522 (comment)

dsyme · 2021-09-03T14:02:54Z

I think it's inevitable due to ABI differences and happens even today, e.g. #58522 (comment)

My understanding is this:

In practice .NET implements the "tail." instruction sufficiently uniformly on all platforms to allow a language like F# to give a tailcall guarantee in specific circumstances. This subset of tailcalls is consistently implemented across all platforms and under strong test. If this guarantee gets violated for future platforms, we will know about it. There may be some existing violations but I'm not personally aware of any.
.NET JITs also take some additional tailcalls (regardless of whether "tail." is present, and at different optimizations levels, and platform-specific). I'm not personally interested in those as I don't think programmers should rely on them and they aren't part of the language-level guarantee. I'd encourage .NET to only add more of these if the perf results are really important (otherwise they seep through the programming model, with programmers picking up hidden, subtle platform dependencies on these being taken)

Szer · 2021-09-03T14:19:57Z

So in general, this tail call elimination was present in .NET Framework age even before .NET Core, so this particular change is not something new, but rather make same optimization wider and more consistent.

As a neat side effect it fixes some inevitable nops in F# code :)

And I'm fully agree that it should be fixed on language level as well, making F# even more intelligent.
That byref example is a small repro from my production code written with Hopac which heavily uses both byrefs, inheritance, structs and mutable under the hood.
Bug is still there :) Hopac/Hopac#192

I was always under the impression that relying on JIT ASM output is "here will be dragons"-zone of coding and I'm pretty sure that anyone who uses such optimisations or rely on them knows what they are doing.

jakobbotsch · 2021-09-03T15:10:51Z

My understanding is this:

That's correct.

So in general, this tail call elimination was present in .NET Framework age even before .NET Core, so this particular change is not something new, but rather make same optimization wider and more consistent.

Depending on the platform that might be true. The reverse is also true -- we do more tailcalls in .NET core than in .NET framework. But we don't guarantee that optimizations done in .NET framework are also done in .NET core and vice versa.

dsyme · 2021-09-03T15:13:22Z

So in general, this tail call elimination was present in .NET Framework age even before .NET Core, so this particular change is not something new, but rather make same optimization wider and more consistent.

My position is this: the spec of when tailcalls are taken is fundamentally part of the programming model, not its implementation. This means that each time "extra" tailcall optimizations are added, expanded or adjusted the programming model changes, and guidance must be rewritten, and programmers re-educated. And changing the tailcall spec should, for example, ideally always be associated with a /langversion switch.

So from my perspective, changing or expanding when ".NET decides to take a tailcall even when tail" is not present" is not a valuable thing. Every change to when .NET implicitly takes extra tailcalls messes with the programming model that the user knows and relies upon. The only valid reason to add these is performance.

As a neat side effect it fixes some inevitable nops in F# code :)

The nops are by design - F# is deciding not to emit a tailcall so is putting down a nop for debugging after the call return.

relying on JIT ASM output is "here will be dragons"-zone of coding and I'm pretty sure that anyone who uses such optimisations or rely on them knows what they are doing.

Programmers implicitly rely on these characteristics all the time without realising it.

EgorBo · 2021-09-03T15:24:52Z

If I am not mistaken, Roslyn will never emit nops after calls in Release so the change I made in JIT is purely F# only and since Don has a strong (completely understandable) opinion against it I guess I will just close the PR. Thanks for bringing this issue to our attention, it's something we should indeed keep in mind (potential issues with platform-dependent tailcalls).

Szer · 2021-09-03T15:49:33Z

Okay, it seems that that adding ad-hoc tailcalls in JIT is not what people want, so I'm closing the issue.

dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Sep 3, 2021

EgorBo added this to the 7.0.0 milestone Sep 3, 2021

EgorBo self-assigned this Sep 3, 2021

EgorBo mentioned this issue Sep 3, 2021

Ignore NOPs in impIsTailCallILPattern #58626

Closed

ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 3, 2021

jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Sep 3, 2021

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 3, 2021

Szer closed this as completed Sep 3, 2021

ghost locked as resolved and limited conversation to collaborators Oct 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow tail call optimization for call + nop + ret #58622

Allow tail call optimization for call + nop + ret #58622

Szer commented Sep 3, 2021 •

edited

Loading

ghost commented Sep 3, 2021

Workaround

EgorBo commented Sep 3, 2021

jakobbotsch commented Sep 3, 2021

EgorBo commented Sep 3, 2021

Szer commented Sep 3, 2021 •

edited

Loading

jakobbotsch commented Sep 3, 2021

Szer commented Sep 3, 2021

jakobbotsch commented Sep 3, 2021 •

edited

Loading

dsyme commented Sep 3, 2021

dsyme commented Sep 3, 2021

EgorBo commented Sep 3, 2021

dsyme commented Sep 3, 2021

Szer commented Sep 3, 2021

jakobbotsch commented Sep 3, 2021

dsyme commented Sep 3, 2021 •

edited

Loading

EgorBo commented Sep 3, 2021

Szer commented Sep 3, 2021

Allow tail call optimization for call + nop + ret #58622

Allow tail call optimization for call + nop + ret #58622

Comments

Szer commented Sep 3, 2021 • edited Loading

Workaround

ghost commented Sep 3, 2021

Workaround

EgorBo commented Sep 3, 2021

jakobbotsch commented Sep 3, 2021

EgorBo commented Sep 3, 2021

Szer commented Sep 3, 2021 • edited Loading

jakobbotsch commented Sep 3, 2021

Szer commented Sep 3, 2021

jakobbotsch commented Sep 3, 2021 • edited Loading

dsyme commented Sep 3, 2021

dsyme commented Sep 3, 2021

EgorBo commented Sep 3, 2021

dsyme commented Sep 3, 2021

Szer commented Sep 3, 2021

jakobbotsch commented Sep 3, 2021

dsyme commented Sep 3, 2021 • edited Loading

EgorBo commented Sep 3, 2021

Szer commented Sep 3, 2021

Szer commented Sep 3, 2021 •

edited

Loading

Szer commented Sep 3, 2021 •

edited

Loading

jakobbotsch commented Sep 3, 2021 •

edited

Loading

dsyme commented Sep 3, 2021 •

edited

Loading