Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow tail call optimization for call + nop + ret #58622

Closed
Szer opened this issue Sep 3, 2021 · 17 comments
Closed

Allow tail call optimization for call + nop + ret #58622

Szer opened this issue Sep 3, 2021 · 17 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@Szer
Copy link

Szer commented Sep 3, 2021

At the moment JIT could eliminate tail calls with call+ret or callvirt+ret patterns.

There is a known problem in F# compiler where it emits nop instruction in between call and ret instructions even in Release mode
dotnet/fsharp#6026

Unfortunately @dsyme decided not to fix that issue.

My proposal is to add 2 more patterns for JIT to eliminate tail calls: call+nop+ret and callvirt+nop+ret.

Small repro from issue above which still fails with StackOverflow Exception

module Main

type Foo = struct
    end

and [<AbstractClass>] Bar() =    
    abstract bar: foo: byref<Foo> -> unit

let bar = 
    { new Bar() with
        override x.bar (foo: byref<Foo>) =
            x.bar(&foo) } // tail call here

[<EntryPoint>]
let main _ =    
    let mutable a = new Foo()
    bar.bar(&a)
    0

Workaround

ildasm
string.Replace(nop, "")
ilasm
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Sep 3, 2021
@ghost
Copy link

ghost commented Sep 3, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

At the moment JIT could optimize tail calls with call+ret or callvirt+ret patterns.

There is a known problem in F# compiler where it emits nop instruction in between call and ret instructions even in Release mode
dotnet/fsharp#6026

Unfortunately @dsyme decided not to fix that issue.

My proposal is to add 2 more patterns for JIT to eliminate tail calls: call+nop+ret and callvirt+nop+ret.

Small repro from issue above which still fails with StackOverflow Exception

module Main

type Foo = struct
    end

and [<AbstractClass>] Bar() =    
    abstract bar: foo: byref<Foo> -> unit

let bar = 
    { new Bar() with
        override x.bar (foo: byref<Foo>) =
            x.bar(&foo) } // tail call here

[<EntryPoint>]
let main _ =    
    let mutable a = new Foo()
    bar.bar(&a)
    0

Workaround

ildasm
string.Replace(nop, "")
ilasm
Author: Szer
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@EgorBo EgorBo added this to the 7.0.0 milestone Sep 3, 2021
@EgorBo EgorBo self-assigned this Sep 3, 2021
@EgorBo
Copy link
Member

EgorBo commented Sep 3, 2021

@jakobbotsch I'll take a look if you don't mind since I'm currently inspecting tailcall related sources

@jakobbotsch
Copy link
Member

@EgorBo Sure, but I'm not really sure I understand the issue. @Szer is F# generating tail./call/nop/ret? From my reading it sounds like F# is generating call/nop/ret (without tail. prefix), but only in debug in which case we are not going to do any tailcall optimization in any case.

@EgorBo
Copy link
Member

EgorBo commented Sep 3, 2021

@jakobbotsch I'm seeing an actual nop here for Release as well (in ILDasm) sharplab doesn't render it

@Szer
Copy link
Author

Szer commented Sep 3, 2021

@jakobbotsch F# generates nop between call and ret even in Release (you should check ildasm output or just run the example I provided, it will fail with SO) and that blocks JIT to eliminate that tailcall.

If you remove that nop example above should run without SO

@jakobbotsch
Copy link
Member

If you remove that nop example above should run without SO

It's still only guaranteed if it is tail. prefixed. If the nop is removed it might work on x64 and ARM64, but it will still fail on x86 or ARM32.
It sounds fine to me to do opportunistic tailcalls even with nops, but I don't think it reliably solves the issue.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 3, 2021
@Szer
Copy link
Author

Szer commented Sep 3, 2021

@jakobbotsch I do realize it is UB, but if I'm forced to do ildasm+ilasm fixes, Im totally ok with that :)

@jakobbotsch
Copy link
Member

jakobbotsch commented Sep 3, 2021

It seems to me that the underlying issue is that F#'s analysis of when to insert tail. prefix is too conservative. It does not insert it here because of the byref parameter, even though one can statically do the analysis to determine that the byref never points locally (the JIT is doing an approximation of this analysis, which is why it is able to do the tailcall as an optimization). Ideally the F# compiler should do a similar analysis that Roslyn does to determine whether byrefs are allowed to escape or not. cc @dotnet/fsharp

I'm ok with the fix in #58626, but just note that you are depending on an optimization that the JIT might or might not do depending on any number of factors (platform/architecture, passing large structs as other arguments to the call, using structs fields in the caller function etc.).

@dsyme
Copy link

dsyme commented Sep 3, 2021

Some notes:

  • The F# compiler doesn't emit a tail. in the linked example because a byref is being passed - and it never emits tail. in such situations.

  • So from the point of view of the F# compiler this is according to spec. For example, if we had "non-tailcall warnings" in F#, we'd always emit a warning at this point, regardless of what the JIT does on a specific platform.

  • Programmers should never rely on tailcalls being takenwhen it is not guaranteed across platforms. I would advise against a slippery slope where tailcalls become platform-specific. As @jakobbotsch describes this is not a reliable guarantee across platforms.

@sver I personally think it's better to take this request through the F# language design process rather than making adhoc changes to the JIT, though the changes aren't wrong in themselves (the JIT can take extra tailcalls if it wants).

For example, now byref analysis is stronger in the F# compiler due to the Spans work, I can see the argument for guaranteeing a tailcall in the situation above. However that is a change to the F# language specification and should go through https://github.com/fsharp/fslang-suggestions and https://github.com/fsharp/fslang-design RFC, because we will need a precise specification at the language level.

@dsyme
Copy link

dsyme commented Sep 3, 2021

@jakobbotsch You read my mind

@EgorBo
Copy link
Member

EgorBo commented Sep 3, 2021

I would advise against a slippery slope where tailcalls become platform-specific.

I think it's inevitable due to ABI differences and happens even today, e.g. #58522 (comment)

@dsyme
Copy link

dsyme commented Sep 3, 2021

I think it's inevitable due to ABI differences and happens even today, e.g. #58522 (comment)

My understanding is this:

  • In practice .NET implements the "tail." instruction sufficiently uniformly on all platforms to allow a language like F# to give a tailcall guarantee in specific circumstances. This subset of tailcalls is consistently implemented across all platforms and under strong test. If this guarantee gets violated for future platforms, we will know about it. There may be some existing violations but I'm not personally aware of any.

  • .NET JITs also take some additional tailcalls (regardless of whether "tail." is present, and at different optimizations levels, and platform-specific). I'm not personally interested in those as I don't think programmers should rely on them and they aren't part of the language-level guarantee. I'd encourage .NET to only add more of these if the perf results are really important (otherwise they seep through the programming model, with programmers picking up hidden, subtle platform dependencies on these being taken)

@Szer
Copy link
Author

Szer commented Sep 3, 2021

So in general, this tail call elimination was present in .NET Framework age even before .NET Core, so this particular change is not something new, but rather make same optimization wider and more consistent.

As a neat side effect it fixes some inevitable nops in F# code :)

And I'm fully agree that it should be fixed on language level as well, making F# even more intelligent.
That byref example is a small repro from my production code written with Hopac which heavily uses both byrefs, inheritance, structs and mutable under the hood.
Bug is still there :) Hopac/Hopac#192

I was always under the impression that relying on JIT ASM output is "here will be dragons"-zone of coding and I'm pretty sure that anyone who uses such optimisations or rely on them knows what they are doing.

@jakobbotsch
Copy link
Member

My understanding is this:

That's correct.

So in general, this tail call elimination was present in .NET Framework age even before .NET Core, so this particular change is not something new, but rather make same optimization wider and more consistent.

Depending on the platform that might be true. The reverse is also true -- we do more tailcalls in .NET core than in .NET framework. But we don't guarantee that optimizations done in .NET framework are also done in .NET core and vice versa.

@dsyme
Copy link

dsyme commented Sep 3, 2021

So in general, this tail call elimination was present in .NET Framework age even before .NET Core, so this particular change is not something new, but rather make same optimization wider and more consistent.

My position is this: the spec of when tailcalls are taken is fundamentally part of the programming model, not its implementation. This means that each time "extra" tailcall optimizations are added, expanded or adjusted the programming model changes, and guidance must be rewritten, and programmers re-educated. And changing the tailcall spec should, for example, ideally always be associated with a /langversion switch.

So from my perspective, changing or expanding when ".NET decides to take a tailcall even when tail" is not present" is not a valuable thing. Every change to when .NET implicitly takes extra tailcalls messes with the programming model that the user knows and relies upon. The only valid reason to add these is performance.

As a neat side effect it fixes some inevitable nops in F# code :)

The nops are by design - F# is deciding not to emit a tailcall so is putting down a nop for debugging after the call return.

relying on JIT ASM output is "here will be dragons"-zone of coding and I'm pretty sure that anyone who uses such optimisations or rely on them knows what they are doing.

Programmers implicitly rely on these characteristics all the time without realising it.

@EgorBo
Copy link
Member

EgorBo commented Sep 3, 2021

If I am not mistaken, Roslyn will never emit nops after calls in Release so the change I made in JIT is purely F# only and since Don has a strong (completely understandable) opinion against it I guess I will just close the PR. Thanks for bringing this issue to our attention, it's something we should indeed keep in mind (potential issues with platform-dependent tailcalls).

@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Sep 3, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 3, 2021
@Szer
Copy link
Author

Szer commented Sep 3, 2021

Okay, it seems that that adding ad-hoc tailcalls in JIT is not what people want, so I'm closing the issue.

@Szer Szer closed this as completed Sep 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Oct 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants