Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate .tailcall improvements, identify further optimisations #12196

Closed
vzarytovskii opened this issue Sep 27, 2021 · 20 comments
Closed

Validate .tailcall improvements, identify further optimisations #12196

vzarytovskii opened this issue Sep 27, 2021 · 20 comments
Assignees
Labels
Bug Impact-High (Internal MS Team use only) Describes an issue with extreme impact on existing code. Theme-Performance

Comments

@vzarytovskii
Copy link
Member

vzarytovskii commented Sep 27, 2021

This issue : #11907

Has a lot of great information about the issue resolved by: dotnet/runtime#56669

Validate and test the changes determine whether additional optimisations are necessary.

@vzarytovskii vzarytovskii self-assigned this Sep 27, 2021
@KevinRansom KevinRansom changed the title Validate runtime r2r changes Validate .tailcall improvements, identify further optimisations Sep 27, 2021
@KevinRansom KevinRansom added the Impact-High (Internal MS Team use only) Describes an issue with extreme impact on existing code. label Oct 13, 2021
@vzarytovskii
Copy link
Member Author

vzarytovskii commented Nov 11, 2021

Some tests and comparisons of desktop compiler vs the one shipped in 6.0.100 and builds from main (.NET 7):

TL;DR: .NET7 is better than .NET6, but worse than full framework

VS2022 - FSC.exe

TIME:  0.4 Delta:  0.3 Mem:  95 G0:   3 G1:  3 G2:  3 [Import mscorlib and FSharp.Core.dll]
TIME:  0.4 Delta:  0.0 Mem:  99 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.4 Delta:  0.0 Mem:  99 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  0.4 Delta:  0.1 Mem: 112 G0:   0 G1:  0 G2:  0 [Typecheck]
TIME:  0.4 Delta:  0.0 Mem: 112 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  0.4 Delta:  0.0 Mem: 112 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  0.4 Delta:  0.0 Mem: 112 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  0.4 Delta:  0.0 Mem: 112 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  0.5 Delta:  0.0 Mem: 122 G0:   0 G1:  0 G2:  0 [Encode Interface Data]
TIME:  0.5 Delta:  0.0 Mem: 123 G0:   0 G1:  0 G2:  0 [Optimizations]
TIME:  0.5 Delta:  0.0 Mem: 123 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  0.5 Delta:  0.0 Mem: 124 G0:   0 G1:  0 G2:  0 [Encoding OptData]
TIME:  0.5 Delta:  0.0 Mem: 126 G0:   0 G1:  0 G2:  0 [TAST -> IL]
ilwrite: TIME      0.000 (total)        0.500 (delta) - Write Started
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Preparation
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 2
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 3
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.000 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.000 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.000 (total)        0.000 (delta) - Layout Header of Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.000 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.000 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.000 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.000 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.000 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.000 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.000 (total)        0.000 (delta) - Signing Image
TIME:  0.5 Delta:  0.0 Mem: 128 G0:   0 G1:  0 G2:  0 [Write .NET Binary]
STATS: #ByteArrayFile = 3, #MemoryMappedFileOpen = 24, #MemoryMappedFileClosed = 0, #RawMemoryFile = 0, #WeakByteArrayFile = 0

VS2022 - fscAnyCpu.exe

TIME:  0.4 Delta:  0.3 Mem: 122 G0:   2 G1:  2 G2:  2 [Import mscorlib and FSharp.Core.dll]
TIME:  0.4 Delta:  0.0 Mem: 126 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.4 Delta:  0.0 Mem: 127 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  0.6 Delta:  0.2 Mem: 146 G0:   1 G1:  0 G2:  0 [Typecheck]
TIME:  0.6 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  0.6 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  0.6 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  0.6 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  0.6 Delta:  0.0 Mem: 152 G0:   0 G1:  0 G2:  0 [Encode Interface Data]
TIME:  0.7 Delta:  0.0 Mem: 155 G0:   0 G1:  0 G2:  0 [Optimizations]
TIME:  0.7 Delta:  0.0 Mem: 155 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  0.7 Delta:  0.0 Mem: 156 G0:   0 G1:  0 G2:  0 [Encoding OptData]
TIME:  0.7 Delta:  0.0 Mem: 159 G0:   0 G1:  0 G2:  0 [TAST -> IL]
ilwrite: TIME      0.000 (total)        0.672 (delta) - Write Started
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Preparation
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 2
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 3
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.000 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.000 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.000 (total)        0.000 (delta) - Layout Header of Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.000 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.000 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.000 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.000 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.000 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.000 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.000 (total)        0.000 (delta) - Signing Image
TIME:  0.7 Delta:  0.0 Mem: 161 G0:   0 G1:  0 G2:  0 [Write .NET Binary]
STATS: #ByteArrayFile = 3, #MemoryMappedFileOpen = 24, #MemoryMappedFileClosed = 0, #RawMemoryFile = 0, #WeakByteArrayFile = 0

.NET 6.0.100

TIME:  0.5 Delta:  0.4 Mem: 113 G0:   8 G1:  4 G2:  3 [Import mscorlib and FSharp.Core.dll]
TIME:  0.6 Delta:  0.1 Mem: 122 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.6 Delta:  0.0 Mem: 122 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  1.0 Delta:  0.4 Mem: 145 G0:   3 G1:  1 G2:  0 [Typecheck]
TIME:  1.0 Delta:  0.0 Mem: 145 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  1.0 Delta:  0.0 Mem: 145 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  1.0 Delta:  0.0 Mem: 145 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  1.0 Delta:  0.0 Mem: 145 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  1.1 Delta:  0.1 Mem: 156 G0:   2 G1:  1 G2:  0 [Encode Interface Data]
TIME:  1.2 Delta:  0.1 Mem: 158 G0:   1 G1:  0 G2:  0 [Optimizations]
TIME:  1.2 Delta:  0.0 Mem: 158 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  1.3 Delta:  0.0 Mem: 158 G0:   0 G1:  0 G2:  0 [Encoding OptData]
TIME:  1.3 Delta:  0.1 Mem: 159 G0:   0 G1:  0 G2:  0 [TAST -> IL]
ilwrite: TIME      0.000 (total)        1.344 (delta) - Write Started
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Preparation
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.016 (total)        0.016 (delta) - Module Generation Pass 2
ilwrite: TIME      0.047 (total)        0.031 (delta) - Module Generation Pass 3
ilwrite: TIME      0.047 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.047 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.047 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.062 (total)        0.016 (delta) - Layout Header of Tables
ilwrite: TIME      0.062 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.062 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.062 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.062 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.062 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.062 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.062 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.062 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.062 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.062 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.062 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.062 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.062 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.062 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.062 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.062 (total)        0.000 (delta) - Signing Image
TIME:  1.4 Delta:  0.1 Mem: 161 G0:   0 G1:  0 G2:  0 [Write .NET Binary]

.NET 7.0.100-alpha.1.21558.2

TIME:  0.4 Delta:  0.3 Mem: 113 G0:   8 G1:  4 G2:  3 [Import mscorlib and FSharp.Core.dll]
TIME:  0.5 Delta:  0.1 Mem: 125 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.5 Delta:  0.0 Mem: 125 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  0.8 Delta:  0.3 Mem: 147 G0:   3 G1:  1 G2:  0 [Typecheck]
TIME:  0.8 Delta:  0.0 Mem: 147 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  0.8 Delta:  0.0 Mem: 147 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  0.8 Delta:  0.0 Mem: 147 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  0.8 Delta:  0.0 Mem: 147 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  0.8 Delta:  0.1 Mem: 158 G0:   2 G1:  1 G2:  0 [Encode Interface Data]
TIME:  0.9 Delta:  0.1 Mem: 159 G0:   0 G1:  0 G2:  0 [Optimizations]
TIME:  0.9 Delta:  0.0 Mem: 159 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  0.9 Delta:  0.0 Mem: 160 G0:   1 G1:  0 G2:  0 [Encoding OptData]
TIME:  1.0 Delta:  0.1 Mem: 161 G0:   0 G1:  0 G2:  0 [TAST -> IL]
ilwrite: TIME      0.000 (total)        0.969 (delta) - Write Started
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Preparation
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 2
ilwrite: TIME      0.016 (total)        0.016 (delta) - Module Generation Pass 3
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.016 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.016 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout Header of Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.016 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.016 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.016 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.016 (total)        0.000 (delta) - Signing Image
TIME:  1.0 Delta:  0.0 Mem: 162 G0:   0 G1:  0 G2:  0 [Write .NET Binary]

cc @KevinRansom @TIHan

@dsyme
Copy link
Contributor

dsyme commented Nov 11, 2021

So faster but not as fast. Could you compare with fscAnyCpu.exe (64-bit) to make sure it's not that

@dsyme
Copy link
Contributor

dsyme commented Nov 11, 2021

Also how many times did you repeat? Thanks

@vzarytovskii
Copy link
Member Author

vzarytovskii commented Nov 11, 2021

Also how many times did you repeat? Thanks

I've been running each 100 times.

@vzarytovskii
Copy link
Member Author

So faster but not as fast. Could you compare with fscAnyCpu.exe (64-bit) to make sure it's not that

Added fscAnyCpu too

@dsyme
Copy link
Contributor

dsyme commented Nov 11, 2021

I've been running each roughly 100 times.

That should do it 😂

Added fscAnyCpu too

OK thanks

I'm not sure it's easy to get, but I'd be curious of startup time comparison for some large C# projects that are equally comparable. .NET Framework has a long tradition of really focusing on optimized startup with optimized binary layout for native mscorlib etc. It's really sophisticated. I'm not entirely sure if all these are present in .NET Core SDK distributions though they might be and more.

It could be that we realistically expect some slightly slower startup regardless of tailcalls.

@jkotas might have more insight.

Anyway I think the 0.7 --> 1.0 is looking fairly acceptable. Larger compilations should also see only a ~0.3sec slowdown hopefully, so the costs are only paid once.

@jakobbotsch
Copy link
Member

What is under test here? IIUC, a simple small F# program being compiled with a .NET Framework version of the F# compiler, and .NET 6/7 versions of the (same source code?) F# compiler?

If you give me a short guide on how to measure these things I can try to check how much time we are spending on JIT and if there are other optimizations we can do to help.

@jkotas
Copy link
Member

jkotas commented Nov 11, 2021

I'm not entirely sure if all these are present in .NET Core SDK distributions though they might be and more.

These optimizations are not present in .NET Core. These optimizations in .NET Framework make the AOT compiled fragile: The AOT binary can be used only if all its dependencies match exactly between the AOT time and R2R time. There is global NGen services that regenerates all binaries as needed (e.g. after servicing). This model was unworkable for .NET Core for multiple reasons.

.NET Core uses version-resilient AOT compiler code. You can read more about it in https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/readytorun-overview.md. The version-resilient AOT compiled is less efficient, but we compensate for it in other ways (e.g. using tiered compilation).

@vzarytovskii
Copy link
Member Author

What is under test here? IIUC, a simple small F# program being compiled with a .NET Framework version of the F# compiler, and .NET 6/7 versions of the (same source code?) F# compiler?

Yep, that's pretty much what we try to test here. We compile a simple F# "Hello World" program and measure compiler start time.

If you give me a short guide on how to measure these things I can try to check how much time we are spending on JIT and if there are other optimizations we can do to help.

Let me try to come up some simple instructions how to test it.

@En3Tho
Copy link
Contributor

En3Tho commented Nov 11, 2021

@vzarytovskii Is this on Win only? Any info about Linux/MacOS?. As far as I understand it should help Unix family unconditionally or am I mistaken?

@vzarytovskii
Copy link
Member Author

@vzarytovskii Is this on Win only? Any info about Linux/MacOS. As far as I understand it should help Unix family unconditionally or am I mistaken?

This is Windows.

@vzarytovskii
Copy link
Member Author

If you give me a short guide on how to measure these things I can try to check how much time we are spending on JIT and if there are other optimizations we can do to help.

To just run the compiler locally, you can invoke the dll directly, e.g. dotnet "C:\Program Files\dotnet\sdk\7.0.100-alpha.1.21558.2\FSharp\fsc.dll" .\Program.fs

If you want to test on local build, just clone dotnet/fsharp, invoke Build.cmd -c Release, and you can find fsc.dll in the artifacts folder.

Let me try to come up with something easier to run.

@dsyme
Copy link
Contributor

dsyme commented Nov 11, 2021

If you want to test on local build, just clone dotnet/fsharp, invoke Build.cmd -c Release, and you can find fsc.dll in the artifacts folder.

Is that cross-gen'd?

@vzarytovskii
Copy link
Member Author

vzarytovskii commented Nov 11, 2021

If you want to test on local build, just clone dotnet/fsharp, invoke Build.cmd -c Release, and you can find fsc.dll in the artifacts folder.

Is that cross-gen'd?

Ah, yeah, it's not, forgot to mention. This will just build the compiler. I guess you can just add needed options to src/fsharp/fsc/fsc.fspoj and publish it after initial build with script.

@vzarytovskii
Copy link
Member Author

This is what I get if I rebuild compiler with collected PGO data:

TIME:  0.5 Delta:  0.3 Mem: 112 G0:   8 G1:  4 G2:  3 [Import mscorlib and FSharp.Core.dll]
TIME:  0.5 Delta:  0.1 Mem: 119 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.5 Delta:  0.0 Mem: 119 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  0.9 Delta:  0.4 Mem: 142 G0:   3 G1:  1 G2:  0 [Typecheck]
TIME:  0.9 Delta:  0.0 Mem: 142 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  0.9 Delta:  0.0 Mem: 142 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  0.9 Delta:  0.0 Mem: 142 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  0.9 Delta:  0.0 Mem: 142 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  1.0 Delta:  0.1 Mem: 153 G0:   2 G1:  1 G2:  0 [Encode Interface Data]
TIME:  1.1 Delta:  0.1 Mem: 155 G0:   1 G1:  0 G2:  0 [Optimizations]
TIME:  1.1 Delta:  0.0 Mem: 155 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  1.1 Delta:  0.0 Mem: 155 G0:   0 G1:  0 G2:  0 [Encoding OptData]
TIME:  1.2 Delta:  0.1 Mem: 156 G0:   0 G1:  0 G2:  0 [TAST -> IL]
ilwrite: TIME      0.000 (total)        1.203 (delta) - Write Started
ilwrite: TIME      0.016 (total)        0.016 (delta) - Module Generation Preparation
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 2
ilwrite: TIME      0.031 (total)        0.016 (delta) - Module Generation Pass 3
ilwrite: TIME      0.031 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.031 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.031 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.031 (total)        0.000 (delta) - Layout Header of Tables
ilwrite: TIME      0.031 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.031 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.031 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.031 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.031 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.031 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.031 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.031 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.031 (total)        0.000 (delta) - Signing Image
TIME:  1.2 Delta:  0.0 Mem: 158 G0:   0 G1:  0 G2:  0 [Write .NET Binary]

@jakobbotsch
Copy link
Member

Here are some experimental results on my machine. All the times are "best of 5".

VS2022 fscAnyCpu.exe

TIME:  0.2 Delta:  0.2 Mem: 121 G0:   4 G1:  4 G2:  4 [Import mscorlib and FSharp.Core.dll]
TIME:  0.2 Delta:  0.0 Mem: 126 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.2 Delta:  0.0 Mem: 126 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  0.3 Delta:  0.0 Mem: 138 G0:   0 G1:  0 G2:  0 [Typecheck]
TIME:  0.3 Delta:  0.0 Mem: 138 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  0.3 Delta:  0.0 Mem: 138 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  0.3 Delta:  0.0 Mem: 138 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  0.3 Delta:  0.0 Mem: 138 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  0.3 Delta:  0.0 Mem: 155 G0:   0 G1:  0 G2:  0 [Encode Interface Data]
TIME:  0.3 Delta:  0.0 Mem: 157 G0:   0 G1:  0 G2:  0 [Optimizations]
TIME:  0.3 Delta:  0.0 Mem: 157 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  0.3 Delta:  0.0 Mem: 158 G0:   0 G1:  0 G2:  0 [Encoding OptData]
TIME:  0.3 Delta:  0.0 Mem: 162 G0:   0 G1:  0 G2:  0 [TAST -> IL]
ilwrite: TIME      0.000 (total)        0.281 (delta) - Write Started
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Preparation
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 2
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 3
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.000 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.000 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.000 (total)        0.000 (delta) - Layout Header of Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.000 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.000 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.000 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.000 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.000 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.000 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.000 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.000 (total)        0.000 (delta) - Signing Image
TIME:  0.3 Delta:  0.0 Mem: 165 G0:   0 G1:  0 G2:  0 [Write .NET Binary]

.NET 6.0.100

TIME:  0.3 Delta:  0.2 Mem: 126 G0:   4 G1:  3 G2:  2 [Import mscorlib and FSharp.Core.dll]
TIME:  0.3 Delta:  0.0 Mem: 136 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.3 Delta:  0.0 Mem: 136 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  0.6 Delta:  0.2 Mem: 149 G0:   0 G1:  0 G2:  0 [Typecheck]
TIME:  0.6 Delta:  0.0 Mem: 149 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  0.6 Delta:  0.0 Mem: 149 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  0.6 Delta:  0.0 Mem: 149 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  0.6 Delta:  0.0 Mem: 149 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  0.6 Delta:  0.0 Mem: 156 G0:   1 G1:  0 G2:  0 [Encode Interface Data]
TIME:  0.7 Delta:  0.1 Mem: 159 G0:   0 G1:  0 G2:  0 [Optimizations]
TIME:  0.7 Delta:  0.0 Mem: 159 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  0.7 Delta:  0.0 Mem: 160 G0:   0 G1:  0 G2:  0 [Encoding OptData]
TIME:  0.8 Delta:  0.1 Mem: 162 G0:   1 G1:  1 G2:  1 [TAST -> IL]
ilwrite: TIME      0.000 (total)        0.797 (delta) - Write Started
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Preparation
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 2
ilwrite: TIME      0.031 (total)        0.031 (delta) - Module Generation Pass 3
ilwrite: TIME      0.031 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.031 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.031 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.031 (total)        0.000 (delta) - Layout Header of Tables
ilwrite: TIME      0.031 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.031 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.031 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.031 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.031 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.031 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.031 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.031 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.031 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.031 (total)        0.000 (delta) - Signing Image
TIME:  0.8 Delta:  0.0 Mem: 164 G0:   0 G1:  0 G2:  0 [Write .NET Binary]

.NET 7.0.100-alpha.1.21558.2

TIME:  0.2 Delta:  0.1 Mem: 125 G0:   4 G1:  3 G2:  2 [Import mscorlib and FSharp.Core.dll]
TIME:  0.3 Delta:  0.1 Mem: 137 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.3 Delta:  0.0 Mem: 137 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  0.4 Delta:  0.1 Mem: 149 G0:   0 G1:  0 G2:  0 [Typecheck]
TIME:  0.4 Delta:  0.0 Mem: 149 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  0.4 Delta:  0.0 Mem: 149 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  0.4 Delta:  0.0 Mem: 149 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  0.4 Delta:  0.0 Mem: 149 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  0.5 Delta:  0.0 Mem: 157 G0:   1 G1:  0 G2:  0 [Encode Interface Data]
TIME:  0.5 Delta:  0.0 Mem: 159 G0:   0 G1:  0 G2:  0 [Optimizations]
TIME:  0.5 Delta:  0.0 Mem: 159 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  0.5 Delta:  0.0 Mem: 160 G0:   0 G1:  0 G2:  0 [Encoding OptData]
TIME:  0.6 Delta:  0.1 Mem: 162 G0:   1 G1:  1 G2:  1 [TAST -> IL]
ilwrite: TIME      0.000 (total)        0.625 (delta) - Write Started
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Preparation
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.016 (total)        0.016 (delta) - Module Generation Pass 2
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 3
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.016 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.016 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout Header of Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.016 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.016 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.016 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.016 (total)        0.000 (delta) - Signing Image
TIME:  0.6 Delta:  0.0 Mem: 164 G0:   0 G1:  0 G2:  0 [Write .NET Binary]

Composite mode

Composite mode with compilebubblegenerics is a little better, I updated global.json and nuget.config to use the .NET 7 SDK and published with

dotnet publish .\src\fsharp\fsc\fsc.fsproj -c Release --self-contained -r win-x64 /p:ProtoTargetFramework=net7.0 -f net7.0 /p:PublishReadyToRun=true /p:PublishReadyToRunComposite=true /p:PublishReadyToRunCrossgen2ExtraArgs=--compilebubblegenerics

and it gives:

TIME:  0.2 Delta:  0.1 Mem: 121 G0:   4 G1:  3 G2:  2 [Import mscorlib and FSharp.Core.dll]
TIME:  0.2 Delta:  0.0 Mem: 134 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.2 Delta:  0.0 Mem: 134 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  0.3 Delta:  0.1 Mem: 146 G0:   0 G1:  0 G2:  0 [Typecheck]
TIME:  0.3 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  0.3 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  0.3 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  0.3 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  0.4 Delta:  0.0 Mem: 154 G0:   1 G1:  0 G2:  0 [Encode Interface Data]
TIME:  0.4 Delta:  0.0 Mem: 156 G0:   0 G1:  0 G2:  0 [Optimizations]
TIME:  0.4 Delta:  0.0 Mem: 156 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  0.4 Delta:  0.0 Mem: 157 G0:   0 G1:  0 G2:  0 [Encoding OptData]
TIME:  0.5 Delta:  0.1 Mem: 160 G0:   1 G1:  1 G2:  1 [TAST -> IL]
ilwrite: TIME      0.000 (total)        0.500 (delta) - Write Started
ilwrite: TIME      0.016 (total)        0.016 (delta) - Module Generation Preparation
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 2
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 3
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.016 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.016 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout Header of Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.016 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.016 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.016 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.016 (total)        0.000 (delta) - Signing Image
TIME:  0.5 Delta:  0.0 Mem: 162 G0:   0 G1:  0 G2:  0 [Write .NET Binary]

But it looks like we are still spending ~27% of the time jitting:
image

Here are some of the ones taking the longest time:

JitTime MSec MethodName IL Size Native Size
18.958 FSharp.Compiler.CheckExpressions.TcExprUndelayed(class TcFileState class OverallTy class TcEnv class UnscopedTyparEnv class FSharp.Compiler.Syntax.SynExpr) 8288 20165
17.329 FSharp.Compiler.CheckExpressions.TcItemThen(class TcFileState class OverallTy class TcEnv class UnscopedTyparEnv class Microsoft.FSharp.Collections.FSharpList`1 class Item value class FSharp.Compiler.Text.Range class Microsoft.FSharp.Collections.FSharpList`1 class AfterResolution class Microsoft.FSharp.Collections.FSharpList`1) 7102 21520
16.898 FSharp.Compiler.Lexer.token(class LexArgs bool class Internal.Utilities.Text.Lexing.LexBuffer`1) 7155 17963
8.731 FSharp.Compiler.IlxGen.GenExprAux(class cenv class CodeGenBuffer class IlxGenEnv class EmitDebugPointState class Expr class sequel) 5801 11019
8.351 FSharp.Compiler.AugmentWithHashCompare.CheckAugmentationAttribs(bool class TcGlobals class ImportMap class Entity) 8395 8098
7.329 FSharp.Compiler.PostTypeCheckSemanticChecks.CheckExpr(class cenv class env class Expr class PermitByRefExpr) 4303 9200
7.113 FSharp.Compiler.AbstractIL.ILBinaryWriter.GenILMethodBody(!!0 class cenv class ILTypeWriterEnv class ILMethodBody) 1128 6338
6.526 FSharp.Compiler.IlxGen.GenApp(class cenv class CodeGenBuffer class IlxGenEnv class Expr class TType class Microsoft.FSharp.Collections.FSharpList`1 class Microsoft.FSharp.Collections.FSharpList`1 value class FSharp.Compiler.Text.Range class sequel) 3168 7263
6.033 FSharp.Compiler.IlxGen.GenBindingAfterDebugPoint(class cenv class CodeGenBuffer class IlxGenEnv class EmitDebugPointState class Binding bool class Microsoft.FSharp.Core.FSharpOption`1) 2791 8933
5.949 FSharp.Compiler.Lexer.singleQuoteString(class FSharp.Compiler.IO.ByteBuffer class LexerStringFinisher value class FSharp.Compiler.Text.Range value class LexerStringKind class LexArgs bool class Internal.Utilities.Text.Lexing.LexBuffer`1) 2013 9965
5.763 FSharp.Compiler.ConstraintSolver.SolveTypeSubsumesType(class ConstraintSolverEnv int32 value class FSharp.Compiler.Text.Range class OptionalTrace class Microsoft.FSharp.Core.FSharpOption`1<class System.Tuple`2> class TType class TType) 3044 7462
5.303 FSharp.Compiler.TypedTreePickle.p_op(class TOp class WriterState) 1896 7710
5.191 FSharp.Compiler.TypedTreePickle.unpickleObjWithDanglingCcus(class System.String class ILScopeRef class Microsoft.FSharp.Core.FSharpOption`1 class Microsoft.FSharp.Core.FSharpFunc`2<class ReaderState !!0> value class FSharp.Compiler.IO.ReadOnlyByteMemory) 1096 5977
4.658 FSharp.Compiler.TypedTreeOps.remapExpr(class TcGlobals class ValCopyFlag class Remap class Expr) 2141 5234
4.409 FSharp.Compiler.ConstraintSolver.SolveTypeEqualsType(class ConstraintSolverEnv int32 value class FSharp.Compiler.Text.Range class OptionalTrace class Microsoft.FSharp.Core.FSharpOption`1<class System.Tuple`2> class TType class TType) 2011 4695
4.261 FSharp.Compiler.PatternMatchCompilation.BindProjectionPattern@1304(class TcGlobals class DisplayEnv class ImportMap class Microsoft.FSharp.Core.FSharpFunc`2<class ValRef class Microsoft.FSharp.Core.FSharpFunc`2<class ValUseFlag class Microsoft.FSharp.Core.FSharpFunc`2<class Microsoft.FSharp.Collections.FSharpList`1 class Microsoft.FSharp.Core.FSharpFunc`2<value class FSharp.Compiler.Text.Range class System.Tuple`2>>>> class InfoReader value class FSharp.Compiler.Text.Range value class FSharp.Compiler.Text.Range bool class ActionOnFailure class Microsoft.FSharp.Collections.FSharpList`1 class Val class Microsoft.FSharp.Core.FSharpOption`1 class TType class MatchBuilder class TypedMatchClause[] int32 class Microsoft.FSharp.Core.FSharpFunc`2 class Microsoft.FSharp.Core.FSharpFunc`2<class Pattern class Microsoft.FSharp.Core.FSharpOption`1> class Microsoft.FSharp.Core.FSharpRef`1<class Microsoft.FSharp.Core.FSharpOption`1> class Active class System.Tuple`2<class Microsoft.FSharp.Collections.FSharpList`1 value class ValMap`1>) 1110 4394
4.24 System.SpanHelpers.IndexOf(wchar& wchar int32) 760 465
4.168 FSharp.Compiler.Optimizer.OptimizeExpr(class cenv class IncrementalOptimizationEnv class Expr) 1734 3865
3.999 FSharp.Compiler.TypedTreePickle.pickleObjWithDanglingCcus(bool class System.String class TcGlobals class CcuThunk class Microsoft.FSharp.Core.FSharpFunc`2<!!0 class Microsoft.FSharp.Core.FSharpFunc`2> !!0) 1440 6838
3.995 FSharp.Compiler.TypedTreeOps.rewriteExprStructure(class ExprRewritingEnv class Expr) 1130 4060
3.938 FSharp.Compiler.Optimizer.OptimizeExprOp(class cenv class IncrementalOptimizationEnv class TOp class Microsoft.FSharp.Collections.FSharpList`1 class Microsoft.FSharp.Collections.FSharpList`1 value class FSharp.Compiler.Text.Range) 2730 5186
3.541 [email protected](bool class TType class WriterState) 742 2806
3.532 FSharp.Compiler.PostTypeCheckSemanticChecks.CheckEntityDefn(class cenv class env class Entity) 1509 4091
2.695 FSharp.Compiler.AbstractIL.ILBinaryReader.openPEFileReader(class System.String class BinaryFile !!0 bool) 1431 3193
2.547 FSharp.Compiler.CheckExpressions.TcExprThen(class TcFileState class OverallTy class TcEnv class UnscopedTyparEnv bool class FSharp.Compiler.Syntax.SynExpr class Microsoft.FSharp.Collections.FSharpList`1) 1285 2963
2.501 System.Runtime.CompilerServices.CastHelpers.StelemRef(class System.Array int32 class System.Object) 86 91
2.173 FSharp.Compiler.AbstractIL.ILBinaryReader.seekReadIndexedRowsByInterface(int32 bool class ISeekReadIndexedRowReader`3<!!0 !!1 !!2>) 401 1390
2.162 FSharp.Compiler.CheckExpressions.TcApplicationThen(class TcFileState class OverallTy class TcEnv class UnscopedTyparEnv value class FSharp.Compiler.Text.Range class Microsoft.FSharp.Core.FSharpOption`1 class ApplicableExpr class TType class FSharp.Compiler.Syntax.SynExpr value class FSharp.Compiler.Syntax.ExprAtomicFlag bool class Microsoft.FSharp.Collections.FSharpList`1) 1356 2787
2.119 FSharp.Compiler.FindUnsolved.accExpr(class cenv class env class Expr) 1236 2036
2.116 FSharp.Compiler.AutoBox.DecideExpr(class cenv class Microsoft.FSharp.Core.FSharpFunc`2<class Internal.Utilities.Collections.Tagged.Set`2<class Val class System.Collections.Generic.IComparer`1> class Microsoft.FSharp.Core.FSharpFunc`2<class Expr class Internal.Utilities.Collections.Tagged.Set`2<class Val class System.Collections.Generic.IComparer`1>>> class Microsoft.FSharp.Core.FSharpFunc`2<class Internal.Utilities.Collections.Tagged.Set`2<class Val class System.Collections.Generic.IComparer`1> class Microsoft.FSharp.Core.FSharpFunc`2<class Expr class Internal.Utilities.Collections.Tagged.Set`2<class Val class System.Collections.Generic.IComparer`1>>> class Internal.Utilities.Collections.Tagged.Set`2<class Val class System.Collections.Generic.IComparer`1> class Expr) 403 1249
1.874 FSharp.Compiler.CheckExpressions.TcDelayed(class TcFileState class OverallTy class TcEnv class UnscopedTyparEnv value class FSharp.Compiler.Text.Range class ApplicableExpr class TType value class FSharp.Compiler.Syntax.ExprAtomicFlag class Microsoft.FSharp.Collections.FSharpList`1) 615 2100
1.861 System.Text.Unicode.Utf16Utility.GetPointerToFirstInvalidChar(wchar* int32 int64& int32&) 1004 384
1.853 FSharp.Compiler.IlxGen.AddBindingsForModuleDef(class Microsoft.FSharp.Core.FSharpFunc`2<class CompileLocation class Microsoft.FSharp.Core.FSharpFunc`2<class Val class Microsoft.FSharp.Core.FSharpFunc`2<!!0 !!0>>> class CompileLocation !!0 class ModuleOrNamespaceExpr) 179 793
1.842 FSharp.Compiler.IlxGen.GenTypeAux(class ImportMap value class FSharp.Compiler.Text.Range class TypeReprEnv class VoidNotOK class PtrsOK class TType) 513 1452
1.839 FSharp.Compiler.CheckExpressions.TcConstExpr(class TcFileState class OverallTy class TcEnv value class FSharp.Compiler.Text.Range class UnscopedTyparEnv class FSharp.Compiler.Syntax.SynConst) 439 1758
1.807 FSharp.Compiler.Parser.checkEndOfFileError(class LexerContinuation) 835 1551
1.735 Microsoft.FSharp.Collections.SeqModule.groupByRefType(class Microsoft.FSharp.Core.FSharpFunc`2<!!0 !!1> class System.Collections.Generic.IEnumerable`1<!!0>) 66 874
1.518 FSharp.Compiler.AttributeChecking.TryBindMethInfoAttribute(class TcGlobals value class FSharp.Compiler.Text.Range class BuiltinAttribInfo class MethInfo class Microsoft.FSharp.Core.FSharpFunc`2<class System.Tuple`2<class Microsoft.FSharp.Collections.FSharpList`1 class Microsoft.FSharp.Collections.FSharpList`1<class System.Tuple`4>> class Microsoft.FSharp.Core.FSharpOption`1<!!0>> class Microsoft.FSharp.Core.FSharpFunc`2<class Attrib class Microsoft.FSharp.Core.FSharpOption`1<!!0>> class Microsoft.FSharp.Core.FSharpFunc`2<class System.Tuple`2<class Microsoft.FSharp.Collections.FSharpList`1<class Microsoft.FSharp.Core.FSharpOption`1> class Microsoft.FSharp.Collections.FSharpList`1<class System.Tuple`2<class System.String class Microsoft.FSharp.Core.FSharpOption`1>>> class Microsoft.FSharp.Core.FSharpOption`1<!!0>>) 43 436
1.49 Microsoft.FSharp.Collections.ListModule.FoldBack(class Microsoft.FSharp.Core.FSharpFunc`2<!!0 class Microsoft.FSharp.Core.FSharpFunc`2<!!1 !!1>> class Microsoft.FSharp.Collections.FSharpList`1<!!0> !!1) 305 1462
1.383 FSharp.Compiler.TypedTreePickle.p_hole2() 25 415
1.322 FSharp.Compiler.AbstractIL.ILBinaryWriter.GetMethodRefAsMemberRefIdx(class cenv class ILTypeWriterEnv class ILTypeWriterEnv class ILMethodRef) 84 495
1.311 FSharp.Compiler.IlxGen.GenNamedTyAppAux(class ImportMap value class FSharp.Compiler.Text.Range class TypeReprEnv class PtrsOK class EntityRef class Microsoft.FSharp.Collections.FSharpList`1) 360 992
1.263 System.Collections.Generic.Dictionary`2[Microsoft.FSharp.Core.CompilerServices.RuntimeHelpers+StructBox`1[System.__Canon] System.Decimal].TryInsert(!0 !1 value class System.Collections.Generic.InsertionBehavior) 675 1046
1.262 FSharp.Compiler.IlxGen.HashRangeSorted(class System.Collections.Generic.IDictionary`2<!!0 class System.Tuple`2<int32 !!1>>) 113 577
1.249 FSharp.Compiler.AbstractIL.ILBinaryReader.mkCacheInt32(bool !!0 !!1 !!2) 32 359
1.247 System.Collections.Generic.GenericArraySortHelper`2[System.UInt16 System.Int32].GreaterThan(!0& !0&) 875 3372
1.245 Microsoft.FSharp.Collections.ListModule.foldBack2$cont@327(class Microsoft.FSharp.Core.FSharpFunc`2<!!0 class Microsoft.FSharp.Core.FSharpFunc`2<!!1 class Microsoft.FSharp.Core.FSharpFunc`2<!!2 !!2>>> class Microsoft.FSharp.Collections.FSharpList`1<!!0> class Microsoft.FSharp.Collections.FSharpList`1<!!1> !!2 class Microsoft.FSharp.Collections.FSharpList`1<!!1> class Microsoft.FSharp.Collections.FSharpList`1<!!0> !!1 !!0 class Microsoft.FSharp.Core.Unit) 507 720
1.241 Microsoft.FSharp.Core.FSharpFunc`2[System.__Canon System.__Canon].InvokeFast(class Microsoft.FSharp.Core.FSharpFunc`2<!0 class Microsoft.FSharp.Core.FSharpFunc`2<!1 class Microsoft.FSharp.Core.FSharpFunc`2<!!0 class Microsoft.FSharp.Core.FSharpFunc`2<!!1 class Microsoft.FSharp.Core.FSharpFunc`2<!!2 !!3>>>>> !0 !1 !!0 !!1 !!2) 158 711
1.214 System.SpanHelpers.Fill(!!0& unsigned int !!0) 870 248
1.208 FSharp.Compiler.IlxGen.GenModuleBinding(class cenv class CodeGenBuffer class FSharp.Compiler.Syntax.QualifiedNameOfFile class System.Collections.Generic.List`1<class Microsoft.FSharp.Core.FSharpFunc`2<class ILFieldSpec class Microsoft.FSharp.Core.FSharpFunc`2<class Microsoft.FSharp.Collections.FSharpList`1 class Microsoft.FSharp.Core.FSharpFunc`2<class Microsoft.FSharp.Collections.FSharpList`1 class Microsoft.FSharp.Core.Unit>>>> class IlxGenEnv value class FSharp.Compiler.Text.Range class ModuleOrNamespaceBinding) 467 1753
1.188 System.String.Replace(wchar wchar) 287 307
1.185 FSharp.Compiler.TypedTreePickle.p_hole() 25 415
1.162 Microsoft.FSharp.Core.FSharpFunc`2[System.__Canon FSharp.Compiler.ParseHelpers+LexerStringKind].InvokeFast(class Microsoft.FSharp.Core.FSharpFunc`2<!0 class Microsoft.FSharp.Core.FSharpFunc`2<!1 class Microsoft.FSharp.Core.FSharpFunc`2<!!0 class Microsoft.FSharp.Core.FSharpFunc`2<!!1 !!2>>>> !0 !1 !!0 !!1) 112 517
1.153 Microsoft.FSharp.Core.LanguagePrimitives+HashCompare.GenericCompare(class GenericComparer class System.Object class System.Object) 1092 2338
1.127 FSharp.Compiler.PostTypeCheckSemanticChecks.CheckValRef(class cenv class env class ValRef value class FSharp.Compiler.Text.Range class PermitByRefExpr) 734 1534
1.124 FSharp.Compiler.PatternMatchCompilation.CompilePattern(class TcGlobals class DisplayEnv class ImportMap class Microsoft.FSharp.Core.FSharpFunc`2<class ValRef class Microsoft.FSharp.Core.FSharpFunc`2<class ValUseFlag class Microsoft.FSharp.Core.FSharpFunc`2<class Microsoft.FSharp.Collections.FSharpList`1 class Microsoft.FSharp.Core.FSharpFunc`2<value class FSharp.Compiler.Text.Range class System.Tuple`2>>>> class InfoReader value class FSharp.Compiler.Text.Range value class FSharp.Compiler.Text.Range bool class ActionOnFailure class Val class Microsoft.FSharp.Collections.FSharpList`1 class Microsoft.FSharp.Core.FSharpOption`1 class Microsoft.FSharp.Collections.FSharpList`1 class TType class TType) 169 699
1.123 FSharp.Compiler.Optimizer.TryEliminateBinding(class cenv !!0 class Binding class Expr !!1) 2774 8001
1.122 FSharp.Compiler.IlxGen.AddBindingsForLocalModuleType(class Microsoft.FSharp.Core.FSharpFunc`2<class CompileLocation class Microsoft.FSharp.Core.FSharpFunc`2<class Val class Microsoft.FSharp.Core.FSharpFunc`2<!!0 !!0>>> class CompileLocation !!0 class ModuleOrNamespaceType) 40 311

My guess is that there is a correlation between 'complicated big function' and 'requiring tailcall helper', which means that even though there are few functions that need tailcall helpers the ones that do need them will often take long to JIT. To test this I set <Tailcalls>false</Tailcalls> in fsc.fsproj, FSharp.Compiler.Service.fsproj and FSharp.Core.fsproj, and then after publishing using the above the results are:

TIME:  0.2 Delta:  0.1 Mem: 122 G0:   4 G1:  3 G2:  2 [Import mscorlib and FSharp.Core.dll]
TIME:  0.2 Delta:  0.0 Mem: 125 G0:   0 G1:  0 G2:  0 [Parse inputs]
TIME:  0.2 Delta:  0.0 Mem: 125 G0:   0 G1:  0 G2:  0 [Import non-system references]
TIME:  0.2 Delta:  0.0 Mem: 136 G0:   0 G1:  0 G2:  0 [Typecheck]
TIME:  0.2 Delta:  0.0 Mem: 136 G0:   0 G1:  0 G2:  0 [Typechecked]
TIME:  0.2 Delta:  0.0 Mem: 136 G0:   0 G1:  0 G2:  0 [Write Interface File]
TIME:  0.2 Delta:  0.0 Mem: 136 G0:   0 G1:  0 G2:  0 [Write XML document signatures]
TIME:  0.2 Delta:  0.0 Mem: 136 G0:   0 G1:  0 G2:  0 [Write XML docs]
TIME:  0.2 Delta:  0.0 Mem: 144 G0:   1 G1:  0 G2:  0 [Encode Interface Data]
TIME:  0.2 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Optimizations]
TIME:  0.2 Delta:  0.0 Mem: 146 G0:   0 G1:  0 G2:  0 [Ending Optimizations]
TIME:  0.2 Delta:  0.0 Mem: 147 G0:   0 G1:  0 G2:  0 [Encoding OptData]
TIME:  0.3 Delta:  0.1 Mem: 149 G0:   1 G1:  1 G2:  1 [TAST -> IL]
ilwrite: TIME      0.000 (total)        0.312 (delta) - Write Started
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Preparation
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 1
ilwrite: TIME      0.000 (total)        0.000 (delta) - Module Generation Pass 2
ilwrite: TIME      0.016 (total)        0.016 (delta) - Module Generation Pass 3
ilwrite: TIME      0.016 (total)        0.000 (delta) - Module Generation Pass 4
ilwrite: TIME      0.016 (total)        0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME      0.016 (total)        0.000 (delta) - Generated Tables and Code
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout Header of Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Sort Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Header of tablebuf
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout Metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Header
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Tables
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata Strings
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Metadata User Strings
ilwrite: TIME      0.016 (total)        0.000 (delta) - Write Blob Stream
ilwrite: TIME      0.016 (total)        0.000 (delta) - Fixup Metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Generated IL and metadata
ilwrite: TIME      0.016 (total)        0.000 (delta) - Layout image
ilwrite: TIME      0.016 (total)        0.000 (delta) - Writing Image
ilwrite: TIME      0.016 (total)        0.000 (delta) - Finalize PDB
ilwrite: TIME      0.016 (total)        0.000 (delta) - Signing Image
TIME:  0.3 Delta:  0.0 Mem: 154 G0:   0 G1:  0 G2:  0 [Write .NET Binary]

with much less time spent jitting:
image

So this tells me that we might still want to prioritize supporting tailcall helpers in R2R, if anything also for start-up time 'parity' on 32-bit runtimes that do not support fast tailcalls. I don't expect that we can do as good as with tailcalls turned off, but probably we can get close.

@dsyme
Copy link
Contributor

dsyme commented Nov 12, 2021

Great investigation! As an aside, we can split those very large methods apart - we should actually do that anyway. Only a couple of the code paths will be being taken, and it will help simplify the code anyway

@EgorBo
Copy link
Member

EgorBo commented Nov 12, 2021

there are few functions that need tailcall helpers the ones that do need them will often take long to JIT

I wonder what is the most popular reason to reject fast tailcalls and emit helpers instead and how many of those tailcall helpers are applied to patterns like this:

tail. call <some return type> MyCall
pop
ret

because it can be more or less easily enabled in JIT to be handled as "fast tc".

@jakobbotsch
Copy link
Member

jakobbotsch commented Nov 12, 2021

I wonder what is the most popular reason to reject fast tailcalls and emit helpers instead and how many of those tailcall helpers are applied to patterns like this:

I don't think we do any tailcalls in the example you highlight, it should throw InvalidProgramException on .NET core, so I don't think fsc is producing that IL pattern at all.
The reasons for fast tailcalls not working out during the composite R2R compilation for win-x64 is "Not enough incoming arg space" (550 occurences) and "Callee has a byref parameter" (366 occurences) (attached Excel sheet below).

tailcalls.xlsx

@EgorBo
Copy link
Member

EgorBo commented Nov 12, 2021

it should throw InvalidProgramException on .NET core

ah, oops, totally forgot about that 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Impact-High (Internal MS Team use only) Describes an issue with extreme impact on existing code. Theme-Performance
Projects
None yet
Development

No branches or pull requests

7 participants