Validate .tailcall improvements, identify further optimisations #12196
Comments
Some tests and comparisons of desktop compiler vs the one shipped in 6.0.100 and builds from main (.NET 7):
VS2022 - FSC.exe
VS2022 - fscAnyCpu.exe
.NET 6.0.100
.NET 7.0.100-alpha.1.21558.2
|
So faster but not as fast. Could you compare with |
Also how many times did you repeat? Thanks |
I've been running each 100 times. |
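A repeated-run measurement like this can be scripted. The following is a minimal sketch, not the harness used for the numbers in this thread: `time_command` and `summarize` are hypothetical helper names, and the command line to measure is whatever compiler invocation you are comparing.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=100):
    """Run `cmd` `runs` times and return each run's wall-clock duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the child's output so console I/O does not dominate the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce raw samples to the statistics usually quoted for start-up benchmarks."""
    return {
        "runs": len(samples),
        "min": min(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
    }
```

Calling `summarize(time_command(cmd, runs=100))` with `cmd` given as an argument list yields the minimum, mean and median for one compiler configuration. The minimum is the least sensitive to background noise, and the mean is closer to what a user sees on average.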
Added fscAnyCpu too |
That should do it 😂
OK, thanks. I'm not sure it's easy to get, but I'd be curious to see a startup-time comparison for some large, comparable C# projects. .NET Framework has a long tradition of focusing on optimized startup, with optimized binary layout for native mscorlib etc. It's really sophisticated. I'm not entirely sure whether all of these optimizations are present in .NET Core SDK distributions, though they might be, and more. It could be that we should realistically expect somewhat slower startup regardless of tailcalls. @jkotas might have more insight. Anyway, I think the 0.7 --> 1.0 is looking fairly acceptable. Larger compilations should hopefully also see only a ~0.3 sec slowdown, since the cost is paid only once. |
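To make the "paid once" argument concrete, here is a small sketch that treats the ~0.3 s difference as a fixed start-up cost. That is an assumption taken from the comment above, not something measured here. The 0.7 s baseline is the figure quoted in the comment; the 10 s and 60 s compile times are made-up stand-ins for larger projects.

```python
def relative_slowdown(baseline_seconds, fixed_overhead_seconds=0.3):
    """Fraction by which a compile gets slower if a fixed start-up cost is added."""
    return fixed_overhead_seconds / baseline_seconds


# 0.7 s is the small-program figure from the thread; 10 s and 60 s are illustrative only.
for baseline in (0.7, 10.0, 60.0):
    pct = 100 * relative_slowdown(baseline)
    print(f"{baseline:5.1f} s compile -> {pct:4.1f}% slower")
```

Under that assumption the same 0.3 s is roughly a 43% hit on a 0.7 s compile but only about 3% on a 10 s compile and 0.5% on a 60 s compile.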
What is under test here? IIUC, a simple small F# program being compiled with a .NET Framework version of the F# compiler, and .NET 6/7 versions of the (same source code?) F# compiler? If you give me a short guide on how to measure these things I can try to check how much time we are spending on JIT and if there are other optimizations we can do to help. |
These optimizations are not present in .NET Core. These optimizations in .NET Framework make the AOT-compiled code fragile: the AOT binary can be used only if all its dependencies match exactly between AOT time and run time. There is a global NGen service that regenerates all binaries as needed (e.g. after servicing). This model was unworkable for .NET Core for multiple reasons. .NET Core uses version-resilient AOT-compiled code. You can read more about it in https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/readytorun-overview.md. The version-resilient AOT-compiled code is less efficient, but we compensate for it in other ways (e.g. using tiered compilation). |
Yep, that's pretty much what we try to test here. We compile a simple F# "Hello World" program and measure compiler start time.
Let me try to come up with some simple instructions for how to test it. |
@vzarytovskii Is this on Windows only? Any info about Linux/macOS? As far as I understand, it should help the Unix family unconditionally, or am I mistaken? |
This is Windows. |
To just run the compiler locally, you can invoke the dll directly, e.g. If you want to test on local build, just clone Let me try to come up with something easier to run. |
Is that cross-gen'd? |
Ah, yeah, it's not, forgot to mention. This will just build the compiler. I guess you can just add needed options to |
This is what I get if I rebuild compiler with collected PGO data:
|
Here are some experimental results on my machine. All the times are "best of 5".
VS2022 fscAnyCpu.exe
TIME: 0.2 Delta: 0.2 Mem: 121 G0: 4 G1: 4 G2: 4 [Import mscorlib and FSharp.Core.dll]
TIME: 0.2 Delta: 0.0 Mem: 126 G0: 0 G1: 0 G2: 0 [Parse inputs]
TIME: 0.2 Delta: 0.0 Mem: 126 G0: 0 G1: 0 G2: 0 [Import non-system references]
TIME: 0.3 Delta: 0.0 Mem: 138 G0: 0 G1: 0 G2: 0 [Typecheck]
TIME: 0.3 Delta: 0.0 Mem: 138 G0: 0 G1: 0 G2: 0 [Typechecked]
TIME: 0.3 Delta: 0.0 Mem: 138 G0: 0 G1: 0 G2: 0 [Write Interface File]
TIME: 0.3 Delta: 0.0 Mem: 138 G0: 0 G1: 0 G2: 0 [Write XML document signatures]
TIME: 0.3 Delta: 0.0 Mem: 138 G0: 0 G1: 0 G2: 0 [Write XML docs]
TIME: 0.3 Delta: 0.0 Mem: 155 G0: 0 G1: 0 G2: 0 [Encode Interface Data]
TIME: 0.3 Delta: 0.0 Mem: 157 G0: 0 G1: 0 G2: 0 [Optimizations]
TIME: 0.3 Delta: 0.0 Mem: 157 G0: 0 G1: 0 G2: 0 [Ending Optimizations]
TIME: 0.3 Delta: 0.0 Mem: 158 G0: 0 G1: 0 G2: 0 [Encoding OptData]
TIME: 0.3 Delta: 0.0 Mem: 162 G0: 0 G1: 0 G2: 0 [TAST -> IL]
ilwrite: TIME 0.000 (total) 0.281 (delta) - Write Started
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Preparation
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Pass 1
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Pass 2
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Pass 3
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Pass 4
ilwrite: TIME 0.000 (total) 0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME 0.000 (total) 0.000 (delta) - Generated Tables and Code
ilwrite: TIME 0.000 (total) 0.000 (delta) - Layout Header of Tables
ilwrite: TIME 0.000 (total) 0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME 0.000 (total) 0.000 (delta) - Sort Tables
ilwrite: TIME 0.000 (total) 0.000 (delta) - Write Header of tablebuf
ilwrite: TIME 0.000 (total) 0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME 0.000 (total) 0.000 (delta) - Layout Metadata
ilwrite: TIME 0.000 (total) 0.000 (delta) - Write Metadata Header
ilwrite: TIME 0.000 (total) 0.000 (delta) - Write Metadata Tables
ilwrite: TIME 0.000 (total) 0.000 (delta) - Write Metadata Strings
ilwrite: TIME 0.000 (total) 0.000 (delta) - Write Metadata User Strings
ilwrite: TIME 0.000 (total) 0.000 (delta) - Write Blob Stream
ilwrite: TIME 0.000 (total) 0.000 (delta) - Fixup Metadata
ilwrite: TIME 0.000 (total) 0.000 (delta) - Generated IL and metadata
ilwrite: TIME 0.000 (total) 0.000 (delta) - Layout image
ilwrite: TIME 0.000 (total) 0.000 (delta) - Writing Image
ilwrite: TIME 0.000 (total) 0.000 (delta) - Finalize PDB
ilwrite: TIME 0.000 (total) 0.000 (delta) - Signing Image
TIME: 0.3 Delta: 0.0 Mem: 165 G0: 0 G1: 0 G2: 0 [Write .NET Binary]
.NET 6.0.100
TIME: 0.3 Delta: 0.2 Mem: 126 G0: 4 G1: 3 G2: 2 [Import mscorlib and FSharp.Core.dll]
TIME: 0.3 Delta: 0.0 Mem: 136 G0: 0 G1: 0 G2: 0 [Parse inputs]
TIME: 0.3 Delta: 0.0 Mem: 136 G0: 0 G1: 0 G2: 0 [Import non-system references]
TIME: 0.6 Delta: 0.2 Mem: 149 G0: 0 G1: 0 G2: 0 [Typecheck]
TIME: 0.6 Delta: 0.0 Mem: 149 G0: 0 G1: 0 G2: 0 [Typechecked]
TIME: 0.6 Delta: 0.0 Mem: 149 G0: 0 G1: 0 G2: 0 [Write Interface File]
TIME: 0.6 Delta: 0.0 Mem: 149 G0: 0 G1: 0 G2: 0 [Write XML document signatures]
TIME: 0.6 Delta: 0.0 Mem: 149 G0: 0 G1: 0 G2: 0 [Write XML docs]
TIME: 0.6 Delta: 0.0 Mem: 156 G0: 1 G1: 0 G2: 0 [Encode Interface Data]
TIME: 0.7 Delta: 0.1 Mem: 159 G0: 0 G1: 0 G2: 0 [Optimizations]
TIME: 0.7 Delta: 0.0 Mem: 159 G0: 0 G1: 0 G2: 0 [Ending Optimizations]
TIME: 0.7 Delta: 0.0 Mem: 160 G0: 0 G1: 0 G2: 0 [Encoding OptData]
TIME: 0.8 Delta: 0.1 Mem: 162 G0: 1 G1: 1 G2: 1 [TAST -> IL]
ilwrite: TIME 0.000 (total) 0.797 (delta) - Write Started
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Preparation
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Pass 1
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Pass 2
ilwrite: TIME 0.031 (total) 0.031 (delta) - Module Generation Pass 3
ilwrite: TIME 0.031 (total) 0.000 (delta) - Module Generation Pass 4
ilwrite: TIME 0.031 (total) 0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME 0.031 (total) 0.000 (delta) - Generated Tables and Code
ilwrite: TIME 0.031 (total) 0.000 (delta) - Layout Header of Tables
ilwrite: TIME 0.031 (total) 0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME 0.031 (total) 0.000 (delta) - Sort Tables
ilwrite: TIME 0.031 (total) 0.000 (delta) - Write Header of tablebuf
ilwrite: TIME 0.031 (total) 0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME 0.031 (total) 0.000 (delta) - Layout Metadata
ilwrite: TIME 0.031 (total) 0.000 (delta) - Write Metadata Header
ilwrite: TIME 0.031 (total) 0.000 (delta) - Write Metadata Tables
ilwrite: TIME 0.031 (total) 0.000 (delta) - Write Metadata Strings
ilwrite: TIME 0.031 (total) 0.000 (delta) - Write Metadata User Strings
ilwrite: TIME 0.031 (total) 0.000 (delta) - Write Blob Stream
ilwrite: TIME 0.031 (total) 0.000 (delta) - Fixup Metadata
ilwrite: TIME 0.031 (total) 0.000 (delta) - Generated IL and metadata
ilwrite: TIME 0.031 (total) 0.000 (delta) - Layout image
ilwrite: TIME 0.031 (total) 0.000 (delta) - Writing Image
ilwrite: TIME 0.031 (total) 0.000 (delta) - Finalize PDB
ilwrite: TIME 0.031 (total) 0.000 (delta) - Signing Image
TIME: 0.8 Delta: 0.0 Mem: 164 G0: 0 G1: 0 G2: 0 [Write .NET Binary]
.NET 7.0.100-alpha.1.21558.2
TIME: 0.2 Delta: 0.1 Mem: 125 G0: 4 G1: 3 G2: 2 [Import mscorlib and FSharp.Core.dll]
TIME: 0.3 Delta: 0.1 Mem: 137 G0: 0 G1: 0 G2: 0 [Parse inputs]
TIME: 0.3 Delta: 0.0 Mem: 137 G0: 0 G1: 0 G2: 0 [Import non-system references]
TIME: 0.4 Delta: 0.1 Mem: 149 G0: 0 G1: 0 G2: 0 [Typecheck]
TIME: 0.4 Delta: 0.0 Mem: 149 G0: 0 G1: 0 G2: 0 [Typechecked]
TIME: 0.4 Delta: 0.0 Mem: 149 G0: 0 G1: 0 G2: 0 [Write Interface File]
TIME: 0.4 Delta: 0.0 Mem: 149 G0: 0 G1: 0 G2: 0 [Write XML document signatures]
TIME: 0.4 Delta: 0.0 Mem: 149 G0: 0 G1: 0 G2: 0 [Write XML docs]
TIME: 0.5 Delta: 0.0 Mem: 157 G0: 1 G1: 0 G2: 0 [Encode Interface Data]
TIME: 0.5 Delta: 0.0 Mem: 159 G0: 0 G1: 0 G2: 0 [Optimizations]
TIME: 0.5 Delta: 0.0 Mem: 159 G0: 0 G1: 0 G2: 0 [Ending Optimizations]
TIME: 0.5 Delta: 0.0 Mem: 160 G0: 0 G1: 0 G2: 0 [Encoding OptData]
TIME: 0.6 Delta: 0.1 Mem: 162 G0: 1 G1: 1 G2: 1 [TAST -> IL]
ilwrite: TIME 0.000 (total) 0.625 (delta) - Write Started
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Preparation
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Pass 1
ilwrite: TIME 0.016 (total) 0.016 (delta) - Module Generation Pass 2
ilwrite: TIME 0.016 (total) 0.000 (delta) - Module Generation Pass 3
ilwrite: TIME 0.016 (total) 0.000 (delta) - Module Generation Pass 4
ilwrite: TIME 0.016 (total) 0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME 0.016 (total) 0.000 (delta) - Generated Tables and Code
ilwrite: TIME 0.016 (total) 0.000 (delta) - Layout Header of Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Sort Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Header of tablebuf
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME 0.016 (total) 0.000 (delta) - Layout Metadata
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata Header
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata Strings
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata User Strings
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Blob Stream
ilwrite: TIME 0.016 (total) 0.000 (delta) - Fixup Metadata
ilwrite: TIME 0.016 (total) 0.000 (delta) - Generated IL and metadata
ilwrite: TIME 0.016 (total) 0.000 (delta) - Layout image
ilwrite: TIME 0.016 (total) 0.000 (delta) - Writing Image
ilwrite: TIME 0.016 (total) 0.000 (delta) - Finalize PDB
ilwrite: TIME 0.016 (total) 0.000 (delta) - Signing Image
TIME: 0.6 Delta: 0.0 Mem: 164 G0: 0 G1: 0 G2: 0 [Write .NET Binary]
Composite mode
Composite mode with compilebubblegenerics is a little better. I updated global.json and nuget.config to use the .NET 7 SDK and published with
and it gives:
TIME: 0.2 Delta: 0.1 Mem: 121 G0: 4 G1: 3 G2: 2 [Import mscorlib and FSharp.Core.dll]
TIME: 0.2 Delta: 0.0 Mem: 134 G0: 0 G1: 0 G2: 0 [Parse inputs]
TIME: 0.2 Delta: 0.0 Mem: 134 G0: 0 G1: 0 G2: 0 [Import non-system references]
TIME: 0.3 Delta: 0.1 Mem: 146 G0: 0 G1: 0 G2: 0 [Typecheck]
TIME: 0.3 Delta: 0.0 Mem: 146 G0: 0 G1: 0 G2: 0 [Typechecked]
TIME: 0.3 Delta: 0.0 Mem: 146 G0: 0 G1: 0 G2: 0 [Write Interface File]
TIME: 0.3 Delta: 0.0 Mem: 146 G0: 0 G1: 0 G2: 0 [Write XML document signatures]
TIME: 0.3 Delta: 0.0 Mem: 146 G0: 0 G1: 0 G2: 0 [Write XML docs]
TIME: 0.4 Delta: 0.0 Mem: 154 G0: 1 G1: 0 G2: 0 [Encode Interface Data]
TIME: 0.4 Delta: 0.0 Mem: 156 G0: 0 G1: 0 G2: 0 [Optimizations]
TIME: 0.4 Delta: 0.0 Mem: 156 G0: 0 G1: 0 G2: 0 [Ending Optimizations]
TIME: 0.4 Delta: 0.0 Mem: 157 G0: 0 G1: 0 G2: 0 [Encoding OptData]
TIME: 0.5 Delta: 0.1 Mem: 160 G0: 1 G1: 1 G2: 1 [TAST -> IL]
ilwrite: TIME 0.000 (total) 0.500 (delta) - Write Started
ilwrite: TIME 0.016 (total) 0.016 (delta) - Module Generation Preparation
ilwrite: TIME 0.016 (total) 0.000 (delta) - Module Generation Pass 1
ilwrite: TIME 0.016 (total) 0.000 (delta) - Module Generation Pass 2
ilwrite: TIME 0.016 (total) 0.000 (delta) - Module Generation Pass 3
ilwrite: TIME 0.016 (total) 0.000 (delta) - Module Generation Pass 4
ilwrite: TIME 0.016 (total) 0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME 0.016 (total) 0.000 (delta) - Generated Tables and Code
ilwrite: TIME 0.016 (total) 0.000 (delta) - Layout Header of Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Sort Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Header of tablebuf
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME 0.016 (total) 0.000 (delta) - Layout Metadata
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata Header
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata Strings
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata User Strings
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Blob Stream
ilwrite: TIME 0.016 (total) 0.000 (delta) - Fixup Metadata
ilwrite: TIME 0.016 (total) 0.000 (delta) - Generated IL and metadata
ilwrite: TIME 0.016 (total) 0.000 (delta) - Layout image
ilwrite: TIME 0.016 (total) 0.000 (delta) - Writing Image
ilwrite: TIME 0.016 (total) 0.000 (delta) - Finalize PDB
ilwrite: TIME 0.016 (total) 0.000 (delta) - Signing Image
TIME: 0.5 Delta: 0.0 Mem: 162 G0: 0 G1: 0 G2: 0 [Write .NET Binary]
But it looks like we are still spending ~27% of the time jitting:
Here are some of the ones taking the longest time:
My guess is that there is a correlation between 'complicated big function' and 'requiring tailcall helper', which means that even though there are few functions that need tailcall helpers the ones that do need them will often take long to JIT. To test this I set
TIME: 0.2 Delta: 0.1 Mem: 122 G0: 4 G1: 3 G2: 2 [Import mscorlib and FSharp.Core.dll]
TIME: 0.2 Delta: 0.0 Mem: 125 G0: 0 G1: 0 G2: 0 [Parse inputs]
TIME: 0.2 Delta: 0.0 Mem: 125 G0: 0 G1: 0 G2: 0 [Import non-system references]
TIME: 0.2 Delta: 0.0 Mem: 136 G0: 0 G1: 0 G2: 0 [Typecheck]
TIME: 0.2 Delta: 0.0 Mem: 136 G0: 0 G1: 0 G2: 0 [Typechecked]
TIME: 0.2 Delta: 0.0 Mem: 136 G0: 0 G1: 0 G2: 0 [Write Interface File]
TIME: 0.2 Delta: 0.0 Mem: 136 G0: 0 G1: 0 G2: 0 [Write XML document signatures]
TIME: 0.2 Delta: 0.0 Mem: 136 G0: 0 G1: 0 G2: 0 [Write XML docs]
TIME: 0.2 Delta: 0.0 Mem: 144 G0: 1 G1: 0 G2: 0 [Encode Interface Data]
TIME: 0.2 Delta: 0.0 Mem: 146 G0: 0 G1: 0 G2: 0 [Optimizations]
TIME: 0.2 Delta: 0.0 Mem: 146 G0: 0 G1: 0 G2: 0 [Ending Optimizations]
TIME: 0.2 Delta: 0.0 Mem: 147 G0: 0 G1: 0 G2: 0 [Encoding OptData]
TIME: 0.3 Delta: 0.1 Mem: 149 G0: 1 G1: 1 G2: 1 [TAST -> IL]
ilwrite: TIME 0.000 (total) 0.312 (delta) - Write Started
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Preparation
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Pass 1
ilwrite: TIME 0.000 (total) 0.000 (delta) - Module Generation Pass 2
ilwrite: TIME 0.016 (total) 0.016 (delta) - Module Generation Pass 3
ilwrite: TIME 0.016 (total) 0.000 (delta) - Module Generation Pass 4
ilwrite: TIME 0.016 (total) 0.000 (delta) - Finalize Module Generation Results
ilwrite: TIME 0.016 (total) 0.000 (delta) - Generated Tables and Code
ilwrite: TIME 0.016 (total) 0.000 (delta) - Layout Header of Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Build String/Blob Address Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Sort Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Header of tablebuf
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Tables to tablebuf
ilwrite: TIME 0.016 (total) 0.000 (delta) - Layout Metadata
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata Header
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata Tables
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata Strings
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Metadata User Strings
ilwrite: TIME 0.016 (total) 0.000 (delta) - Write Blob Stream
ilwrite: TIME 0.016 (total) 0.000 (delta) - Fixup Metadata
ilwrite: TIME 0.016 (total) 0.000 (delta) - Generated IL and metadata
ilwrite: TIME 0.016 (total) 0.000 (delta) - Layout image
ilwrite: TIME 0.016 (total) 0.000 (delta) - Writing Image
ilwrite: TIME 0.016 (total) 0.000 (delta) - Finalize PDB
ilwrite: TIME 0.016 (total) 0.000 (delta) - Signing Image
TIME: 0.3 Delta: 0.0 Mem: 154 G0: 0 G1: 0 G2: 0 [Write .NET Binary]
with much less time spent jitting:
So this tells me that we might still want to prioritize supporting tailcall helpers in R2R, if anything also for start-up time 'parity' on 32-bit runtimes that do not support fast tailcalls. I don't expect that we can do as well as with tailcalls turned off, but we can probably get close. |
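The per-phase `TIME:` lines quoted in the comment above can be compared mechanically rather than by eye. This is a rough sketch that assumes the log format is exactly as pasted in this thread; `parse_phases` and `total_seconds` are hypothetical helper names, and the sample lines are copied from the composite-mode run and from the last run in the comment (the one with much less time spent jitting).

```python
import re

# Matches lines such as:
#   TIME: 0.5 Delta: 0.1 Mem: 160 G0: 1 G1: 1 G2: 1 [TAST -> IL]
# The "ilwrite: TIME 0.000 (total) ..." lines have no colon after TIME and are skipped.
PHASE_LINE = re.compile(
    r"TIME:\s*(?P<time>[\d.]+)\s+Delta:\s*(?P<delta>[\d.]+)\s+Mem:\s*(?P<mem>\d+)"
    r".*?\[(?P<phase>[^\]]+)\]"
)


def parse_phases(log_text):
    """Return a list of (phase, cumulative_time, delta, mem) tuples, in log order."""
    return [
        (m["phase"], float(m["time"]), float(m["delta"]), int(m["mem"]))
        for m in PHASE_LINE.finditer(log_text)
    ]


def total_seconds(phases):
    """The cumulative TIME value of the last reported phase."""
    return phases[-1][1] if phases else 0.0


# Last two phase lines of each run, copied from the comment above.
composite = """
TIME: 0.5 Delta: 0.1 Mem: 160 G0: 1 G1: 1 G2: 1 [TAST -> IL]
TIME: 0.5 Delta: 0.0 Mem: 162 G0: 0 G1: 0 G2: 0 [Write .NET Binary]
"""
final_run = """
TIME: 0.3 Delta: 0.1 Mem: 149 G0: 1 G1: 1 G2: 1 [TAST -> IL]
TIME: 0.3 Delta: 0.0 Mem: 154 G0: 0 G1: 0 G2: 0 [Write .NET Binary]
"""

saved = total_seconds(parse_phases(composite)) - total_seconds(parse_phases(final_run))
print(f"end-to-end difference: {saved:.1f} s")
```

Because the log reports times rounded to 0.1 s, differences computed this way are only as precise as that rounding.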
Great investigation! As an aside, we can split those very large methods apart - we should actually do that anyway. Only a couple of the code paths will actually be taken, and it will help simplify the code anyway |
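One way to choose which methods to split is to rank them by JIT time and check how much of that time falls on methods that also need a tailcall helper, which is the correlation guessed at earlier in the thread. The sketch below assumes per-method JIT timings are already available from some profiler or trace. The `JitSample` record, the method names and all numbers are hypothetical placeholders, not data from this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JitSample:
    method: str                  # hypothetical method name
    jit_ms: float                # time the JIT spent compiling this method
    needs_tailcall_helper: bool  # whether the method needed a tailcall helper


def helper_share(samples):
    """Fraction of total JIT time spent on methods that need a tailcall helper."""
    total = sum(s.jit_ms for s in samples)
    if total == 0:
        return 0.0
    helper = sum(s.jit_ms for s in samples if s.needs_tailcall_helper)
    return helper / total


def slowest(samples, count=3):
    """The `count` methods with the largest JIT time, slowest first."""
    return sorted(samples, key=lambda s: s.jit_ms, reverse=True)[:count]


# Placeholder data for illustration only.
samples = [
    JitSample("MethodA", 40.0, True),
    JitSample("MethodB", 25.0, True),
    JitSample("MethodC", 5.0, False),
    JitSample("MethodD", 30.0, False),
]

print(f"helper share of JIT time: {helper_share(samples):.0%}")
for s in slowest(samples):
    print(s.method, s.jit_ms, s.needs_tailcall_helper)
```

If the helper share is much higher than the fraction of methods that need helpers, that would support the guess that the helper-requiring methods are the expensive ones to JIT.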
I wonder what is the most popular reason to reject fast tailcalls and emit helpers instead and how many of those tailcall helpers are applied to patterns like this:
because it can be more or less easily enabled in JIT to be handled as "fast tc". |
I don't think we do any tailcalls in the example you highlight, it should throw |
ah, oops, totally forgot about that 😄 |
Issue #11907 has a lot of great information about the problem resolved by dotnet/runtime#56669.
Validate and test the changes, and determine whether additional optimisations are necessary.