Unoptimized code is used for benchmark #1466
+1 on this. I'm looking to do similar optimizations and this information would be useful.
Hi @ebfortin
It does not. To make a long story short, the warmup or pilot phase should invoke the microbenchmark enough times to make the JIT promote it from Tier 0 to Tier 1. This doc explains how BDN works: https://benchmarkdotnet.org/articles/guides/how-it-works.html This discussion is also a great source of knowledge: dotnet/runtime#13069
This is more of a JIT question. As far as I remember, the method needs to be invoked more than 30 times (or contain a loop), and there needs to be a 100 ms window during which no new method was JITted. Also, the process can't be 100% busy. Then the tiered JIT thread kicks in, recompiles the method, and promotes it from Tier 0 to Tier 1.
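For reference, a minimal sketch (assuming a reasonably recent BenchmarkDotNet version; the exact config API names have shifted between releases) of how tiered compilation could be disabled for one job via the COMPlus_TieredCompilation environment variable, so the two JIT modes can be compared side by side:

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Sketch: run each benchmark under two jobs, one with tiered compilation
// (the default) and one with it turned off via COMPlus_TieredCompilation=0,
// so results and disassembly from the two JIT modes can be compared.
public class TieredVsFullJitConfig : ManualConfig
{
    public TieredVsFullJitConfig()
    {
        AddJob(Job.Default.WithId("Tiered"));
        AddJob(Job.Default
            .WithEnvironmentVariables(new EnvironmentVariable("COMPlus_TieredCompilation", "0"))
            .WithId("FullJit"));
    }
}

// Usage (attach to the benchmark class):
// [Config(typeof(TieredVsFullJitConfig))]
// public class MyBenchmarks { ... }
```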
Not in BDN itself, but when using BDN with PerfView it's possible:
If a given method is not in the table, it means it was not promoted to Tier 1. @ebfortin Could you please share a minimal repro case so I could try to reproduce it? You must be hitting some kind of edge case.
I believe that I've answered the question. The repro I asked for in August was not provided, so I am closing the issue. Please feel free to provide a minimal repro case and reopen the issue.
Sorry. With COVID and all, I completely forgot to answer your request. I zipped a solution for you to test. But here's the result: you can see that unless I disable QuickJIT, the code doesn't seem to get optimized. Also disregard the fact that my intrinsic code performs worse than the naive code; I stopped playing with intrinsics until I get an answer on why it isn't handled correctly by the runtime. Benchmark SIMD Optimization Test.zip
I opened an issue in dotnet/runtime (37216) regarding what I initially thought was a poor register allocation algorithm for hardware intrinsics. It evolved into a question of why BDN does not optimize the code, which is why I just opened an issue here.
In a nutshell, I have implemented a double-double type and I want to optimize the "naive" port I did from a C library with some SIMD instructions. So I compare the naive code against the SIMD code. Looking at the disassembly, I found that the code is never optimized at all, except when I force COMPlus_TieredCompilation=0. That is the only time the code gets optimized. See the linked issue in dotnet/runtime for details.
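For illustration only, a hypothetical skeleton of the kind of comparison being described (the type, field, and method names are placeholders, not taken from the attached solution), with [DisassemblyDiagnoser] attached so the generated assembly can be inspected directly from BDN:

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using BenchmarkDotNet.Attributes;

// Hypothetical benchmark skeleton: compares a scalar ("naive") double-double
// addition against an SSE2-based version and dumps the generated assembly.
[DisassemblyDiagnoser]
public class DoubleDoubleAddBenchmarks
{
    private double hi = 1.0, lo = 1e-17;
    private double otherHi = 2.0, otherLo = 3e-17;

    [Benchmark(Baseline = true)]
    public (double Hi, double Lo) Naive()
    {
        // Placeholder for the scalar error-free addition ported from the C library.
        double s = hi + otherHi;
        double e = lo + otherLo;
        return (s, e);
    }

    [Benchmark]
    public Vector128<double> Simd()
    {
        // Placeholder SSE2 version; a real double-double addition needs
        // error-compensation steps that are omitted here.
        var a = Vector128.Create(hi, lo);
        var b = Vector128.Create(otherHi, otherLo);
        return Sse2.Add(a, b);
    }
}
```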
My questions are then: