JIT: very simple cloning heuristic #108771
Avoid cloning large loops. We compute loop size by counting tree nodes of all statements of all blocks in the loop. If this is over a threshold, we inhibit cloning. Threshold value was chosen based on distribution of unrestricted cloned loop sizes in the benchmark run_pgo collection.
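As a rough illustration of the size check described above, here is a minimal sketch. It is not the code from this PR: `Statement`, `Block`, `Loop`, and `IsSmallEnoughToClone` are hypothetical, simplified stand-ins for the JIT's real data structures, and the size limit is passed in as a parameter rather than assuming any particular threshold value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for JIT IR (not the real JIT types).
struct Statement
{
    std::size_t treeNodeCount; // number of tree nodes in this statement
};

struct Block
{
    std::vector<Statement> statements;
};

struct Loop
{
    std::vector<const Block*> blocks; // all blocks that belong to the loop
};

// Sums tree node counts over all statements of all blocks in the loop.
// Returns false (inhibit cloning) as soon as the running total exceeds
// sizeLimit, so very large loops are not walked in full.
bool IsSmallEnoughToClone(const Loop& loop, std::size_t sizeLimit)
{
    std::size_t size = 0;
    for (const Block* block : loop.blocks)
    {
        assert(block != nullptr);
        for (const Statement& stmt : block->statements)
        {
            size += stmt.treeNodeCount;
            if (size > sizeLimit)
            {
                return false; // over the threshold: do not clone
            }
        }
    }
    return true;
}
```

The early exit is a design choice in this sketch: only the comparison against the limit matters, so counting can stop once the budget is exceeded.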
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

cc @dotnet/jit-contrib

Distribution of unrestricted loop clone sizes (from x64 benchmarks.run_pgo, units are tree node counts).

There aren't many large loops, and cloning them quite likely has a poor size/tp/perf tradeoff. So let's try not cloning them and see how it goes.

I played around with more "sophisticated" heuristics (some vestiges still there in this PR) where I also tried to compute the size reduction in the cloned loop and the performance impact using block weights. But without access to

My proposal is to merge this (or something like it) and adjust as needed after the fact based on perf lab and other data.

@BruceForstall PTAL
Looks reasonable. Have you tried running the benchi microbenchmarks, e.g., lloops?
No diffs in any benchi/benchf (w/pgo) ... there are some in Bytemark's
Fixes from review (Co-authored-by: Bruce Forstall)
Ah, right, ludcmp is the case I was thinking about.
Some huge diffs. Especially in libraries_tests. And in ludcmp, as expected. Surprisingly, some size regressions? Maybe cloning was enabling some size-improving optimizations (and getting rid of the cloning overhead)?