JIT: very simple cloning heuristic #108771

AndyAyersMS · 2024-10-11T01:06:04Z

Avoid cloning large loops.

We compute loop size by counting tree nodes of all statements of all blocks in the loop. If this is over a threshold, we inhibit cloning.

Threshold value was chosen based on distribution of unrestricted cloned loop sizes in the benchmark run_pgo collection.
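The size check described above can be sketched roughly as follows. This is a minimal illustration, not the JIT's actual code. `TreeNode`, `Statement`, `Block`, `Loop`, `CountNodes`, `LoopIsSmallEnoughToClone`, and the `sizeLimit` parameter are hypothetical stand-ins for the JIT's own IR types and configured threshold. No particular threshold value is assumed. The sketch sums tree nodes over every statement of every block in the loop and rejects cloning once the total goes over the limit.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified IR: each tree node owns its operand subtrees.
struct TreeNode {
    std::vector<std::unique_ptr<TreeNode>> operands;
};

// Hypothetical statement: rooted at a single tree.
struct Statement {
    std::unique_ptr<TreeNode> root;
};

// Hypothetical basic block: an ordered list of statements.
struct Block {
    std::vector<Statement> statements;
};

// Hypothetical loop: the set of blocks that make up the loop body.
struct Loop {
    std::vector<const Block*> blocks;
};

// Count the nodes in a tree: the root plus all of its descendants.
static std::size_t CountNodes(const TreeNode* node) {
    if (node == nullptr) {
        return 0;
    }
    std::size_t count = 1;
    for (const auto& operand : node->operands) {
        count += CountNodes(operand.get());
    }
    return count;
}

// Returns false (inhibit cloning) if the loop's total tree node count is
// over sizeLimit. This sketch stops walking as soon as the limit is exceeded.
static bool LoopIsSmallEnoughToClone(const Loop& loop, std::size_t sizeLimit) {
    std::size_t size = 0;
    for (const Block* block : loop.blocks) {
        for (const Statement& stmt : block->statements) {
            size += CountNodes(stmt.root.get());
            if (size > sizeLimit) {
                return false;
            }
        }
    }
    return true;
}
```

Bailing out early keeps the cost of the check proportional to the threshold rather than to the size of a very large loop.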


dotnet-policy-service · 2024-10-11T01:06:34Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

AndyAyersMS · 2024-10-11T01:12:30Z

cc @dotnet/jit-contrib

[Chart: distribution of unrestricted loop clone sizes (from x64 benchmarks.run_pgo; units are tree node counts)]

There aren't many large loops, and cloning them quite likely has a poor size/throughput/perf tradeoff. So let's try not cloning them and see how it goes.

I played around with more "sophisticated" heuristics (some vestiges still there in this PR) where I also tried to compute the size reduction in the cloned loop and the performance impact using block weights. But without access to gtCostSz and gtCostEx (which themselves are questionable) I didn't have much confidence that such an approach would lead to better decisions.

My proposal is to merge this (or something like it) and adjust as needed after the fact based on perf lab and other data.

AndyAyersMS · 2024-10-11T20:15:38Z

@BruceForstall PTAL
cc @dotnet/jit-contrib

BruceForstall

Looks reasonable. Have you tried running the benchi microbenchmarks, e.g., lloops?

src/coreclr/jit/jitconfigvalues.h

src/coreclr/jit/loopcloning.cpp

AndyAyersMS · 2024-10-14T19:12:14Z

> Looks reasonable. Have you tried running the benchi microbenchmarks, e.g., lloops?

No diffs in any benchi/benchf (with PGO). There are some in Bytemark's ludcmp, which is one place where cloning was getting carried away (#8558 (comment)).


BruceForstall · 2024-10-14T19:15:31Z

Ah, right, ludcmp is the case I was thinking about.

AndyAyersMS · 2024-10-14T22:23:08Z

Diffs

BruceForstall · 2024-10-14T22:28:56Z

> Diffs

Some huge diffs, especially in libraries_tests, and in ludcmp, as expected. Surprisingly, there are some size regressions. Maybe cloning was enabling some size-improving optimizations (and getting rid of the cloning overhead)?

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Oct 11, 2024

dotnet-policy-service bot assigned AndyAyersMS Oct 11, 2024

build-analysis bot mentioned this pull request Oct 11, 2024

restarted. Azure DevOps can't recover from restarts. dotnet/dnceng#3879 (Open, 3 tasks)

fix release build (c9f6f69)

AndyAyersMS marked this pull request as ready for review October 11, 2024 20:15

AndyAyersMS requested a review from BruceForstall October 11, 2024 20:15

BruceForstall approved these changes Oct 14, 2024


src/coreclr/jit/jitconfigvalues.h (outdated, resolved)

src/coreclr/jit/loopcloning.cpp (outdated, resolved)

Apply suggestions from code review (6583982)

Fixes from review
Co-authored-by: Bruce Forstall <[email protected]>

build-analysis bot mentioned this pull request Oct 14, 2024

ReadECDsaPrivateKey_BrainpoolP160r1_Pfx test failure on windows #108815 (Closed)

AndyAyersMS merged commit 2098981 into dotnet:main Oct 14, 2024
105 of 108 checks passed

AndyAyersMS mentioned this pull request Oct 14, 2024

unblock cloning of loops where the header is a try begin #108604 (Merged)

LoopedBard3 mentioned this pull request Oct 17, 2024

[Perf] Linux/arm64: SciMark2.kernel Regression on 10/15/2024 8:19:02 AM #108980 (Open)

DrewScoggins mentioned this pull request Oct 22, 2024

[Perf] Linux/x64: 1 Regression on 10/14/2024 10:37:27 PM #109124 (Open)

This was referenced Oct 24, 2024

[Perf] Windows/x64: 2 Improvements on 10/14/2024 10:37:27 PM dotnet/perf-autofiling-issues#43417 (Closed)

[Perf] Windows/x64: 1 Improvement on 10/14/2024 10:37:27 PM dotnet/perf-autofiling-issues#43396 (Closed)

JIT: De-abstraction in .NET 10 #108913 (Open)

github-actions bot locked and limited conversation to collaborators Nov 14, 2024
