Regressions from 3-opt #109613

Open
performanceautofiler bot opened this issue Nov 7, 2024 · 5 comments
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime
Milestone

Comments

@performanceautofiler

Run Information

| Name | Value |
| --- | --- |
| Architecture | arm64 |
| OS | ubuntu 22.04 |
| Queue | AmpereUbuntu |
| Baseline | 408caa4e28c74d95c2af00401615a0931de4facf |
| Compare | 73e1976f9510674d99bf4edbbe7392eac2843d41 |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Collections.Tests.Perf_BitArray

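| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BitArrayAnd(Size: 512) | 30.10 ns | 34.89 ns | 1.16 | 0.13 | False | | | |
| BitArrayOr(Size: 512) | 29.06 ns | 34.64 ns | 1.19 | 0.14 | False | | | |
| BitArrayXor(Size: 512) | 28.98 ns | 34.14 ns | 1.18 | 0.12 | False | | | |
| BitArrayNot(Size: 512) | 22.98 ns | 27.61 ns | 1.20 | 0.20 | False | | | |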

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_BitArray*'

System.Collections.Tests.Perf_BitArray.BitArrayAnd(Size: 512)

ETL Files

Histogram

JIT Disasms

System.Collections.Tests.Perf_BitArray.BitArrayOr(Size: 512)

ETL Files

Histogram

JIT Disasms

System.Collections.Tests.Perf_BitArray.BitArrayXor(Size: 512)

ETL Files

Histogram

JIT Disasms

System.Collections.Tests.Perf_BitArray.BitArrayNot(Size: 512)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository



Regressions in Span.IndexerBench

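| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CoveredIndex2(length: 1024) | 1.21 μs | 1.38 μs | 1.14 | 0.00 | False | | | |
| CoveredIndex3(length: 1024) | 1.72 μs | 2.06 μs | 1.20 | 0.00 | False | | | |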

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Span.IndexerBench*'

Span.IndexerBench.CoveredIndex2(length: 1024)

ETL Files

Histogram

JIT Disasms

Span.IndexerBench.CoveredIndex3(length: 1024)

ETL Files

Histogram

JIT Disasms


Regressions in System.Globalization.Tests.StringSearch

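| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IndexOf_Word_NotFound(Options: (, None, False)) | 539.40 ns | 623.39 ns | 1.16 | 0.01 | False | | | |
| LastIndexOf_Word_NotFound(Options: (, None, False)) | 540.64 ns | 624.02 ns | 1.15 | 0.01 | False | | | |
| LastIndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False)) | 539.27 ns | 623.78 ns | 1.16 | 0.01 | False | | | |
| IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False)) | 768.06 ns | 850.33 ns | 1.11 | 0.01 | False | | | |
| IndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False)) | 539.11 ns | 624.33 ns | 1.16 | 0.01 | False | | | |
| IndexOf_Word_NotFound(Options: (en-US, None, False)) | 539.27 ns | 624.84 ns | 1.16 | 0.01 | False | | | |
| LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False)) | 798.90 ns | 885.54 ns | 1.11 | 0.03 | False | | | |
| IndexOf_Word_NotFound(Options: (, IgnoreCase, False)) | 767.89 ns | 852.57 ns | 1.11 | 0.01 | False | | | |
| LastIndexOf_Word_NotFound(Options: (en-US, None, False)) | 539.20 ns | 624.38 ns | 1.16 | 0.01 | False | | | |
| LastIndexOf_Word_NotFound(Options: (, IgnoreCase, False)) | 799.43 ns | 878.06 ns | 1.10 | 0.00 | False | | | |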

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*'

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, None, False))

ETL Files

Histogram

JIT Disasms

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, None, False))

ETL Files

Histogram

JIT Disasms

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False))

ETL Files

Histogram

JIT Disasms

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))

ETL Files

Histogram

JIT Disasms

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False))

ETL Files

Histogram

JIT Disasms

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, None, False))

ETL Files

Histogram

JIT Disasms

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))

ETL Files

Histogram

JIT Disasms

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, IgnoreCase, False))

ETL Files

Histogram

JIT Disasms

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, None, False))

ETL Files

Histogram

JIT Disasms

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, IgnoreCase, False))

ETL Files

Histogram

JIT Disasms


Regressions in PerfLabTests.LowLevelPerf

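| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ForeachOverList100Elements | 8.45 ms | 10.10 ms | 1.19 | 0.01 | False | | | |
| InterfaceInterfaceMethodLongHierarchy | 303.39 μs | 334.28 μs | 1.10 | 0.05 | False | | | |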

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'PerfLabTests.LowLevelPerf*'

PerfLabTests.LowLevelPerf.ForeachOverList100Elements

ETL Files

Histogram

JIT Disasms

PerfLabTests.LowLevelPerf.InterfaceInterfaceMethodLongHierarchy

ETL Files

Histogram

JIT Disasms


Regressions in System.Collections.IterateForEach<Int32>

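| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FrozenSet(Size: 512) | 261.70 ns | 349.31 ns | 1.33 | 0.01 | False | | | |
| List(Size: 512) | 434.29 ns | 519.44 ns | 1.20 | 0.01 | False | | | |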

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.IterateForEach<Int32>*'

System.Collections.IterateForEach<Int32>.FrozenSet(Size: 512)

ETL Files

Histogram

JIT Disasms

System.Collections.IterateForEach<Int32>.List(Size: 512)

ETL Files

Histogram

JIT Disasms


Regressions in System.Memory.Span<Int32>

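| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IndexOfAnyFourValues(Size: 512) | 863.82 ns | 1.12 μs | 1.29 | 0.01 | False | | | |
| IndexOfAnyFiveValues(Size: 512) | 1.04 μs | 1.29 μs | 1.25 | 0.01 | False | | | |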

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Int32>*'

System.Memory.Span<Int32>.IndexOfAnyFourValues(Size: 512)

ETL Files

Histogram

JIT Disasms

System.Memory.Span<Int32>.IndexOfAnyFiveValues(Size: 512)

ETL Files

Histogram

JIT Disasms


Regressions in System.Tests.Perf_Char

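| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Char_IsLower(input: "Good afternoon, Constable!") | 25.08 ns | 32.27 ns | 1.29 | 0.03 | False | | | |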

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Char*'

System.Tests.Perf_Char.Char_IsLower(input: "Good afternoon, Constable!")

ETL Files

Histogram

JIT Disasms


Regressions in Struct.SpanWrapper

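| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WrapperSum | 6.79 μs | 10.05 μs | 1.48 | 0.01 | False | | | |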

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Struct.SpanWrapper*'

Struct.SpanWrapper.WrapperSum

ETL Files

Histogram

JIT Disasms


Regressions in System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>

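| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Enumerate(Size: 100) | 104.38 ns | 119.96 ns | 1.15 | 0.01 | False | | | |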

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>*'

System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>.Enumerate(Size: 100)

ETL Files

Histogram

JIT Disasms


Regressions in Span.Sorting

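| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BubbleSortSpan(Size: 512) | 218.42 μs | 242.60 μs | 1.11 | 0.00 | False | | | |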

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Span.Sorting*'

Span.Sorting.BubbleSortSpan(Size: 512)

ETL Files

Histogram

JIT Disasms


@performanceautofiler performanceautofiler bot added arch-arm64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Nov 7, 2024
@AndyAyersMS
Member

#103450

@AndyAyersMS AndyAyersMS transferred this issue from dotnet/perf-autofiling-issues Nov 7, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 7, 2024
@AndyAyersMS AndyAyersMS changed the title [Perf] Linux/arm64: 26 Regressions on 11/4/2024 8:24:32 PM Regressions from 3-opt Nov 7, 2024
@AndyAyersMS AndyAyersMS added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner labels Nov 7, 2024
@AndyAyersMS
Member

AndyAyersMS commented Nov 7, 2024

@amanasifkhalid FYI

Improvements:

Regressions:

@amanasifkhalid amanasifkhalid self-assigned this Nov 7, 2024
@amanasifkhalid
Member

I took a look at a few of the regressions, and many of them seem to stem from mis-rotated loops. Because the cost model currently doesn't differentiate between conditional and unconditional jumps, 3-opt tends to make naive decisions about moving loop exits. For example, from Struct.SpanWrapper.WrapperSum:

*************** In fgSearchImprovedLayout()

Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      2 [???..???)-> BB03(1)                 (always)                     i LIR IBC internal
BB02 [0001]  1       BB06                 99.41 167 [00C..019)-> BB03(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB03 [0002]  2       BB02,BB01           100    168 [019..01A)-> BB05(0.48),BB04(0.52)   ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB04 [0010]  1       BB03                 52.00  87 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB06 [0012]  2       BB04,BB05           100    168 [019..022)-> BB02(0.994),BB07(0.00595)   ( cond )                     i LIR IBC bwd bwd-src
BB05 [0011]  1       BB03                 48     81 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB07 [0003]  1       BB06                  0.60   1 [022..024)                           (return)                     i LIR IBC
BB08 [0013]  0                             0        [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Running 3-opt for main method body
Creating fallthrough for BB06 -> BB02 (current partition score = 87.394958, new partition score = 167.067227)
Creating fallthrough for BB04 -> BB06 (current partition score = 87.394958, new partition score = 168.067227)

*************** Finishing PHASE Optimize layout
Trees after Optimize layout

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      2 [???..???)-> BB03(1)                 (always)                     i LIR IBC internal
BB04 [0010]  1       BB03                 52.00  87 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB06 [0012]  2       BB04,BB05           100    168 [019..022)-> BB02(0.994),BB07(0.00595)   ( cond )                     i LIR IBC bwd bwd-src
BB02 [0001]  1       BB06                 99.41 167 [00C..019)-> BB03(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB03 [0002]  2       BB02,BB01           100    168 [019..01A)-> BB05(0.48),BB04(0.52)   ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0011]  1       BB03                 48     81 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB07 [0003]  1       BB06                  0.60   1 [022..024)                           (return)                     i LIR IBC
BB08 [0013]  0                             0        [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

If we can tweak the cost model such that it decides creating fallthrough for BB06 -> BB02 is unprofitable, then 3-opt will instead create fallthrough for BB06 -> BB07, thus creating the ideal loop exit shape. As a consequence, we will push BB05 further out-of-line; in order to consider moving BB05 back into the loop body, we'd probably have to model forward vs backward jumps in the cost model to make such a move profitable.
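
To make the forward vs. backward distinction concrete, here is a small Python sketch (not JIT code) that labels each edge of the post-layout block order for WrapperSum as a fallthrough, a forward jump, or a backward jump. The block order and edges are copied from the dump above; the function and variable names are hypothetical and exist only for illustration.

```python
# Block order after "Optimize layout" in the WrapperSum dump above.
LAYOUT = ["BB01", "BB04", "BB06", "BB02", "BB03", "BB05", "BB07", "BB08"]

# Successor edges copied from the [jump] column of the same dump.
EDGES = [
    ("BB01", "BB03"),
    ("BB02", "BB03"),
    ("BB03", "BB05"),
    ("BB03", "BB04"),
    ("BB04", "BB06"),
    ("BB05", "BB06"),
    ("BB06", "BB02"),
    ("BB06", "BB07"),
]


def classify_edges(layout, edges):
    """Label each edge as a fallthrough, forward jump, or backward jump."""
    position = {block: index for index, block in enumerate(layout)}
    kinds = {}
    for src, dst in edges:
        if position[dst] == position[src] + 1:
            kinds[(src, dst)] = "fallthrough"
        elif position[dst] > position[src]:
            kinds[(src, dst)] = "forward jump"
        else:
            kinds[(src, dst)] = "backward jump"
    return kinds


if __name__ == "__main__":
    for (src, dst), kind in classify_edges(LAYOUT, EDGES).items():
        print(f"{src} -> {dst}: {kind}")
```

In this order, BB06 -> BB02 falls through while BB03 -> BB04 and BB05 -> BB06 become backward jumps, which is the kind of information a cost model would need in order to treat those edges differently.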

PerfLabTests.LowLevelPerf.ForeachOverList100Elements has a similar shape:

*************** In fgSearchImprovedLayout()

Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight     IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010]  1                             1      116 [???..???)-> BB04(1)                 (always)                     i LIR IBC internal
BB03 [0003]  1       BB09                 98.56 11470 [015..021)-> BB04(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB04 [0004]  3       BB02,BB03,BB01      100    11637 [021..022)-> BB11(0.2),BB05(0.8)     ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0018]  1       BB04                 80     9309 [021..022)-> BB07(0.48),BB06(0.52)   ( cond )                     i LIR IBC bwd
BB06 [0019]  1       BB05                 41.60  4841 [021..022)-> BB09(1)                 (always)                     i LIR IBC idxlen bwd
BB07 [0020]  1       BB05                 58.40  6796 [021..022)-> BB09(1)                 (always)                     i LIR IBC bwd
BB09 [0021]  2       BB06,BB07           100    11637 [021..02A)-> BB03(0.986),BB10(0.0144)  ( cond )                     i LIR IBC bwd bwd-src
BB10 [0005]  1       BB09                  1.44   167 [02A..046)-> BB02(0.994),BB12(0.00595)   ( cond )                     i LIR IBC bwd
BB02 [0001]  1       BB10                  1.44   167 [00C..013)-> BB04(1)                 (always)                     i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009]  1       BB10                  0.01     1 [046..048)                           (return)                     i LIR IBC
BB11 [0023]  1       BB04                  0        0 [021..022)                           (throw )                     i LIR IBC rare hascall gcsafe bwd
BB13 [0028]  0                             0          [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Running 3-opt for main method body
Creating fallthrough for BB09 -> BB03 (current partition score = 6962.966716, new partition score = 11469.746967)
Creating fallthrough for BB07 -> BB09 (current partition score = 0.000000, new partition score = 6795.899489)

*************** Finishing PHASE Optimize layout
Trees after Optimize layout

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight     IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010]  1                             1      116 [???..???)-> BB04(1)                 (always)                     i LIR IBC internal
BB07 [0020]  1       BB05                 58.40  6796 [021..022)-> BB09(1)                 (always)                     i LIR IBC bwd
BB09 [0021]  2       BB06,BB07           100    11637 [021..02A)-> BB03(0.986),BB10(0.0144)  ( cond )                     i LIR IBC bwd bwd-src
BB03 [0003]  1       BB09                 98.56 11470 [015..021)-> BB04(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB04 [0004]  3       BB02,BB03,BB01      100    11637 [021..022)-> BB11(0.2),BB05(0.8)     ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0018]  1       BB04                 80     9309 [021..022)-> BB07(0.48),BB06(0.52)   ( cond )                     i LIR IBC bwd
BB06 [0019]  1       BB05                 41.60  4841 [021..022)-> BB09(1)                 (always)                     i LIR IBC idxlen bwd
BB10 [0005]  1       BB09                  1.44   167 [02A..046)-> BB02(0.994),BB12(0.00595)   ( cond )                     i LIR IBC bwd
BB02 [0001]  1       BB10                  1.44   167 [00C..013)-> BB04(1)                 (always)                     i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009]  1       BB10                  0.01     1 [046..048)                           (return)                     i LIR IBC
BB11 [0023]  1       BB04                  0        0 [021..022)                           (throw )                     i LIR IBC rare hascall gcsafe bwd
BB13 [0028]  0                             0          [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

I suspect making the move BB09 -> BB03 unprofitable with some constant for conditional jumps would fix this.

Since 3-opt currently optimizes for maximal layout score (only because it's cheaper to sum the weights of edges that now fall through than to sum the weights of edges that now don't), I suspect we want to begin by penalizing scores for conditional jumps by some multiplier k, where 0 < k < 1. @AndyAyersMS do you have a recommended starting point for k, or is this a matter of trial and error? If we want to model something as granular as what Young et al. describe in Near-optimal Intraprocedural Branch Alignment, we're probably better off refactoring 3-opt to minimize cost instead of maximizing score.
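
As a rough illustration of the scoring idea, the Python sketch below sums the weights of edges that fall through in a given block order, with an optional multiplier k applied to edges leaving a block that ends in a conditional jump. It uses the normalized block weights and edge likelihoods from the WrapperSum dump above. Treating an edge's weight as source block weight times likelihood, and applying k this way, are assumptions made for the example. The sketch scores whole layouts rather than the partitions 3-opt evaluates, so it will not reproduce the partition scores printed in the dump, and all names are hypothetical.

```python
# Normalized block weights from the WrapperSum dump (blocks with successors only).
BLOCK_WEIGHT = {
    "BB01": 1.0,
    "BB02": 99.41,
    "BB03": 100.0,
    "BB04": 52.0,
    "BB05": 48.0,
    "BB06": 100.0,
}

# (source, target, likelihood, source ends in a conditional jump)
EDGES = [
    ("BB01", "BB03", 1.0, False),
    ("BB02", "BB03", 1.0, False),
    ("BB03", "BB05", 0.48, True),
    ("BB03", "BB04", 0.52, True),
    ("BB04", "BB06", 1.0, False),
    ("BB05", "BB06", 1.0, False),
    ("BB06", "BB02", 0.994, True),
    ("BB06", "BB07", 0.00595, True),
]

INITIAL = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB05", "BB07", "BB08"]
AFTER_3OPT = ["BB01", "BB04", "BB06", "BB02", "BB03", "BB05", "BB07", "BB08"]


def layout_score(layout, k=1.0):
    """Sum the weights of edges that fall through in the given block order.

    Edges out of conditional blocks are scaled by k (assumed penalty model).
    """
    position = {block: index for index, block in enumerate(layout)}
    score = 0.0
    for src, dst, likelihood, is_conditional in EDGES:
        if position[dst] == position[src] + 1:
            edge_weight = BLOCK_WEIGHT[src] * likelihood
            score += edge_weight * (k if is_conditional else 1.0)
    return score


if __name__ == "__main__":
    for k in (1.0, 0.5):
        print(f"k={k}: initial={layout_score(INITIAL, k):.2f}, "
              f"after 3-opt={layout_score(AFTER_3OPT, k):.2f}")
```

With k = 1, the post-3-opt order scores about 298.81 against about 203.41 for the initial order under this simplified model, which is consistent with 3-opt preferring it when only fallthrough weight is counted.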

@AndyAyersMS
Member

> penalizing scores for conditional jumps by some multiplier k

I would think the value of k would be dependent on the likelihood of branching; something like k = 1 - (likelihood of branching). But this isn't quite right because a highly predictable branch should be somewhat cheaper than a less predictable branch (and we can use likelihoods close to 1 as indicators of predictability).
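
A minimal Python sketch of that first approximation follows. It reads "likelihood of branching" as the likelihood that the conditional jump is taken, which is an assumption about the intended meaning, and the function name is hypothetical. As noted above, it does not account for predictability.

```python
def conditional_jump_multiplier(branch_likelihood):
    """Return k = 1 - branch_likelihood for a likelihood in [0, 1]."""
    if not 0.0 <= branch_likelihood <= 1.0:
        raise ValueError("branch likelihood must be in [0, 1]")
    return 1.0 - branch_likelihood


if __name__ == "__main__":
    # Edge likelihoods that appear in the WrapperSum dump.
    for likelihood in (0.00595, 0.48, 0.52, 0.994):
        k = conditional_jump_multiplier(likelihood)
        print(f"likelihood {likelihood:.5f} -> k = {k:.5f}")
```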

But I agree it is confusing to think in benefit terms, as I really think of this as a cost minimization problem....

@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 10, 2024
@LoopedBard3
Member

GitHub missed linking the original PR: #103450


5 participants