Regressions from 3-opt #109613

performanceautofiler · 2024-11-07T07:58:35Z

Run Information

Name	Value
Architecture	arm64
OS	ubuntu 22.04
Queue	AmpereUbuntu
Baseline	408caa4e28c74d95c2af00401615a0931de4facf
Compare	73e1976f9510674d99bf4edbbe7392eac2843d41
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Tests.Perf_BitArray

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector
BitArrayAnd - Duration of single invocation	30.10 ns	34.89 ns	1.16	0.13	False
BitArrayOr - Duration of single invocation	29.06 ns	34.64 ns	1.19	0.14	False
BitArrayXor - Duration of single invocation	28.98 ns	34.14 ns	1.18	0.12	False
BitArrayNot - Duration of single invocation	22.98 ns	27.61 ns	1.20	0.20	False
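
The Test/Base column appears to be the Test duration divided by the Baseline duration, rounded to two decimals, so values above 1.00 indicate a slowdown. A minimal Python sketch that recomputes it from the rows above (the `rows` dictionary and `ratio` helper are illustrative names, not part of the perf tooling):

```python
# Durations in nanoseconds, copied from the Perf_BitArray table above
# as (baseline, test) pairs.
rows = {
    "BitArrayAnd": (30.10, 34.89),
    "BitArrayOr": (29.06, 34.64),
    "BitArrayXor": (28.98, 34.14),
    "BitArrayNot": (22.98, 27.61),
}


def ratio(baseline_ns, test_ns):
    """Test/Base ratio rounded to two decimals; above 1.0 means slower."""
    return round(test_ns / baseline_ns, 2)


ratios = {name: ratio(base, test) for name, (base, test) in rows.items()}
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

The recomputed ratios agree with the reported column, which suggests the four BitArray benchmarks slowed down by roughly 16% to 20%.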

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_BitArray*'

System.Collections.Tests.Perf_BitArray.BitArrayAnd(Size: 512)
System.Collections.Tests.Perf_BitArray.BitArrayOr(Size: 512)
System.Collections.Tests.Perf_BitArray.BitArrayXor(Size: 512)
System.Collections.Tests.Perf_BitArray.BitArrayNot(Size: 512)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Regressions in Span.IndexerBench

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
CoveredIndex2 - Duration of single invocation	1.21 μs	1.38 μs	1.14	0.00	False
CoveredIndex3 - Duration of single invocation	1.72 μs	2.06 μs	1.20	0.00	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'Span.IndexerBench*'

Span.IndexerBench.CoveredIndex2(length: 1024)
Span.IndexerBench.CoveredIndex3(length: 1024)

Regressions in System.Globalization.Tests.StringSearch

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector
IndexOf_Word_NotFound(Options: (, None, False)) - Duration of single invocation	539.40 ns	623.39 ns	1.16	0.01	False
LastIndexOf_Word_NotFound(Options: (, None, False)) - Duration of single invocation	540.64 ns	624.02 ns	1.15	0.01	False
LastIndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False)) - Duration of single invocation	539.27 ns	623.78 ns	1.16	0.01	False
IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False)) - Duration of single invocation	768.06 ns	850.33 ns	1.11	0.01	False
IndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False)) - Duration of single invocation	539.11 ns	624.33 ns	1.16	0.01	False
IndexOf_Word_NotFound(Options: (en-US, None, False)) - Duration of single invocation	539.27 ns	624.84 ns	1.16	0.01	False
LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False)) - Duration of single invocation	798.90 ns	885.54 ns	1.11	0.03	False
IndexOf_Word_NotFound(Options: (, IgnoreCase, False)) - Duration of single invocation	767.89 ns	852.57 ns	1.11	0.01	False
LastIndexOf_Word_NotFound(Options: (en-US, None, False)) - Duration of single invocation	539.20 ns	624.38 ns	1.16	0.01	False
LastIndexOf_Word_NotFound(Options: (, IgnoreCase, False)) - Duration of single invocation	799.43 ns	878.06 ns	1.10	0.00	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*'

System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, None, False))
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, None, False))
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False))
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False))
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, None, False))
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, IgnoreCase, False))
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, None, False))
System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, IgnoreCase, False))

Regressions in PerfLabTests.LowLevelPerf

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
ForeachOverList100Elements - Duration of single invocation	8.45 ms	10.10 ms	1.19	0.01	False
InterfaceInterfaceMethodLongHierarchy - Duration of single invocation	303.39 μs	334.28 μs	1.10	0.05	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'PerfLabTests.LowLevelPerf*'

PerfLabTests.LowLevelPerf.ForeachOverList100Elements
PerfLabTests.LowLevelPerf.InterfaceInterfaceMethodLongHierarchy

Regressions in System.Collections.IterateForEach<Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
FrozenSet - Duration of single invocation	261.70 ns	349.31 ns	1.33	0.01	False
List - Duration of single invocation	434.29 ns	519.44 ns	1.20	0.01	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.IterateForEach<Int32>*'

System.Collections.IterateForEach<Int32>.FrozenSet(Size: 512)
System.Collections.IterateForEach<Int32>.List(Size: 512)

Regressions in System.Memory.Span<Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
IndexOfAnyFourValues - Duration of single invocation	863.82 ns	1.12 μs	1.29	0.01	False
IndexOfAnyFiveValues - Duration of single invocation	1.04 μs	1.29 μs	1.25	0.01	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Int32>*'

System.Memory.Span<Int32>.IndexOfAnyFourValues(Size: 512)
System.Memory.Span<Int32>.IndexOfAnyFiveValues(Size: 512)

Regressions in System.Tests.Perf_Char

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
Char_IsLower - Duration of single invocation	25.08 ns	32.27 ns	1.29	0.03	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Char*'

System.Tests.Perf_Char.Char_IsLower(input: "Good afternoon, Constable!")

Regressions in Struct.SpanWrapper

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
WrapperSum - Duration of single invocation	6.79 μs	10.05 μs	1.48	0.01	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'Struct.SpanWrapper*'

Struct.SpanWrapper.WrapperSum

Regressions in System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
Enumerate - Duration of single invocation	104.38 ns	119.96 ns	1.15	0.01	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>*'

System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>.Enumerate(Size: 100)

Regressions in Span.Sorting

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
BubbleSortSpan - Duration of single invocation	218.42 μs	242.60 μs	1.11	0.00	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'Span.Sorting*'

Span.Sorting.BubbleSortSpan(Size: 512)

AndyAyersMS · 2024-11-07T15:46:13Z

#103450

AndyAyersMS · 2024-11-07T15:48:33Z

@amanasifkhalid FYI

Improvements:

[Perf] Linux/arm64: 17 Improvements on 11/4/2024 8:24:32 PM perf-autofiling-issues#44492
[Perf] Windows/x64: 61 Improvements on 11/4/2024 6:18:38 PM perf-autofiling-issues#44725
[Perf] Linux/x64: 69 Improvements on 11/4/2024 6:18:38 PM perf-autofiling-issues#44684
[Perf] Windows/x64: 27 Improvements on 11/4/2024 6:18:38 PM perf-autofiling-issues#44688
[Perf] Linux/x64: 15 Improvements on 11/4/2024 6:18:38 PM perf-autofiling-issues#44719
[Perf] Windows/x64: 5 Improvements on 11/5/2024 12:19:32 AM perf-autofiling-issues#44726
[Perf] Windows/x64: 2 Improvements on 11/5/2024 2:05:14 AM perf-autofiling-issues#44690
[Perf] Linux/x64: 1 Improvement on 11/4/2024 4:04:38 PM perf-autofiling-issues#44718
[Perf] Windows/x64: 1 Improvement on 11/4/2024 6:49:20 AM perf-autofiling-issues#44723

Regressions:

amanasifkhalid · 2024-11-07T21:08:17Z

I took a look at a few of the regressions, and many of them seem to stem from mis-rotated loops. Because the cost model currently doesn't differentiate between conditional and unconditional jumps, 3-opt tends to make naive decisions about moving loop exits. For example, from Struct.SpanWrapper.WrapperSum:

*************** In fgSearchImprovedLayout()

Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      2 [???..???)-> BB03(1)                 (always)                     i LIR IBC internal
BB02 [0001]  1       BB06                 99.41 167 [00C..019)-> BB03(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB03 [0002]  2       BB02,BB01           100    168 [019..01A)-> BB05(0.48),BB04(0.52)   ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB04 [0010]  1       BB03                 52.00  87 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB06 [0012]  2       BB04,BB05           100    168 [019..022)-> BB02(0.994),BB07(0.00595)   ( cond )                     i LIR IBC bwd bwd-src
BB05 [0011]  1       BB03                 48     81 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB07 [0003]  1       BB06                  0.60   1 [022..024)                           (return)                     i LIR IBC
BB08 [0013]  0                             0        [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Running 3-opt for main method body
Creating fallthrough for BB06 -> BB02 (current partition score = 87.394958, new partition score = 167.067227)
Creating fallthrough for BB04 -> BB06 (current partition score = 87.394958, new partition score = 168.067227)

*************** Finishing PHASE Optimize layout
Trees after Optimize layout

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      2 [???..???)-> BB03(1)                 (always)                     i LIR IBC internal
BB04 [0010]  1       BB03                 52.00  87 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB06 [0012]  2       BB04,BB05           100    168 [019..022)-> BB02(0.994),BB07(0.00595)   ( cond )                     i LIR IBC bwd bwd-src
BB02 [0001]  1       BB06                 99.41 167 [00C..019)-> BB03(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB03 [0002]  2       BB02,BB01           100    168 [019..01A)-> BB05(0.48),BB04(0.52)   ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0011]  1       BB03                 48     81 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB07 [0003]  1       BB06                  0.60   1 [022..024)                           (return)                     i LIR IBC
BB08 [0013]  0                             0        [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

If we can tweak the cost model such that it decides creating fallthrough for BB06 -> BB02 is unprofitable, then 3-opt will instead create fallthrough for BB06 -> BB07, thus creating the ideal loop exit shape. As a consequence, we will push BB05 further out-of-line; in order to consider moving BB05 back into the loop body, we'd probably have to model forward vs backward jumps in the cost model to make such a move profitable.
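
To make the scoring concrete, here is a rough Python sketch (not JIT code) that sums the weights of fall-through edges for the two WrapperSum layouts in the dump above, plus a hypothetical layout in which BB06 -> BB07 falls through and BB05 is pushed out of line. It assumes an edge's weight is its source block's weight times the edge likelihood, and it scores the whole layout at once, so it will not reproduce the per-partition scores that 3-opt prints. The names `EDGE_WEIGHTS` and `layout_score` are made up for illustration.

```python
# Flow edges of Struct.SpanWrapper.WrapperSum, read off the block dump above.
# Assumption: edge weight = source block weight * edge likelihood.
EDGE_WEIGHTS = {
    ("BB01", "BB03"): 1 * 1.0,
    ("BB02", "BB03"): 99.41 * 1.0,
    ("BB03", "BB05"): 100 * 0.48,
    ("BB03", "BB04"): 100 * 0.52,
    ("BB04", "BB06"): 52.00 * 1.0,
    ("BB05", "BB06"): 48 * 1.0,
    ("BB06", "BB02"): 100 * 0.994,
    ("BB06", "BB07"): 100 * 0.00595,
}


def layout_score(order):
    """Sum the weights of edges whose target directly follows its source."""
    return sum(EDGE_WEIGHTS.get(pair, 0.0) for pair in zip(order, order[1:]))


initial = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB05", "BB07", "BB08"]
after_3opt = ["BB01", "BB04", "BB06", "BB02", "BB03", "BB05", "BB07", "BB08"]
# Hypothetical layout in which the loop exit BB06 -> BB07 falls through.
exit_fallthrough = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB07", "BB05", "BB08"]

scores = {
    "initial": layout_score(initial),
    "after 3-opt": layout_score(after_3opt),
    "exit fallthrough": layout_score(exit_fallthrough),
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Under this unpenalized score the 3-opt layout comes out highest (about 298.8, versus about 204.0 for the exit-fallthrough layout and about 203.4 for the initial one). That is consistent with 3-opt preferring the BB06 -> BB02 fallthrough as long as conditional and unconditional jumps are treated alike.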

PerfLabTests.LowLevelPerf.ForeachOverList100Elements has a similar shape:

*************** In fgSearchImprovedLayout()

Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight     IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010]  1                             1      116 [???..???)-> BB04(1)                 (always)                     i LIR IBC internal
BB03 [0003]  1       BB09                 98.56 11470 [015..021)-> BB04(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB04 [0004]  3       BB02,BB03,BB01      100    11637 [021..022)-> BB11(0.2),BB05(0.8)     ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0018]  1       BB04                 80     9309 [021..022)-> BB07(0.48),BB06(0.52)   ( cond )                     i LIR IBC bwd
BB06 [0019]  1       BB05                 41.60  4841 [021..022)-> BB09(1)                 (always)                     i LIR IBC idxlen bwd
BB07 [0020]  1       BB05                 58.40  6796 [021..022)-> BB09(1)                 (always)                     i LIR IBC bwd
BB09 [0021]  2       BB06,BB07           100    11637 [021..02A)-> BB03(0.986),BB10(0.0144)  ( cond )                     i LIR IBC bwd bwd-src
BB10 [0005]  1       BB09                  1.44   167 [02A..046)-> BB02(0.994),BB12(0.00595)   ( cond )                     i LIR IBC bwd
BB02 [0001]  1       BB10                  1.44   167 [00C..013)-> BB04(1)                 (always)                     i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009]  1       BB10                  0.01     1 [046..048)                           (return)                     i LIR IBC
BB11 [0023]  1       BB04                  0        0 [021..022)                           (throw )                     i LIR IBC rare hascall gcsafe bwd
BB13 [0028]  0                             0          [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Running 3-opt for main method body
Creating fallthrough for BB09 -> BB03 (current partition score = 6962.966716, new partition score = 11469.746967)
Creating fallthrough for BB07 -> BB09 (current partition score = 0.000000, new partition score = 6795.899489)

*************** Finishing PHASE Optimize layout
Trees after Optimize layout

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight     IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010]  1                             1      116 [???..???)-> BB04(1)                 (always)                     i LIR IBC internal
BB07 [0020]  1       BB05                 58.40  6796 [021..022)-> BB09(1)                 (always)                     i LIR IBC bwd
BB09 [0021]  2       BB06,BB07           100    11637 [021..02A)-> BB03(0.986),BB10(0.0144)  ( cond )                     i LIR IBC bwd bwd-src
BB03 [0003]  1       BB09                 98.56 11470 [015..021)-> BB04(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB04 [0004]  3       BB02,BB03,BB01      100    11637 [021..022)-> BB11(0.2),BB05(0.8)     ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0018]  1       BB04                 80     9309 [021..022)-> BB07(0.48),BB06(0.52)   ( cond )                     i LIR IBC bwd
BB06 [0019]  1       BB05                 41.60  4841 [021..022)-> BB09(1)                 (always)                     i LIR IBC idxlen bwd
BB10 [0005]  1       BB09                  1.44   167 [02A..046)-> BB02(0.994),BB12(0.00595)   ( cond )                     i LIR IBC bwd
BB02 [0001]  1       BB10                  1.44   167 [00C..013)-> BB04(1)                 (always)                     i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009]  1       BB10                  0.01     1 [046..048)                           (return)                     i LIR IBC
BB11 [0023]  1       BB04                  0        0 [021..022)                           (throw )                     i LIR IBC rare hascall gcsafe bwd
BB13 [0028]  0                             0          [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

I suspect that penalizing conditional jumps by some constant, so that creating fallthrough for BB09 -> BB03 becomes unprofitable, would fix this.

Since 3-opt currently optimizes for maximal layout score (only because it's cheaper to sum the weights of edges that now fall through, rather than sum the weights of edges that now don't fall through), I suspect we want to begin by penalizing scores for conditional jumps by some multiplier k, where 0 < k < 1. @AndyAyersMS do you have a recommended starting point for k, or is this a matter of trial and error? I suppose if we want to try modeling something as granular as described in Young et al.'s Near-optimal Intraprocedural Branch Alignment, we're better off refactoring 3-opt to minimize cost instead of maximizing score.
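
The relationship between the two formulations can be written out. As a sketch, assume each flow edge e has a weight w(e), and write F(π) for the set of edges that fall through in layout π (this notation is introduced here for illustration and does not come from the JIT sources):

```latex
\mathrm{score}(\pi) = \sum_{e \in F(\pi)} w(e), \qquad
\mathrm{cost}(\pi) = \sum_{e \notin F(\pi)} w(e), \qquad
\mathrm{score}(\pi) + \mathrm{cost}(\pi) = \sum_{e} w(e)
```

The total on the right does not depend on π, so with uniform edge weights maximizing the score and minimizing the cost pick the same layout. The two views can diverge once different kinds of jumps are weighted differently.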

AndyAyersMS · 2024-11-07T22:13:36Z

> penalizing scores for conditional jumps by some multiplier k

I would think the value of k would be dependent on the likelihood of branching; something like k = 1 - (likelihood of branching). But this isn't quite right because a highly predictable branch should be somewhat cheaper than a less predictable branch (and we can use likelihoods close to 1 as indicators of predictability).

But I agree it is confusing to think in benefit terms, as I really think of this as a cost minimization problem....

LoopedBard3 · 2024-11-12T17:21:19Z

GitHub missed linking the original PR: #103450

performanceautofiler bot added the arch-arm64, os-linux, runtime-coreclr, and untriaged labels Nov 7, 2024

performanceautofiler bot mentioned this issue Nov 7, 2024

[SENTINEL] Autofile run complete at 11/7/2024 8:04:30 AM. 10 issues filed. dotnet/perf-autofiling-issues#44493

Closed

AndyAyersMS transferred this issue from dotnet/perf-autofiling-issues Nov 7, 2024

dotnet-issue-labeler bot added the needs-area-label label Nov 7, 2024

AndyAyersMS changed the title ~~[Perf] Linux/arm64: 26 Regressions on 11/4/2024 8:24:32 PM~~ Regressions from 3-opt Nov 7, 2024

AndyAyersMS added the area-CodeGen-coreclr label and removed the untriaged label Nov 7, 2024

amanasifkhalid self-assigned this Nov 7, 2024

vcsjones removed the needs-area-label label Nov 10, 2024

LoopedBard3 mentioned this issue Nov 12, 2024

JIT: Add 3-opt implementation for improving upon RPO-based layout #103450

Merged

JulieLeeMSFT added this to the 10.0.0 milestone Nov 21, 2024
