Increase heuristic effort for optimization level 2 #12149

mtreinish · 2024-04-05T21:20:38Z

Summary

This commit tweaks the heuristic effort in optimization level 2 to be more of a middle ground between levels 1 and 3, with a better balance between output quality and runtime. This makes it a better default for the pass manager we use when one isn't specified. The tradeoff here is that the VF2Layout and VF2PostLayout search space is reduced to be the same as level 1. There are diminishing returns on the VF2 layout search, especially in cases where there are a large number of qubit permutations for the mapping found. The number of sabre trials is brought up to the same level as optimization level 3, as this can have a significant impact on output quality and the extra runtime cost is minimal. The larger change is that the optimization passes from level 3 are now used in level 2. This ends up mainly being 2q peephole optimization. With the performance improvements from #12010 and #11946 and all the follow-on PRs, this is now fast enough to rely on in optimization level 2.

Details and comments

Related to: #7112


qiskit-bot · 2024-04-05T21:20:43Z

One or more of the following people are requested to review this:

@Qiskit/terra-core

coveralls · 2024-04-05T22:19:19Z

Pull Request Test Coverage Report for Build 8693815185

Details

6 of 6 (100.0%) changed or added relevant lines in 2 files are covered.
7 unchanged lines in 2 files lost coverage.
Overall coverage decreased (-0.005%) to 89.354%

Files with Coverage Reduction	New Missed Lines	%
crates/qasm2/src/expr.rs	1	94.03%
crates/qasm2/src/lex.rs	6	92.11%

Totals
Change from base Build 8689749566:	-0.005%
Covered Lines:	60163
Relevant Lines:	67331

💛 - Coveralls

For the initial VF2Layout call, this commit expands the vf2 call limit back to the previous level instead of reducing it to the same as level 1. The idea behind this change is that spending up to 10s to find a perfect layout is a worthwhile tradeoff, as that will greatly improve the result from execution. But scoring multiple layouts to find the lowest error rate subgraph has diminishing returns in most cases, as there typically aren't thousands of unique subgraphs, and often when we hit the scoring limit it's just permuting the qubits inside a subgraph, which doesn't provide much value. For VF2PostLayout the lower call limits from level 1 are still used. This is because the search for isomorphic subgraphs is typically much shorter with the vf2++ node ordering heuristic, so we don't need to spend as much time looking for alternative subgraphs.

Due to potential instability in the 2q peephole optimization, we were using the `MinimumPoint` pass to provide backtracking when we reach a local minimum. However, this pass adds a significant amount of overhead because it deep copies the circuit at every iteration of the optimization loop that improves the output quality. This commit tweaks the O2 pass manager construction to run 2q peephole only once, and then updates the optimization loop to be what the previous O2 optimization loop was.
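The copy overhead described above can be illustrated with a toy model. This is a pure-Python sketch under stated assumptions, not Qiskit's `MinimumPoint` implementation: the "circuit" is a plain list of gate names, `cancel_one_pair` is a hypothetical stand-in for one optimization iteration, and `run_with_minimum_point` is a made-up helper that snapshots the best state seen so far.

```python
import copy


def cancel_one_pair(gates):
    """Toy optimization step: remove the first adjacent pair of equal gates."""
    for i in range(len(gates) - 1):
        if gates[i] == gates[i + 1]:
            return gates[:i] + gates[i + 2:]
    return gates


def run_with_minimum_point(circuit, optimize_once, size, max_iterations=10):
    """Loop an optimization step, snapshotting the best circuit seen so far.

    Every improving iteration takes a deep copy, which is the overhead
    described above.
    """
    best = copy.deepcopy(circuit)
    deep_copies = 1
    for _ in range(max_iterations):
        circuit = optimize_once(circuit)
        if size(circuit) < size(best):
            best = copy.deepcopy(circuit)
            deep_copies += 1
        else:
            # No improvement: stop and fall back to the best snapshot.
            break
    return best, deep_copies


toy_circuit = ["h", "h", "x", "x", "cx", "cx", "t"]
best, deep_copies = run_with_minimum_point(toy_circuit, cancel_one_pair, len)
print(best, deep_copies)
```

In this toy run each of the three improving iterations costs one extra deep copy on top of the initial snapshot. For a large circuit that cost is paid on the whole circuit each time, which is why running the unstable pass once outside the loop avoids it.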

mtreinish · 2024-04-15T19:17:18Z

I ran the "utility scale" asv benchmarks with this PR and got the following results:

Benchmarks that have improved:

       before           after         ratio
     [e0be97c1]       [53dec8bd]
     <main>       <level-2-v2>
-      1.37±0.03s       1.22±0.01s     0.89  utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')
-            1896             1618     0.85  utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')
-            1908             1607     0.84  utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')
-            1936             1622     0.84  utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')
-             972              444     0.46  utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')
-             972              444     0.46  utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')
-             972              444     0.46  utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')

Benchmarks that have stayed the same:

       before           after         ratio
     [e0be97c1]       [53dec8bd]
     <main>       <level-2-v2>
       21.1±0.06s       25.3±0.02s    ~1.20  utility_scale.UtilityScaleBenchmarks.time_qft('ecr')
          2.18±0s       2.32±0.01s     1.06  utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')
       24.3±0.06s       25.7±0.02s     1.06  utility_scale.UtilityScaleBenchmarks.time_qft('cz')
       29.8±0.2ms       30.7±0.2ms     1.03  utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')
       18.6±0.06s       19.1±0.06s     1.03  utility_scale.UtilityScaleBenchmarks.time_qft('cx')
       92.3±0.8ms       94.5±0.4ms     1.02  utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')
      29.7±0.08ms       30.2±0.1ms     1.02  utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')
         92.9±2ms         94.2±1ms     1.01  utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')
       92.7±0.7ms       94.0±0.3ms     1.01  utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')
      30.0±0.04ms       30.2±0.4ms     1.01  utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')
      8.56±0.04ms      8.60±0.04ms     1.00  utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')
      8.64±0.08ms      8.65±0.02ms     1.00  utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')
       8.59±0.1ms      8.60±0.04ms     1.00  utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')
             2598             2582     0.99  utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')
             2598             2582     0.99  utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')
             2598             2496     0.96  utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')
       3.11±0.01s       2.69±0.03s    ~0.87  utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')

Benchmarks that have got worse:

       before           after         ratio
     [e0be97c1]       [53dec8bd]
     <main>       <level-2-v2>
+      2.78±0.04s          5.32±0s     1.92  utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')
+         852±8ms       1.08±0.01s     1.27  utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')
+      5.12±0.01s          6.45±0s     1.26  utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

It's about what I expected: improvements in quality, since we're ramping up heuristic effort in a bunch of places, but at the cost of runtime. The ecr qaoa runtime regression is a bit more severe than I expected, but the ~25% slowdown on the others is about what I was expecting.
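For reference, the depth ratios in the "improved" table can be reproduced directly from the before/after columns. This is a small sketch using only the depth numbers reported above; the dictionary layout is just for illustration.

```python
# (before, after) circuit depths copied from the asv output above.
depths = {
    "qaoa cz": (1896, 1618),
    "qaoa cx": (1908, 1607),
    "qaoa ecr": (1936, 1622),
    "square_heisenberg (all bases)": (972, 444),
}

ratios = {}
for name, (before, after) in depths.items():
    ratios[name] = after / before
    reduction = 100 * (1 - ratios[name])
    print(f"{name}: ratio {ratios[name]:.2f}, depth reduced by {reduction:.0f}%")
```

So the square Heisenberg depth drops by a bit more than half, while the QAOA depth drops by roughly 15%.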

The only other change I really want to make builds off of #12171: I'd like to change the default sabre heuristic we use in level 2 to lookahead instead of decay. Not to save any runtime; I just expect it to produce better output.

I'm also curious to try dropping the sabre trial counts back down to 10 and see where the tradeoffs are there. I'll rerun the benchmarks and see what they look like with that change. The only place the runtime change can realistically come from is in sabre or by running 2q peephole. Update: dropping the trial count didn't make any difference; if anything it was slower (because more swaps were inserted).

levbishop

I think there'll be more tweaks to O2 as we speed up and streamline other passes, add sabre heuristics, etc, but this is a solid starting point

mtreinish added the Changelog: API Change Include in the "Changed" section of the changelog label Apr 5, 2024

mtreinish added this to the 1.1.0 milestone Apr 5, 2024

mtreinish requested a review from a team as a code owner April 5, 2024 21:20

Add test workaround from level 3 to level 2 too

4007618

mtreinish added 5 commits April 6, 2024 20:09

Merge remote-tracking branch 'origin/main' into level-2-v2

16c6cbc

Merge remote-tracking branch 'origin/main' into level-2-v2

67b2760

Merge branch 'main' into level-2-v2

53dec8b

jakelishman assigned levbishop and ajavadia Apr 18, 2024

levbishop approved these changes Apr 18, 2024

levbishop added this pull request to the merge queue Apr 23, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 23, 2024

levbishop added this pull request to the merge queue Apr 23, 2024

Merged via the queue into Qiskit:main with commit 40ac274 Apr 23, 2024
12 checks passed

mtreinish deleted the level-2-v2 branch April 23, 2024 10:00

mtreinish mentioned this pull request Apr 23, 2024

[on hold] Switch convert_2q_block_matrix.rs to use matmul from faer and an explicit version of kron #12193

Draft

This was referenced May 1, 2024

Add star to linear pre-routing pass #11387

Merged

Add ElidePermutations pass to optimization level 3 #12111

Merged
