Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DenseLayout trial to SabreLayout #12453

Merged
merged 1 commit into from
May 22, 2024
Merged

Conversation

mtreinish
Copy link
Member

Summary

Building on the work done in #10829, #10721, and #12104 this commit adds
a new trial to all runs of SabreLayout that runs the dense layout
pass. In general the sabre layout algorithm starts from a random layout
and then runs a routing algorithm to permute that layout virtually where
swaps would be inserted to select a layout that would result in fewer
swaps. As was discussed in #10721 and #10829 that random starting point
is often not ideal especially for larger targets where the distance
between qubits can be quite far. Especially when the circuit qubit count
is low relative to the target's qubit count this can result it poor
layouts as the distance between the qubits is too large. In qiskit we
have an existing pass, DenseLayout, which tries to find the most
densely connected n qubit subgraph of a connectivity graph. This
algorithm necessarily will select a starting layout where the qubits are
near each other and for those large backends where the random starting
layout doesn't work well this can improve the output quality.

As the overhead of DenseLayout is quite low and the core algorithm is
written in rust already this commit adds a default trial that uses
DenseLayout as a starting point on top of the random trials (and any
input starting points). For example if the user specifies to run
SabreLayout with 20 layout trials this will run 20 random trials and
one trial with DenseLayout as the starting point. This is all done
directly in the sabre layout rust code for efficiency. The other
difference between the standalone DenseLayout pass is that in the
standalone pass a sparse matrix is built and a reverse Cuthill-McKee
permutation is run on the densest subgraph qubits to pick the final
layout. This permutation is skipped because in the case of Sabre's
use of dense layout we're relying on the sabre algorithm to perform
the permutation.

Details and comments

Depends on: #12104

This PR is based on top of #12104 and will be manually rebased after that merges. You can look at: 06e208f for the details of just this PR.

@mtreinish mtreinish added on hold Can not fix yet performance Changelog: New Feature Include in the "Added" section of the changelog mod: transpiler Issues and PRs related to Transpiler labels May 22, 2024
@mtreinish mtreinish added this to the 1.2.0 milestone May 22, 2024
@mtreinish mtreinish requested a review from a team as a code owner May 22, 2024 13:30
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

@mtreinish
Copy link
Member Author

mtreinish commented May 22, 2024

I ran some quick asv benchmarks with this:

Benchmarks that have improved:

| Change   |   Before [8e3218bc] <dense-layout-rust-interface~1^2> |   After [2bee114a] <dense-sabre-ftw> |   Ratio | Benchmark (Parameter)                                       |
|----------|-------------------------------------------------------|--------------------------------------|---------|-------------------------------------------------------------|
| -        |                                                  2582 |                                 1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')  |
| -        |                                                  2582 |                                 1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')  |
| -        |                                                  2582 |                                 1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr') |

Benchmarks that have stayed the same:

| Change   | Before [8e3218bc] <dense-layout-rust-interface~1^2>   | After [2bee114a] <dense-sabre-ftw>   |   Ratio | Benchmark (Parameter)                                                                                  |
|----------|-------------------------------------------------------|--------------------------------------|---------|--------------------------------------------------------------------------------------------------------|
|          | 82.9±0.6ms                                            | 86.1±1ms                             |    1.04 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                               |
|          | 83.1±0.3ms                                            | 86.0±0.9ms                           |    1.03 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                               |
|          | 1.94±0.01s                                            | 1.98±0.02s                           |    1.02 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)                   |
|          | 71.3±0.3ms                                            | 72.6±1ms                             |    1.02 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)                          |
|          | 160±0.8ms                                             | 163±1ms                              |    1.02 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)        |
|          | 174±2ms                                               | 176±0.2ms                            |    1.02 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)        |
|          | 80.9±0.4ms                                            | 81.5±0.6ms                           |    1.01 | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)                                |
|          | 290±0.7ms                                             | 292±1ms                              |    1.01 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)                               |
|          | 3.31±0s                                               | 3.34±0.01s                           |    1.01 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                                                   |
|          | 2.77±0.05s                                            | 2.79±0.02s                           |    1.01 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                                                  |
|          | 936±6ms                                               | 937±3ms                              |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)                   |
|          | 1.71±0.01s                                            | 1.72±0.01s                           |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)                   |
|          | 122±0.5ms                                             | 122±1ms                              |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                          |
|          | 157±1ms                                               | 157±2ms                              |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)                          |
|          | 79.9±0.1ms                                            | 79.9±0.2ms                           |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)        |
|          | 2565                                                  | 2565                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)            |
|          | 1391                                                  | 1391                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)            |
|          | 1258                                                  | 1258                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)            |
|          | 1314                                                  | 1314                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)            |
|          | 2705                                                  | 2705                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)                   |
|          | 2005                                                  | 2005                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)                   |
|          | 7                                                     | 7                                    |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)                   |
|          | 7                                                     | 7                                    |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)                   |
|          | 2705                                                  | 2705                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0) |
|          | 2005                                                  | 2005                                 |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1) |
|          | 7                                                     | 7                                    |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2) |
|          | 7                                                     | 7                                    |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3) |
|          | 323                                                   | 323                                  |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)                        |
|          | 304                                                   | 304                                  |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)                        |
|          | 307                                                   | 307                                  |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)                        |
|          | 268                                                   | 268                                  |    1    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)                        |
|          | 9.38±0.1ms                                            | 9.41±0.02ms                          |    1    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')                                       |
|          | 33.3±0.2ms                                            | 33.2±0.2ms                           |    1    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')                           |
|          | 1.24±0.01s                                            | 1.25±0.01s                           |    1    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                                                   |
|          | 25.7±0.2s                                             | 25.6±0.1s                            |    1    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                                    |
|          | 1.43±0.01s                                            | 1.43±0.01s                           |    1    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')                                      |
|          | 2.17±0.01s                                            | 2.17±0.02s                           |    1    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')                                      |
|          | 1483                                                  | 1483                                 |    1    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                                            |
|          | 1488                                                  | 1488                                 |    1    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                                            |
|          | 1488                                                  | 1488                                 |    1    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                                           |
|          | 435                                                   | 435                                  |    1    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')                               |
|          | 435                                                   | 435                                  |    1    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')                               |
|          | 435                                                   | 435                                  |    1    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')                              |
|          | 106±0.9ms                                             | 105±1ms                              |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)                                |
|          | 163±1ms                                               | 161±0.4ms                            |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)                          |
|          | 317±3ms                                               | 314±1ms                              |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                               |
|          | 9.49±0.09ms                                           | 9.38±0.09ms                          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')                                        |
|          | 9.47±0.2ms                                            | 9.41±0.03ms                          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                        |
|          | 104±1ms                                               | 102±0.5ms                            |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                                         |
|          | 104±0.4ms                                             | 103±2ms                              |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')                                        |
|          | 33.5±0.3ms                                            | 33.3±0.2ms                           |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                           |
|          | 26.1±0s                                               | 25.8±0s                              |    0.99 | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                                   |
|          | 1.99±0s                                               | 1.98±0.01s                           |    0.99 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')                                     |
|          | 360±3ms                                               | 355±2ms                              |    0.98 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)                   |
|          | 171±3ms                                               | 168±3ms                              |    0.98 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)        |
|          | 104±0.9ms                                             | 102±0.3ms                            |    0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                                         |
|          | 33.6±0.2ms                                            | 33.0±0.3ms                           |    0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')                          |
|          | 23.0±0.08s                                            | 22.5±0.02s                           |    0.98 | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                                                    |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

In this case it shows that at various scale the runtime overhead is neglible but in some cases the quality of output is greatly improved. The extra trial only takes effect in cases where it results in a better outcome.

@mtreinish mtreinish added the Rust This PR or issue is related to Rust code in the repository label May 22, 2024
@coveralls
Copy link

coveralls commented May 22, 2024

Pull Request Test Coverage Report for Build 9196491844

Details

  • 26 of 28 (92.86%) changed or added relevant lines in 1 file are covered.
  • 21 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.02%) to 89.596%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/accelerate/src/sabre/layout.rs 26 28 92.86%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 4 92.11%
crates/accelerate/src/sabre/layout.rs 17 90.21%
Totals Coverage Status
Change from base Build 9195596815: -0.02%
Covered Lines: 62334
Relevant Lines: 69572

💛 - Coveralls

Copy link
Contributor

@henryzou50 henryzou50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It's great to see the positive impact of integrating the DenseLayout trial into SabreLayout. While we didn't see improvements in the other benchmarks for depth, the significant reduction in depth for QFT benchmarks is particularly impressive and promising. The negligible runtime overhead further reflects the efficiency of this approach. I'm excited to see how we can build on this improvement and explore other methods to further refine our layout selection strategy.

Building on the work done in Qiskit#10829, Qiskit#10721, and Qiskit#12104 this commit adds
a new trial to all runs of `SabreLayout` that runs the dense layout
pass. In general the sabre layout algorithm starts from a random layout
and then runs a routing algorithm to permute that layout virtually where
swaps would be inserted to select a layout that would result in fewer
swaps. As was discussed in Qiskit#10721 and Qiskit#10829 that random starting point
is often not ideal especially for larger targets where the distance
between qubits can be quite far. Especially when the circuit qubit count
is low relative to the target's qubit count this can result it poor
layouts as the distance between the qubits is too large. In qiskit we
have an existing pass, `DenseLayout`, which tries to find the most
densely connected n qubit subgraph of a connectivity graph. This
algorithm necessarily will select a starting layout where the qubits are
near each other and for those large backends where the random starting
layout doesn't work well this can improve the output quality.

As the overhead of `DenseLayout` is quite low and the core algorithm is
written in rust already this commit adds a default trial that uses
DenseLayout as a starting point on top of the random trials (and any
input starting points). For example if the user specifies to run
SabreLayout with 20 layout trials this will run 20 random trials and
one trial with `DenseLayout` as the starting point. This is all done
directly in the sabre layout rust code for efficiency. The other
difference between the standalone `DenseLayout` pass is that in the
standalone pass a sparse matrix is built and a reverse Cuthill-McKee
permutation is run on the densest subgraph qubits to pick the final
layout. This permutation is skipped because in the case of Sabre's
use of dense layout we're relying on the sabre algorithm to perform
the permutation.

Depends on: Qiskit#12104
@henryzou50 henryzou50 enabled auto-merge May 22, 2024 18:25
@henryzou50 henryzou50 added this pull request to the merge queue May 22, 2024
Merged via the queue into Qiskit:main with commit 531f91c May 22, 2024
15 checks passed
@mtreinish mtreinish deleted the dense-sabre-ftw branch May 24, 2024 14:02
ElePT pushed a commit to ElePT/qiskit that referenced this pull request May 31, 2024
Building on the work done in Qiskit#10829, Qiskit#10721, and Qiskit#12104 this commit adds
a new trial to all runs of `SabreLayout` that runs the dense layout
pass. In general the sabre layout algorithm starts from a random layout
and then runs a routing algorithm to permute that layout virtually where
swaps would be inserted to select a layout that would result in fewer
swaps. As was discussed in Qiskit#10721 and Qiskit#10829 that random starting point
is often not ideal especially for larger targets where the distance
between qubits can be quite far. Especially when the circuit qubit count
is low relative to the target's qubit count this can result it poor
layouts as the distance between the qubits is too large. In qiskit we
have an existing pass, `DenseLayout`, which tries to find the most
densely connected n qubit subgraph of a connectivity graph. This
algorithm necessarily will select a starting layout where the qubits are
near each other and for those large backends where the random starting
layout doesn't work well this can improve the output quality.

As the overhead of `DenseLayout` is quite low and the core algorithm is
written in rust already this commit adds a default trial that uses
DenseLayout as a starting point on top of the random trials (and any
input starting points). For example if the user specifies to run
SabreLayout with 20 layout trials this will run 20 random trials and
one trial with `DenseLayout` as the starting point. This is all done
directly in the sabre layout rust code for efficiency. The other
difference between the standalone `DenseLayout` pass is that in the
standalone pass a sparse matrix is built and a reverse Cuthill-McKee
permutation is run on the densest subgraph qubits to pick the final
layout. This permutation is skipped because in the case of Sabre's
use of dense layout we're relying on the sabre algorithm to perform
the permutation.

Depends on: Qiskit#12104
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Oct 23, 2024
Right now sabre layout uses n random trials (as specified by the user or
defaulting to num_cpus) and since Qiskit#12453 one additional trial taking the
qubits of the densest subgraph as a starting point. There are also a
couple of other trivial examples to try which may or may not produce
better results depending on the circuit, a trivial layout and a reverse
trivial layout. In the case of a hardware efficient circuit the trivial
layout will map exactly and would always be picked. When running in a
preset pass manager sabre should never be called in this scenario
because VF2Layout will find the mapping, but the incremental cost of
adding the trial is minimal. Similarly the cost of a reversed trivial
layout (where virtual qubit 0 -> n, 1 -> n - 1, etc.) is trivial and
may in some scenarios produce a better results.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Oct 23, 2024
Right now sabre layout uses n random trials (as specified by the user or
defaulting to num_cpus) and since Qiskit#12453 one additional trial taking the
qubits of the densest subgraph as a starting point. There are also a
couple of other trivial examples to try which may or may not produce
better results depending on the circuit, a trivial layout and a reverse
trivial layout. In the case of a hardware efficient circuit the trivial
layout will map exactly and would always be picked. When running in a
preset pass manager sabre should never be called in this scenario
because VF2Layout will find the mapping, but the incremental cost of
adding the trial is minimal. Similarly the cost of a reversed trivial
layout (where virtual qubit 0 -> n, 1 -> n - 1, etc.) is trivial and
may in some scenarios produce a better results.
github-merge-queue bot pushed a commit that referenced this pull request Nov 6, 2024
* Add more default trials to sabre layout

Right now sabre layout uses n random trials (as specified by the user or
defaulting to num_cpus) and since #12453 one additional trial taking the
qubits of the densest subgraph as a starting point. There are also a
couple of other trivial examples to try which may or may not produce
better results depending on the circuit, a trivial layout and a reverse
trivial layout. In the case of a hardware efficient circuit the trivial
layout will map exactly and would always be picked. When running in a
preset pass manager sabre should never be called in this scenario
because VF2Layout will find the mapping, but the incremental cost of
adding the trial is minimal. Similarly the cost of a reversed trivial
layout (where virtual qubit 0 -> n, 1 -> n - 1, etc.) is trivial and
may in some scenarios produce a better results.

* Update layouts for failing tests

* Add ring layouts for common connectivity graphs

* Add release notes and docs

* Apply suggestions from code review

Co-authored-by: Elena Peña Tapia <[email protected]>

* Fix lint

---------

Co-authored-by: Elena Peña Tapia <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog mod: transpiler Issues and PRs related to Transpiler on hold Can not fix yet performance Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants