This repository has been archived by the owner on Aug 19, 2023. It is now read-only.

Add benchmarks for Sabre on large QFT and QV circuits #1622

Merged
merged 4 commits into Qiskit:master from jakelishman:more-sabre-benchmarks on Oct 28, 2022

Conversation

jakelishman

Summary

Sabre is capable of handling these large benchmarks now, and it's of interest for us to track our performance on large systems. We don't anticipate running on them yet, but we will want to know in the future when further changes to routing and memory usage improve these benchmarks.

Details and comments

These could be in either mapping_passes.py or the files I've put them in. It's not super clear where, but since I used the benchmark-internal constructors (to dodge issues with the Terra functions potentially changing in the future), it made some sense to put them in the structure-specific files.


@mtreinish mtreinish left a comment


The code LGTM, but before I approve I'm going to spin up a run locally. Do you have a rough runtime estimate for these new benchmarks?

@jakelishman

On my machine with Terra `main`, the largest QFT took 55s and the longest QV took ~1m40s, I think, but I accidentally cleared the database with an overzealous `git clean -fdx` while I was having trouble running `tox` in a dirty project directory.

@jakelishman

jakelishman commented Oct 27, 2022

Ah, I still have the output in the scrollback of my terminal (before I fixed the QFT generation):

jake@ninetales$ asv dev --python 3.10 -b 'LargeQ.*MappingBench.*time'
· Fetching recent changes
· Creating environments
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[  0.00%] · For qiskit-terra commit aab18fcd <main>:
[  0.00%] ·· Benchmarking virtualenv-py3.10
[ 25.00%] ··· Running (qft.LargeQFTMappingBench.time_sabre_swap--).
[ 50.00%] ··· Running (quantum_volume.LargeQuantumVolumeMappingBenchmark.time_sabre_swap--).
[ 75.00%] ··· qft.LargeQFTMappingBench.time_sabre_swap                                                                                                                      2/6 failed
[ 75.00%] ··· ========== ============ ============
              --                 heuristic
              ---------- -------------------------
               n_qubits   lookahead      decay
              ========== ============ ============
                 115       404±40ms     380±9ms
                 409      6.59±0.08s   6.08±0.01s
                 1081       failed       failed
              ========== ============ ============

[100.00%] ··· quantum_volume.LargeQuantumVolumeMappingBenchmark.time_sabre_swap                                                                                                     ok
[100.00%] ··· ========== ================ ============ ================= =============
              --                               depth / heuristic
              ---------- -------------------------------------------------------------
               n_qubits   10 / lookahead   10 / decay   100 / lookahead   100 / decay
              ========== ================ ============ ================= =============
                 115        104±0.9ms      98.9±0.7ms       969±10ms        973±7ms
                 409        21.3±0.4s      7.09±0.03s         n/a             n/a
                 1081       59.1±0.2s       1.53±0m           n/a             n/a
              ========== ================ ============ ================= =============

@mtreinish

I did a run locally; this adds ~30 minutes to the local runtime for a commit (31:20.39 for my local `asv run` call, excluding venv creation and build time). That's on the long side, but considering we just reduced a lot of overhead from the assemble benchmarks, and the importance of testing Sabre at scale now, I think that's OK. The results from my local run were:

[ 58.33%] ··· qft.LargeQFTMappingBench.time_sabre_swap                        ok
[ 58.33%] ··· ========== ============ ============
              --                 heuristic        
              ---------- -------------------------
               n_qubits   lookahead      decay    
              ========== ============ ============
                 115       307±1ms     296±0.9ms  
                 409       4.93±0s     4.73±0.01s 
                 1081     43.1±0.05s   40.6±0.05s 
              ========== ============ ============

[ 66.67%] ··· qft.LargeQFTMappingBench.track_depth_sabre_swap                 ok
[ 66.67%] ··· ========== =========== ========
              --              heuristic      
              ---------- --------------------
               n_qubits   lookahead   decay  
              ========== =========== ========
                 115         3834      2980  
                 409        26641     25214  
                 1081       150404    120263 
              ========== =========== ========

[ 75.00%] ··· qft.LargeQFTMappingBench.track_size_sabre_swap                  ok
[ 75.00%] ··· ========== =========== =========
              --               heuristic      
              ---------- ---------------------
               n_qubits   lookahead    decay  
              ========== =========== =========
                 115        19670      18634  
                 409        269456     263765 
                 1081      2043833    1959761 
              ========== =========== =========

[ 83.33%] ··· ...uantumVolumeMappingBenchmark.time_sabre_swap                 ok
[ 83.33%] ··· ========== ======= ============ ============
              --                         heuristic        
              ------------------ -------------------------
               n_qubits   depth   lookahead      decay    
              ========== ======= ============ ============
                 115        10    83.5±0.9ms   80.7±0.8ms 
                 115       100     816±20ms     799±3ms   
                 409        10    18.1±0.02s   5.95±0.02s 
                 409       100       n/a          n/a     
                 1081       10    49.1±0.2s     1.28±0m   
                 1081      100       n/a          n/a     
              ========== ======= ============ ============

[ 91.67%] ··· ...olumeMappingBenchmark.track_depth_sabre_swap                 ok
[ 91.67%] ··· ========== ======= =========== =======
              --                      heuristic     
              ------------------ -------------------
               n_qubits   depth   lookahead   decay 
              ========== ======= =========== =======
                 115        10       563       506  
                 115       100       5563      5274 
                 409        10       4371      3676 
                 409       100       n/a       n/a  
                 1081       10      13996     12841 
                 1081      100       n/a       n/a  
              ========== ======= =========== =======

[100.00%] ··· ...VolumeMappingBenchmark.track_size_sabre_swap                 ok
[100.00%] ··· ========== ======= =========== ========
              --                      heuristic      
              ------------------ --------------------
               n_qubits   depth   lookahead   decay  
              ========== ======= =========== ========
                 115        10       4511      4286  
                 115       100      44748     44298  
                 409        10      72015     60308  
                 409       100       n/a       n/a   
                 1081       10      244513    296402 
                 1081      100       n/a       n/a   
              ========== ======= =========== ========

@jakelishman

My hope is that Qiskit/qiskit#9012 will (in fairly short order) knock off a good amount of that time again, and make the cost something more like 10-15 minutes on a run.

mtreinish previously approved these changes Oct 27, 2022

@mtreinish mtreinish left a comment


The code here LGTM; I'm fine with merging this as is. It would be really great if asv let us measure multiple values in a benchmark, and also if we could do a combined timing/tracking benchmark. I did leave an inline suggestion for reducing the runtime a bit, but please feel free to ignore it and just tag this automerge if you prefer.

Comment on lines 66 to 90
class LargeQFTMappingBench:
    timeout = 600.0  # seconds

    heavy_hex_size = {115: 7, 409: 13, 1081: 21}
    params = ([115, 409, 1081], ["lookahead", "decay"])
    param_names = ["n_qubits", "heuristic"]

    def setup(self, n_qubits, _heuristic):
        qr = QuantumRegister(n_qubits, name="q")
        self.dag = circuit_to_dag(build_model_circuit(qr))
        self.coupling = CouplingMap.from_heavy_hex(
            self.heavy_hex_size[n_qubits]
        )

    def time_sabre_swap(self, _n_qubits, heuristic):
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        pass_.run(self.dag)

    def track_depth_sabre_swap(self, _n_qubits, heuristic):
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        return pass_.run(self.dag).depth()

    def track_size_sabre_swap(self, _n_qubits, heuristic):
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        return pass_.run(self.dag).size()
@mtreinish

We could limit the runtime here quite a bit, I think, if we split this up into two classes: one for timed benchmarks and one for tracking benchmarks. The tracking benchmarks could call Sabre in setup and then just run depth() and size() on the output. Something like:

class LargeQFTMappingBenchTracking:
    timeout = 600.0  # seconds

    heavy_hex_size = {115: 7, 409: 13, 1081: 21}
    params = ([115, 409, 1081], ["lookahead", "decay"])
    param_names = ["n_qubits", "heuristic"]

    def setup(self, n_qubits, heuristic):
        qr = QuantumRegister(n_qubits, name="q")
        self.dag = circuit_to_dag(build_model_circuit(qr))
        self.coupling = CouplingMap.from_heavy_hex(
            self.heavy_hex_size[n_qubits]
        )
        pass_ = SabreSwap(self.coupling, heuristic, seed=2022_10_27, trials=1)
        self.out_dag = pass_.run(self.dag)

    def track_depth_sabre_swap(self, _n_qubits, _heuristic):
        return self.out_dag.depth()

    def track_size_sabre_swap(self, _n_qubits, _heuristic):
        return self.out_dag.size()

That way we basically eliminate a bunch of duplicate Sabre runs, but it does seem a bit hacky.

@jakelishman

That seems fine to me as a workaround for a deficiency in asv. I'll push a commit to do it.

@jakelishman

So on further thought, this doesn't actually help in the way we wanted. The setup method is called before every parametrised benchmark, so this doesn't reduce the number of runs. What we can do instead is to define a setup_cache function that creates all the DAGs and calculates the trackers we care about. That "state" object then gets fed into each of the parametrised benchmarks, and we just extract the value we care about to return immediately.

I've done something to this effect in e7c3df9. The result is that asv sits in the "set up" state before the benchmark for quite a long time, but then the benchmarks themselves return instantly (so it's clear that the cache is correctly being reused).
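
For reference, the shape of that setup_cache pattern is roughly the following. This is a minimal sketch only, not the exact code from e7c3df9: the class name LargeQFTMappingTrackingBench is illustrative, and build_model_circuit is the benchmark-internal QFT constructor quoted in the diff above, assumed to be available in the same module.

from qiskit import QuantumRegister
from qiskit.converters import circuit_to_dag
from qiskit.transpiler import CouplingMap
from qiskit.transpiler.passes import SabreSwap


class LargeQFTMappingTrackingBench:  # illustrative name, not the real class
    timeout = 600.0  # seconds

    heavy_hex_size = {115: 7, 409: 13, 1081: 21}
    params = ([115, 409, 1081], ["lookahead", "decay"])
    param_names = ["n_qubits", "heuristic"]

    def setup_cache(self):
        # asv runs `setup_cache` once per class (it is not parametrised), so do
        # the expensive routing for every parameter combination here and keep
        # only the numbers the tracking benchmarks need.
        state = {}
        for n_qubits in self.params[0]:
            qr = QuantumRegister(n_qubits, name="q")
            dag = circuit_to_dag(build_model_circuit(qr))
            coupling = CouplingMap.from_heavy_hex(self.heavy_hex_size[n_qubits])
            for heuristic in self.params[1]:
                pass_ = SabreSwap(coupling, heuristic, seed=2022_10_27, trials=1)
                out_dag = pass_.run(dag)
                state[n_qubits, heuristic] = {
                    "depth": out_dag.depth(),
                    "size": out_dag.size(),
                }
        return state

    # asv passes the cached state as the first argument, before the parameters,
    # so each tracker just looks up its precomputed value and returns instantly.
    def track_depth_sabre_swap(self, state, n_qubits, heuristic):
        return state[n_qubits, heuristic]["depth"]

    def track_size_sabre_swap(self, state, n_qubits, heuristic):
        return state[n_qubits, heuristic]["size"]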

The tracking benchmarks here naively require a recomputation of the
expensive swap-mapping, despite us wanting to just reuse things we
already calculated during the timing phase.  `asv` doesn't let us return
trackers from the timing benchmarks directly, but we can still reduce
one load of redundancy by pre-calculating all the tracker properties we
care about only once in the cached setup method, and then just feeding
that state into the actual benchmarks to retrieve the results they care
about.

This is rather hacky, but does successfully work around functionality we
would like in `asv` to reduce runtime.

@mtreinish mtreinish left a comment


LGTM, thanks for making the update. In my local test it saved ~5 minutes of execution time.

@mtreinish mtreinish added the automerge This PR will automatically merge once its CI has passed label Oct 28, 2022
@mergify mergify bot merged commit db24b4b into Qiskit:master Oct 28, 2022
@jakelishman jakelishman deleted the more-sabre-benchmarks branch October 28, 2022 16:17
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Aug 1, 2023
…metapackage#1622)

* Add benchmarks for Sabre on large QFT and QV circuits

Sabre is capable of handling these large benchmarks now, and it's of
interest for us to track our performance on large systems.  We don't
anticipate running on them yet, but we will want to know in the future
when further changes to routing and memory usage improve these
benchmarks.

* Fix lint

* Fix lint properly

* Precalculate trackers to avoid recomputation

The tracking benchmarks here naively require a recomputation of the
expensive swap-mapping, despite us wanting to just reuse things we
already calculated during the timing phase.  `asv` doesn't let us return
trackers from the timing benchmarks directly, but we can still reduce
one load of redundancy by pre-calculating all the tracker properties we
care about only once in the cached setup method, and then just feeding
that state into the actual benchmarks to retrieve the results they care
about.

This is rather hacky, but does successfully work around functionality we
would like in `asv` to reduce runtime.
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Aug 11, 2023
…metapackage#1622)

* Add benchmarks for Sabre on large QFT and QV circuits

Sabre is capable of handling these large benchmarks now, and it's of
interest for us to track our performance on large systems.  We don't
anticipate running on them yet, but we will want to know in the future
when further changes to routing and memory usage improve these
benchmarks.

* Fix lint

* Fix lint properly

* Precalculate trackers to avoid recomputation

The tracking benchmarks here naively require a recomputation of the
expensive swap-mapping, despite use wanting to just reuse things we
already calculated during the timing phase.  `asv` doesn't let us return
trackers from the timing benchmarks directly, but we can still reduce
one load of redundancy by pre-calculating all the tracker properties we
care about only once in the cached setup method, and then just feeding
that state into the actual benchmarks to retrieve the results they care
about.

This is rather hacky, but does successfully work around functionality we
would like in `asv` to reduce runtime.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
automerge This PR will automatically merge once its CI has passed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants