-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port circuit_to_dag
to Rust
#13036
Port circuit_to_dag
to Rust
#13036
Conversation
34ba18d
to
923ae3f
Compare
There are still some bugs to fix but so far benchmarks look promising for this: All benchmarks:
| Change | Before [cc87318f] <main> | After [24df33df] <rusty-circ-to-dag> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|----------------------------------------|---------|--------------------------------------------------------------|
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(20, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(32, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(53, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8192) |
| - | 17.3±0.2μs | 7.56±0.09μs | 0.44 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8) |
| - | 29.3±0.4μs | 11.1±0.1μs | 0.38 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8) |
| - | 11.9±0.1ms | 3.67±0.1ms | 0.31 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 128) |
| - | 283±2μs | 84.9±0.4μs | 0.30 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8) |
| - | 191±0.8μs | 55.1±0.5μs | 0.29 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8) |
| - | 62.3±0.2μs | 18.0±0.1μs | 0.29 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8) |
| - | 882±10μs | 259±3μs | 0.29 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8) |
| - | 106±0.8μs | 30.4±0.4μs | 0.29 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8) |
| - | 434±8μs | 120±2μs | 0.28 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8) |
| - | 169±3μs | 44.2±0.4μs | 0.26 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128) |
| - | 34.5±0.3ms | 8.74±0.2ms | 0.25 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048) |
| - | 6.00±0.03ms | 1.50±0.03ms | 0.25 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 128) |
| - | 9.80±0.2ms | 2.27±0.2ms | 0.23 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192) |
| - | 2.20±0ms | 502±6μs | 0.23 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128) |
| - | 3.44±0.05ms | 805±4μs | 0.23 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 128) |
| - | 76.6±0.8ms | 17.3±0.3ms | 0.23 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192) |
| - | 2.44±0.01ms | 531±4μs | 0.22 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048) |
| - | 18.5±0.1ms | 4.05±0.1ms | 0.22 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048) |
| - | 351±1μs | 72.6±1μs | 0.21 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128) |
| - | 47.6±0.9ms | 9.96±0.4ms | 0.21 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192) |
| - | 1.24±0.01ms | 260±5μs | 0.21 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128) |
| - | 20.6±0.09ms | 4.10±0.2ms | 0.20 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192) |
| - | 755±2μs | 147±0.08μs | 0.20 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128) |
| - | 5.14±0.06ms | 880±4μs | 0.17 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048) |
| - | 11.6±0.09ms | 2.01±0.05ms | 0.17 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048) |
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED. |
4cc652c
to
da3b3ae
Compare
After fixing all of the failing tests, there was a bit of a performance regression (probably from cloning all the instructions, which I don't think is needed anymore if we do things through rust). But it's still faster than the original. All benchmarks:
| Change | Before [cc2edc9f] <main> | After [7b765db2] <rusty-circ-to-dag> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|----------------------------------------|---------|--------------------------------------------------------------|
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(20, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(32, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(53, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8192) |
| - | 17.2±0.1μs | 12.0±0.3μs | 0.70 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8) |
| - | 61.4±0.3μs | 37.9±0.5μs | 0.62 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8) |
| - | 190±1μs | 115±0.7μs | 0.61 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8) |
| - | 29.2±0.3μs | 17.9±0.2μs | 0.61 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8) |
| - | 277±0.3μs | 168±1μs | 0.61 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8) |
| - | 107±4μs | 65.0±1μs | 0.61 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8) |
| - | 425±2μs | 243±8μs | 0.57 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8) |
| - | 20.6±0.1ms | 11.3±0.2ms | 0.55 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192) |
| - | 869±6μs | 459±1μs | 0.53 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8) |
| - | 34.4±2ms | 17.9±0.5ms | 0.52 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048) |
| - | 48.5±0.8ms | 24.7±0.5ms | 0.51 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192) |
| - | 3.35±0.06ms | 1.67±0.04ms | 0.50 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 128) |
| - | 11.8±0.04ms | 5.86±0.1ms | 0.50 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 128) |
| - | 78.7±0.9ms | 39.5±0.6ms | 0.50 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192) |
| - | 363±8μs | 178±0.9μs | 0.49 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128) |
| - | 5.84±0.06ms | 2.84±0.05ms | 0.49 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 128) |
| - | 2.23±0.1ms | 1.06±0.02ms | 0.48 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128) |
| - | 11.9±0.3ms | 5.66±0.2ms | 0.48 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048) |
| - | 18.7±0.4ms | 8.94±0.3ms | 0.48 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048) |
| - | 10.5±0.2ms | 4.90±0.3ms | 0.47 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192) |
| - | 5.14±0.06ms | 2.34±0.07ms | 0.46 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048) |
| - | 755±2μs | 349±2μs | 0.46 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128) |
| - | 1.23±0.02ms | 569±3μs | 0.46 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128) |
| - | 168±2μs | 76.0±4μs | 0.45 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128) |
| - | 2.52±0.05ms | 976±9μs | 0.39 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048) |
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED. |
One or more of the following people are relevant to this code:
|
circuit_to_dag
to Rustcircuit_to_dag
to Rust
Pull Request Test Coverage Report for Build 10799042871Details
💛 - Coveralls |
7b765db
to
5b0ce5e
Compare
8a46a04
to
71c8aa5
Compare
crates/circuit/src/dag_circuit.rs
Outdated
Ok(new_nodes) | ||
} | ||
|
||
#[allow(clippy::too_many_arguments)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to not just take QuantumCircuitData
as the input here? That would let you ignore this clippy and you don't have to pull out each field individually and then each argument could be set by name in the struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that it would add some versatility to just accept the arguments, but adding it to the strctu seems to make more sense.
679ac61
to
7fe0a36
Compare
After @mtreinish suggestions, the copying of instructions one by one is more effective. All benchmarks:
| Change | Before [3d4bab2b] <main> | After [7fe0a365] <rusty-circ-to-dag> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|----------------------------------------|---------|--------------------------------------------------------------|
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(20, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(32, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(53, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8192) |
| - | 15.3±0.05μs | 7.85±0.2μs | 0.51 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8) |
| - | 26.1±0.1μs | 11.7±0.03μs | 0.45 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8) |
| - | 147±2μs | 61.3±0.3μs | 0.42 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8) |
| - | 222±5μs | 93.4±0.4μs | 0.42 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8) |
| - | 81.7±0.8μs | 33.2±0.2μs | 0.41 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8) |
| - | 50.0±0.1μs | 20.2±0.07μs | 0.40 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8) |
| - | 346±1μs | 130±0.4μs | 0.38 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8) |
| - | 703±4μs | 271±0.7μs | 0.38 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8) |
| - | 9.38±0.08ms | 3.23±0.02ms | 0.34 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 128) |
| - | 4.38±0.02ms | 1.38±0.01ms | 0.32 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 128) |
| - | 2.46±0.02ms | 719±3μs | 0.29 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 128) |
| - | 1.57±0.01ms | 436±4μs | 0.28 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128) |
| - | 145±4μs | 38.7±0.3μs | 0.27 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128) |
| - | 24.4±0.1ms | 6.38±0.2ms | 0.26 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048) |
| - | 240±1μs | 63.5±0.4μs | 0.26 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128) |
| - | 867±1μs | 226±1μs | 0.26 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128) |
| - | 531±0.6μs | 129±0.3μs | 0.24 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128) |
| - | 52.1±0.3ms | 11.9±0.2ms | 0.23 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192) |
| - | 2.13±0.04ms | 446±4μs | 0.21 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048) |
| - | 8.46±0.2ms | 1.76±0.04ms | 0.21 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192) |
| - | 13.1±0.08ms | 2.78±0.01ms | 0.21 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048) |
| - | 3.54±0.06ms | 718±2μs | 0.20 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048) |
| - | 13.8±0.07ms | 2.71±0.02ms | 0.20 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192) |
| - | 7.86±0.06ms | 1.55±0.01ms | 0.20 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048) |
| - | 31.6±0.2ms | 6.44±0.03ms | 0.20 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192) |
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks great, I'm very eager to have this as besides being a speedup for conversion it gives us a native method for working with control flow block in the transpiler with minimal python overhead. My main concern is about the extra overhead with building the qubit mapping stuff when we don't have a mapping. I get from a code cleanliness perspective it's a lot easier to have an identity map. But I'm worried about the performance overhead to handle the mapping when there isn't any both in building the maps and then also adding hash lookups on every qubit..
let qc_data = qc.data; | ||
let num_qubits = qc_data.num_qubits(); | ||
let num_clbits = qc_data.num_clbits(); | ||
let num_ops = qc_data.__len__(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, I feel like we should have a native len()
method to CircuitData
. Not for this PR of course, it's just a bit weird to see this in rust code.
- Introduce a method that adds a chain of `PackedInstruction` continuously avoiding the re-linking of each bit's output-node until the very end of the iterator. - TODO: Add handling of vars - Add header for a `from_iter` function that will create a `DAGCircuit` based on a chain of `PackedInstruction`.
- Fix incorrect re-insertion of last_node.
- Remove `from_iter`
- Use `entry` api to either modify or insert a value if missing.
- A bug that adds duplicate edges to vars has been temporarily fixed. However, the root of this problem hasn't been found yet. A proper fix is pending. For now skip those instances.
- Introduce a method that adds a chain of `PackedInstruction` continuously avoiding the re-linking of each bit's output-node until the very end of the iterator. - TODO: Add handling of vars - Add header for a `from_iter` function that will create a `DAGCircuit` based on a chain of `PackedInstruction`.
…circuit` crate. - Make the `CircuitData` iter method be an exact-size iterator.
- Revert the changes making the interners and registers visible to the crate `qiskit-circuit`. - Create methods to expose immutable borrowed views of the interners, registers and global_phase to prevent from mutating the DAGCircuit. - Add `get_qargs` and `get_cargs` to unpack interned qargs ans cargs. - Other tweaks and fixes.
- Correct incorrect docstrings for `qubits()` and `clbits()` Co-authored-by: Eli Arbel <[email protected]>
- Add new method `DAGCircuit::from_quantum_circuit` which uses the data from a `QuantumCircuit` instance to build a dag_circuit. - Expose the method through a `Python` interface with `circuit_to_dag` which goes by the alias of `core_circuit_to_dag` and is called by the original method. - Add an arbitrary structure `QuantumCircuitData` that successfully extract attributes from the python `QuantumCircuit` instance and makes it easier to access in rust. - This structure is for attribute extraction only and is not clonable/copyable. - Expose a new module `converters` which should store all of the rust-side converters whenever they get brought into rust. - Other small tweaks and fixes.
- Add more accurate estimate of num_edges.
- When converting from `QuantumCircuit` to `DAGCircuit`, we were previously copying the instances of `BitData` by cloning. This would result in instances that had extra qubits that went unused. This commit fixes this oversight and reduced the amount of failing times from 160 to 12.
- Use `copy_operations` to copy all of the operations obtained from `CircuitData` by calling deepcopy. - Initialize `._data` manually for instances of `BlueprintCircuit` by calling `._build()`. - Other small fixes.
- Previously, the binding of qubits and clbits into bitdata was done based on the `push_back()` function behavior. This manual mapping was removed in favor of just inserting the qubits/clbits using the `add_qubits()` and `add_clbits()` methods and keeping track of the new indices. - Remove cloning of interners, use the re-mapped entries instead. - Remove manual re-mapping of qubits and clbits. - Remove cloning of BitData, insert qubits directly instead. - Other small tweaks and fixes.
- Make `QuantumCircuitData` extraction struct private. - Rename `DAGCircuit::from_quantum_circuit` into `DAGCircuit::from_circuit` to make more generalized. - Pass all attributes of `QuantumCircuit` that are passed from python as arguments to the `DAGCircuit::from_circuit` function. - Add `DAGCircuit::from_circuit_data` as a wrapper to `from_circuit` to create an instance solely from the properties available in `CircuitData`. - Use `DAGCircuit::from_circuit` as base for `DAGCircuit::from_circuit_data`.
- `QuantumCircuitData` is an extractor helper built only to extract attributes from a Python-based `Quantumircuit` that are not yet available in rust + the qualities that already are through `CircuitData`. - Add `Clone` trait `QuantumCircuitData` as all its properties are clonable. - Make `circuit_to_dag` public in rust in case it needs to be used for a transpiler pass. - Adapt to renaming of `DAGCircuit::add_from_iter` into `DAGCircuit::extend`.
- Use `QuantumCircuitData` as argument for `DAGCircuit::from_circuit`. - Create instance of `QuantumCircuitData` in `DAGCircuit::from_circuit_data`. - Re-add duplicate skip in extend. - Other tweaks and fixes.
- Remove second re-mapping of bits, perform this mapping at insertion of qubits instead. - Modify insertion of qubits to re-map the indices and store them after insertion should a custom ordering be provided. - Remove extra `.and_modify()` for `clbits_last_node` from `DAGCircuit::extend`. - Remove submodule addition to `circuit` module, submodule is now under `_accelerate.converters`. - Other tweaks and fixes.
7fe0a36
to
fa34c5a
Compare
All benchmarks:
| Change | Before [3d4bab2b] <main> | After [94f8ab42] <rusty-circ-to-dag> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|----------------------------------------|---------|--------------------------------------------------------------|
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(20, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(32, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8192) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(53, 2048) |
| | n/a | n/a | n/a | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8192) |
| - | 15.0±0.4μs | 7.02±0.02μs | 0.47 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8) |
| - | 25.6±0.2μs | 10.9±0.09μs | 0.43 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8) |
| - | 217±0.6μs | 88.0±0.2μs | 0.41 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8) |
| - | 82.3±2μs | 33.0±2μs | 0.40 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8) |
| - | 145±0.7μs | 56.4±0.4μs | 0.39 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8) |
| - | 49.5±0.4μs | 18.8±0.3μs | 0.38 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8) |
| - | 342±10μs | 123±1μs | 0.36 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8) |
| - | 711±1μs | 259±5μs | 0.36 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8) |
| - | 9.27±0.07ms | 2.95±0.04ms | 0.32 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 128) |
| - | 4.39±0.03ms | 1.34±0.02ms | 0.31 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 128) |
| - | 2.43±0.01ms | 689±3μs | 0.28 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 128) |
| - | 1.58±0.02ms | 417±1μs | 0.26 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128) |
| - | 145±1μs | 34.8±0.2μs | 0.24 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128) |
| - | 24.1±0.1ms | 5.68±0.03ms | 0.24 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048) |
| - | 236±2μs | 57.1±0.3μs | 0.24 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128) |
| - | 883±6μs | 216±3μs | 0.24 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128) |
| - | 529±3μs | 119±0.5μs | 0.22 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128) |
| - | 51.6±0.4ms | 10.9±0.2ms | 0.21 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192) |
| - | 8.17±0.05ms | 1.61±0.03ms | 0.20 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192) |
| - | 13.3±0.2ms | 2.66±0.04ms | 0.20 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048) |
| - | 2.06±0.01ms | 395±6μs | 0.19 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048) |
| - | 31.6±0.3ms | 5.98±0.07ms | 0.19 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192) |
| - | 3.51±0.03ms | 648±4μs | 0.18 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048) |
| - | 13.8±0.2ms | 2.42±0.02ms | 0.18 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192) |
| - | 7.84±0.08ms | 1.41±0.01ms | 0.18 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048) |
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED. After @mtreinish 's suggestions there was a slight performance increase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks great, thanks for the fast update. Avoiding the extra overhead of an identity mapping definitely is worth it and the code looks much simpler now that everything is in a single collection. I just had one inline question/suggestion that I think could improve performance a bit more at the cost of some code duplication, but I'm curious what you think.
let instructions: Vec<PackedInstruction> = qc_data | ||
.iter() | ||
.map(|instr| -> PyResult<PackedInstruction> { | ||
// Re-map the qubits | ||
let new_qargs = if let Some(qubit_mapping) = &qubit_map { | ||
let qargs = qc_data | ||
.get_qargs(instr.qubits) | ||
.iter() | ||
.map(|bit| qubit_mapping[bit.0 as usize]) | ||
.collect(); | ||
new_dag.qargs_interner.insert_owned(qargs) | ||
} else { | ||
new_dag | ||
.qargs_interner | ||
.insert(qc_data.get_qargs(instr.qubits)) | ||
}; | ||
// Remap the clbits | ||
let new_cargs = if let Some(clbit_mapping) = &clbit_map { | ||
let qargs = qc_data | ||
.get_cargs(instr.clbits) | ||
.iter() | ||
.map(|bit| clbit_mapping[bit.0 as usize]) | ||
.collect(); | ||
new_dag.cargs_interner.insert_owned(qargs) | ||
} else { | ||
new_dag | ||
.cargs_interner | ||
.insert(qc_data.get_cargs(instr.clbits)) | ||
}; | ||
// Copy the operations | ||
|
||
Ok(PackedInstruction { | ||
op: if copy_op { | ||
instr.op.py_deepcopy(py, None)? | ||
} else { | ||
instr.op.clone() | ||
}, | ||
qubits: new_qargs, | ||
clbits: new_cargs, | ||
params: instr.params.clone(), | ||
extra_attrs: instr.extra_attrs.clone(), | ||
#[cfg(feature = "cache_pygates")] | ||
py_op: OnceCell::new(), | ||
}) | ||
}) | ||
.collect::<PyResult<Vec<_>>>()?; | ||
|
||
// Finally add all the instructions back | ||
new_dag.extend(py, instructions)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the only reason you're collecting into a vec here because the deepcopy()
call is fallible. I was kind of excited to have this all work with iterators in rust so we avoided a big allocation like this. I don't think we want to mask the error in python if a deepcopy fails, that is important for users to get the python exception. While it ends up being a big block of duplicated code, do you think it would make sense to move the copy_op
conditional outside the iterator so you do something like:
if copy_op {
let instructions = iterator.... .collect::<PyResult<Vec<_>>>()
} else {
new_dag.extend(py, iterator)
}
where iterator is basically this entire chain just with or without the pyresult depending on whether we call clone()
or deepcopy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other reason we're re-collecting is to re-intern the qargs. But I also believe this could be avoided. After all the only requirement here is that whatever we send to extend()
can turn into an iterator of PackedInstruction. I think we can avoid the allocation by not collecting the Vec
but we would still have to re-intern things since we're not copying the interners over.
Edit: This earlier suggestion may not work due to the iterator mutably borrowing the dag while the method already holds right to mutate. :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do this without re-interning at all in the case when we're not reordering any qubits or clbits. Since the interners should be the same for the dag and circuit.
But, lets save this as a potential followup then. The gains from doing something like this would be be measurable but even with a collection into a Vec this has a good improvement in speed and the logic to figure out how to do this is probably will probably be pretty verbose to handle all the edge cases around whether we are reordering any bits or not, or we need to return a python exception from deepcopy or not. I'll open an issue after this merges so we can track it as a potential future optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small nit/question inline but this LGTM thanks for doing this. If you don't want to update the nit feel free to enqueue this for merge.
Co-authored-by: Matthew Treinish <[email protected]>
Is tracked by and resolves #13001 and #13004
Summary
These commits bring
circuit_to_dag
into rust, leveraging all of the api's introduced by #12550. This also resulted in the addition of a rust-native method.DAGCircuit::from_circuit_data
to convert fromCircuitData
toDAGCircuit
from Rust alone.Details and comments
The following commits aim to add the circuit converter
circuit_to_dag
to rust while leveraging the rust-side logic forDAGCircuit
. This is originally used to turn an instance ofQuantumCircuit
intoDAGCircuit
. A rust method to convertCircuitData
toDAGCircuit
was also added.Inital changes
DAGCircuit::from_quantum_circuit
which uses the data from aQuantumCircuit
instance to build a dag_circuit.CircuitData
interners and registers to theqiskit-circuit
crate #13006 which exposes theInterner
s andBitData
instances for qubits and clbits from theCircuitData
struct for public view.DAGCircuit
instances has an estimated capacity that should be enough for mostDAGCircuit
instances with the number of operations provided. Uses Implement a capacity-allocated constructor forDAGCircuit
in Rust. #12975.CircuitData
instance is added onto theDAGCircuit
by using.add_from_iter()
(introduced by Add method to add instructions to a DAGCircuit from an iterator of PackedInstruction #13032).DAGCircuit::from_circuit_data
uses the previously describe method to create aDAGCircuit
based only on theCircuitData
from a circuit.CircuitData != QuantumCircuit
and results may differ.Python
interface withcircuit_to_dag
which goes by the alias ofcore_circuit_to_dag
and is called by the original method.QuantumCircuitData
that extract attributes from the pythonQuantumCircuit
instance and makes it easier to access in rust.converters
which should store all of the rust-side converters whenever they get brought into rust.Tasks
Blockers
DAGCircuit
in Rust. #12975CircuitData
interners and registers to theqiskit-circuit
crate #13006Questions and suggestions
Feel free to ask any questions and give any feedback below, thank you! 🚀