Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor RB module for future extensions #898

Merged
merged 21 commits into from
Sep 12, 2022

Conversation

itoko
Copy link
Contributor

@itoko itoko commented Aug 24, 2022

Summary

This commit improves the readability and extensibility of RB module implementation without API change nor performance regression.

Details and comments

@itoko
Copy link
Contributor Author

itoko commented Aug 26, 2022

Simple benchmarking results: Suggesting no performance regression in this commit.

1Q StandardRB (lengths = np.arange(1, 1000, 100), num_samples = 10):
- circuits()
this commit: 1.72 s ± 57.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
main branch: 4.97 s ± 56.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- _transpile_circuits()
this commit: 16.5 s ± 163 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
main branch: 20.8 s ± 333 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2Q StandardRB (lengths = np.arange(1, 200, 20), num_samples = 10):
- circuits()
this commit: 995 ms ± 565 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
main branch: 1.94 s ± 118 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- _transpile_circuits()
this commit: 6.84 s ± 303 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
main branch: 8.04 s ± 163 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

1Q InterleavedRB (lengths = np.arange(1, 1000, 100), num_samples = 10):
- circuits()
this commit: 5.76 s ± 174 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
main branch: 10.8 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- _transpile_circuits()
this commit: 54.6 s ± 830 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
main branch: 55.7 s ± 568 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2Q InterleavedRB (lengths = np.arange(1, 200, 20), num_samples = 10):
- circuits()
this commit: 2.07 s ± 20.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
main branch: 4.11 s ± 86.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- _transpile_circuits()
this commit: 15.3 s ± 476 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
main branch: 15.3 s ± 136 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Benchmarking code:

import numpy as np
from qiskit_experiments.library import StandardRB, InterleavedRB
from qiskit.circuit.library.standard_gates import SXGate
# 1Q StandardRB
rb = StandardRB(qubits=[0], lengths=np.arange(1, 1000, 100), num_samples=10, seed=1010)
# 2Q StandardRB
rb = StandardRB(qubits=[0, 1], lengths=np.arange(1, 200, 20), num_samples=10, seed=1010)
# 1Q InterleavedRB
rb = InterleavedRB(SXGate(), qubits=[0], lengths=np.arange(1, 1000, 100), num_samples=10, seed=1010)
# 2Q InterleavedRB
rb = InterleavedRB(SXGate(), qubits=[0, 1], lengths=np.arange(1, 200, 20), num_samples=10, seed=1010)
%timeit -n 1 rb.circuits()
%timeit -n 1 rb._transpiled_circuits()

return _clifford_2q_int_to_instruction(elem)
return elem.to_instruction()

def _identity_clifford(self) -> SequenceElementType:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest moving all the integer-based Clifford encoding methods to clifford_utils.py so the user of the experiment is left exposed to the implementation details of the optimization as little as possible and the developers have all the optimization-related code in one place

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This apply also to _clifford_compose and _clifford_adjoint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a good point we need to think of.
IMO, clifford_utils should be useful for optimizing the implementation of StarndardRB but it also should be independent of the optimization strategy of StarndardRB (i.e. using integers for 1- and 2-qubit Cliffords) as much as possible. For example, we might end up with using integers only for 1-qubit Cliffords.
So I intentionally keep the minimal codes to tell the optimization strategy here while putting the details of the utility functions used for optimization to clifford_utils (e.g. see _to_instruction). I agree that we should not include all of the implementation details here in StarndardRB but I think we should also need to let the StarndardRB developers know how the code is optimized within StarndardRB. (For example, I think we should have clifford_1q_compose or something function/table in clifford_utils but we should call it in StarndardRB). If we put all the details including the optimization strategy to clifford_utils, the developers must read and maintain both StarndardRB and clifford_utils to understand and update the optimization strategy (that would decrease the readability and maintenability).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Eli's comment above regarding CliffordUtils. In my implementation, in PR #892 , I actually moved more functionality to CliffordUtils. I also added to class parameters to its init - num_qubits and basis_gates. I thought it looked cleaner that way, because StandardRB now creates a CliffordUtils object once, and CliffordUtils then takes care of all the details, such as creating the transpiled circuits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Itoko, I understand. My hope though was that the flow of the experiment itself, StandardRB in this case, would not have to change even if we change the implementation details of the optimization. So if there is a way to have the experiment code work with Cliffords (e.g. random sampling, composing, inverse) in a somewhat abstract way, without caring about this or that implementation detail of the optimization, it would be much cleaner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see I can try to put a bit more logics (around integer Clifford operations) into clifford_utils. However, I'd like to push such a refactoring of clifford_utils into a follow-up PR. I now understand this PR should focus on the refactoring of StandardRB/InterleavedRB. To make that clear, I'll remove the API changes in clifford_utils from this PR (i.e. separate it out to a follow-up PR). Does it make sense?
In the follow-up PR, pushing Merav's idea further, I'd like to discuss a new class something like IndexedClifford(num_qubits, index) that implements adjoint and compose methods like the regular Clifford. After this PR, such a further refactoring would become much easier. (e.g. we can remove _identity_clifford, _clifford_adjoint and _clifford_compose whenever we want since they are all private methods.)

Copy link
Contributor Author

@itoko itoko Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted API changes in clifford_utils in 6759eaf

@itoko itoko marked this pull request as ready for review August 30, 2022 04:27
if k1 == 1: # V gate
qc.sdg(1)
qc.h(1)
if k1 == 2: # W gate (replacing these with qc.h(1) + qc.s(1) causes a test failure)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea which test fails and why? there should be no difference between the case where k0=2 and k1=2 (both should append a W=HS gate)

Copy link
Contributor Author

@itoko itoko Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test: https://github.com/Qiskit/qiskit-experiments/blob/622554d475f1b0ee7c20941e1ad79613a51771e7/test/library/randomized_benchmarking/test_randomized_benchmarking.py#L579
(See also https://github.com/Qiskit/qiskit-experiments/runs/8031956851?check_suite_focus=true for the error message)
Honestly, I have no idea... I'd like to ask it to the person who wrote the original (test) code. I tried to replace the W with HS and noticed this wired behavior. And for safety, I just leave it as is...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it's better to fix the test, since the code should definitely be correct (maybe it's only a matter of error bounds?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the test following @nkanazawa1989 's suggestion in 1972130.

Copy link
Collaborator

@nkanazawa1989 nkanazawa1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @itoko I like new code. This is well organized and much readable. I'm also interested in how this new implementation can improve coverage of RB variants. For example, how can we write CNOT-Dihedral RB by overriding the Standard RB?

inv = self.__adjoint_clifford(prev_elem)

circ.append(self._to_instruction(inv), qubits)
circ.barrier(qubits) # TODO: Can we remove this? (measure_all inserts one more barrier)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in c577461

def _transpiled_circuits(self) -> List[QuantumCircuit]:
"""Return a list of experiment circuits, transpiled."""
# TODO: Custom transpilation (without calling transpile()) for 1Q and 2Q cases
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to integrate qubit-aware basis gate count into circuit generation? Actually this counting logic inserts another loop and the transpile slower. This count information is only used by the Standard RB (for EPC to EPG conversion), but if the performance regression is quite small I think having basis gate counts in metadata of other types of RB circuits is kind of nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the part is currently a bottleneck. It would not be too late to think of that after we clearly see the part is a bottleneck.

circuits = []
for i, seq in enumerate(sequences):
if self._full_sampling or i % len(self.experiment_options.lengths) == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be calling this per sample (seed), rather than generating a big single sequence of all seeds. This makes logic simpler and allows us to use multi threading in future updates.

Copy link
Contributor Author

@itoko itoko Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it as an option for future. Note for the future:

        def split_list(org_list, num_sublist):
            for idx in range(0, len(org_list), num_sublist):
                yield org_list[idx:idx + num_sublist]

        circuits = []
        for sub_sequences in split_list(sequences, len(self.experiment_options.lengths)):
            prev_elem, prev_seq = self.__identity_clifford(), []
            for seq in sub_sequences:
                if self._full_sampling:
                    prev_elem, prev_seq = self.__identity_clifford(), []
                    ...

@itoko
Copy link
Contributor Author

itoko commented Sep 5, 2022

Regarding CNOTDihedral, I think it should be implemented independently. At first I thought I could design the refactored StandardRB/InterleavedRB reusable for CNOTDihedral classes, but I had second thought. Now I think CNOTDihedral classes should not inherit StandardRB, because CNOTDihedral uses a different group to make different circuits and requires different optimizations of implementation. That said, I think the code structure of the refactored StandardRB/InterleavedRB should be a good reference for implementing new independent CNOTDihedral classes.

@nkanazawa1989
Copy link
Collaborator

nkanazawa1989 commented Sep 5, 2022

Understood, but then I prefer to have abstract base class to restrict the flow of sequence generation for other RB variants. If some contributor writes CNOT-Dihedral RB with their research code, this would be really hard to review. I think some extra logic you introduced here can be reused for such RB variants, and we can save some boilerplate code with well-designed base class.

(edit)
I'm fine with reviewing separate PR for base class, and approving this PR.

Co-authored-by: Naoki Kanazawa <[email protected]>
@itoko
Copy link
Contributor Author

itoko commented Sep 7, 2022

I think it would be better to examine such kinds of refactoring for other RB variants later in a separate PR if necessary. (I think it would be beyond the scope of this PR)

@itoko
Copy link
Contributor Author

itoko commented Sep 8, 2022

I've updated the codes based on your first review comments, could you take a second look? @nkanazawa1989

Copy link
Collaborator

@nkanazawa1989 nkanazawa1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Itoko-san. This almost looks good to me. Just a small suggestion for unittest. IIRC, the main changes with this PR are

  • New logic samples sequences, not directly samples circuits.
  • New logic _sequences_to_circuits are added to convert sampled sequences to circuits.

I think this is a good ground work for the refactoring of RB speedup. This class newly get many private methods so that we can easily tune the performance of extracted logic in follow up. I will hold the review for the new class design until we complete the whole work. I hope you will eventually make cleanup PR to finalize the design with some user guide for developers of RB variants.

Copy link
Collaborator

@nkanazawa1989 nkanazawa1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thanks @itoko

@merav-aharoni
Copy link
Contributor

It looks good to me as well. Two small comments:

  1. I agree that you can remove the final barrier in each circuit.
  2. One we have the Integral support (which I am adding), do we want to preserve the Clifford version as well? It seems unnecessary to me.

@itoko
Copy link
Contributor Author

itoko commented Sep 8, 2022

  1. I'll fix the duplication of the final barrier (no one is against so far)
  2. I think we still need the Clifford version for RB with 3 or more qubits even after integer Cliffords are supported for 1 and 2 qubit cases.

@nkanazawa1989 nkanazawa1989 merged commit 0359f4c into qiskit-community:main Sep 12, 2022
@itoko itoko mentioned this pull request Nov 29, 2022
itoko added a commit that referenced this pull request Dec 23, 2022
Improve the performance of transpiled circuit generation in 1Q/2Q StandardRB/InterleavedRB (about 10x speedup in 1Q SRB/IRB and 5x speedup in 2Q SRB/IRB in most cases).

Including following feature pull-requests:
* New algorithm for RB building Cliffords by layers (#892)
* Improve custom transpilation for faster 1Q/2Q RB (#922)
* Improve integer-based Clifford operations for 1Q/2Q RB (#940)
* Readd support for backends with directed basis gates to RB experiments (#960)

Features other than speedup:
* Fix performance regression in 3 or more qubits RB (embedded at Refactor RB module for future extensions #898)
* Improve validation of interleave_element in InterleavedRB
* Restructure tests for RB module
* Remove the barriers in front of the first Cliffords in RB circuits

Co-authored-by: merav-aharoni <[email protected]>
Co-authored-by: Naoki Kanazawa <[email protected]>
itoko added a commit that referenced this pull request Jun 2, 2023
### Summary
Changes the ordering of circuits generated by `InterleavedRB` back to
RIRIRI (R: Reference, I: Interleaved) order. It was accidentally changed
into RRRIII order in #898. Before that, it had been RIRIRI order.

### Details and comments
This change may slightly alter the impact of noise on IRB results.
RIRIRI ordering is preferable in the sense that it minimizes the impact
of drift or noise fluctuations. Note that `InterleavedRBAnalysis` is not
changed that means it works correctly regardless of the circuit ordering
because it processes data relying on "interleaved" flag in circuit
metadata.

This commit also adds a new experiment option `circuit_order` to
`InterleavedRB`. It enables to change the order of the reference and the
interleaved circuits and hence slightly alter the impact of noise on
interleaved RB results. The default value is set to `"RIRIRI"`.
mergify bot pushed a commit that referenced this pull request Jun 2, 2023
### Summary
Changes the ordering of circuits generated by `InterleavedRB` back to
RIRIRI (R: Reference, I: Interleaved) order. It was accidentally changed
into RRRIII order in #898. Before that, it had been RIRIRI order.

### Details and comments
This change may slightly alter the impact of noise on IRB results.
RIRIRI ordering is preferable in the sense that it minimizes the impact
of drift or noise fluctuations. Note that `InterleavedRBAnalysis` is not
changed that means it works correctly regardless of the circuit ordering
because it processes data relying on "interleaved" flag in circuit
metadata.

This commit also adds a new experiment option `circuit_order` to
`InterleavedRB`. It enables to change the order of the reference and the
interleaved circuits and hence slightly alter the impact of noise on
interleaved RB results. The default value is set to `"RIRIRI"`.

(cherry picked from commit c6753f3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants