Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conjugate reduction in optimize annotated pass #11811

Merged
merged 17 commits into from
May 2, 2024

Conversation

alexanderivrii
Copy link
Contributor

@alexanderivrii alexanderivrii commented Feb 15, 2024

Summary

Close #11810.

This PR adds a new optimization to the OptimizeAnnotated transpiler pass. The reduction looks for annotated operations (objects of type AnnotatedOperation that consist of a base operation B and a list M of control, inverse and power modifiers) with the following properties:

  • the base operation needs to be synthesized (i.e. it's not already supported by the target or belongs to the equivalence library)
  • the definition circuit for B can be expressed as P -- Q -- R with R = P^{-1}.

For such annotated operations, we can move the modifiers to be over the Q-part only, i.e. replace M - [P -- Q -- R] by P -- M-[Q] -- R.

Example: Controlled Draper QFT Adder

This has been one of the main motivating examples. Consider the controlled_adder circuit constructed below:

basis_gates = ['cx', 'id', 'u']
equivalence_library = SessionEquivalenceLibrary
num_state_qubits = 3
num_controls = 2

adder = DraperQFTAdder(num_state_qubits=num_state_qubits, kind="half")
controlled_adder = adder.control(num_controls)
synthesized_adder = HighLevelSynthesis(basis_gates=basis_gates, equivalence_library=equivalence_library)(controlled_adder)

The synthesized adder has the following operations: [('cx', 746), ('p', 630), ('cp', 237), ('h', 84), ('ry', 16), ('ccx', 16)]. (Note that HighLevelSynthesis is the transpiler pass that within transpile that decomposes the circuit into 1- and 2-qubit gates).

On the other hand, the controlled adder is of the form control - [QFT -- U -- QFT^{-1}], which can be simplified to
QFT -- control-[U] -- IQFT. By removing the controls over QFT`` and ``IQFT parts of the circuit, one obtains significantly fewer gates in the transpiled circuit. The following is still not 100% perfect but showcases this reduction:

adder = DraperQFTAdder(num_state_qubits=num_state_qubits, kind="half")
adder = adder.decompose().decompose().decompose()
controlled_adder = adder.control(num_controls, annotated=True)
optimized_adder = OptimizeAnnotated(basis_gates=basis_gates, equivalence_library=equivalence_library)(controlled_adder)
synthesized_adder = HighLevelSynthesis(basis_gates=basis_gates, equivalence_library=equivalence_library)(optimized_adder)

Now the synthesized adder has the following operations: [('cx', 306), ('p', 270), ('cp', 93), ('h', 44)].

Example: CSWAP gate

CSwapGate in Qiskit has the following definition:

q = QuantumRegister(3, "q")
qc = QuantumCircuit(q, name=self.name)
rules = [
    (CXGate(), [q[2], q[1]], []),
    (CCXGate(), [q[0], q[1], q[2]], []),
    (CXGate(), [q[2], q[1]], []),
]
for instr, qargs, cargs in rules:
    qc._append(instr, qargs, cargs)

self.definition = qc

We get exactly the same decomposition for SwapGate().control(1, annotated=True) by writing Swap as 3 CXs and moving the control to the middle CX (as the two outer CXs are inverse of each other).

Details and comments

There are many interesting discussion points related to all this.

There is still a question of how the OptimizeAnnotated pass should be integrating into the transpile flow; I am now exploring several options.

The key part of the PR is the code (currently in optimize_annotated.py) that matches nodes at the front of a DAGCircuit with nodes at the back of the DAGCircuit; currently the check is simply fn.op == bn.op.inverse(); inverse nodes are collected into P and R respectively. Would something like this be uesful in other contexts?

We want to have a more general function that checks whether two DAG nodes A and B are inverse of each other. Currently I am checking that A.qargs = B.qargs. A.cargs = B.cargs and A.op = B.op.inverse(). What is the most general check we could do? For instance, we could relax to check to set(A.qargs) = set(B.qargs) but that would also require checking inverse up to a specific permutation of qubits, and that seems hard. In addition, the check A.op = B.op.inverse() is very constrained: it does detect that S is the inverse of Sdg or that CX is the (self-) inverse of CX, but it does not detect that QFT is the inverse of IQFT in the Draper adder example, because in draper_qft_adder.py the inverse-QFT is defined as QFT(num_qubits_qft, do_swaps=False).inverse().to_gate() and not QFT(num_qubits_qft, do_swaps=False).to_gate().inverse(). This is one of the reasons that we needed to decompose the circuit 3 times before the optimization became applicable. It would also currently not detect that SomeGate.inverse(annotated=True) is the inverse of SomeGate.

Yet another problem with DraperQFTAdder(num_state_qubits=num_state_qubits, kind="half").control(num_controls, annotated=True) is that the definition of the base operation (aka of DraperQFTAdder) is not yet of the form QFT -- U -- IQFT, but a circuit consisting of only a single gate, whose definition in turn is of the form above. So again the optimization does apply without decomposing the circuit a few times.

Also please take a look at how I have implemented the circuit P - M-[Q] - R with modifiers M moved to the middle part. It's quite ugly, and the question is whether this can be simplified? (It works, and as we saw the HighLevelSynthesis pass works correctly on such circuits).

@alexanderivrii alexanderivrii requested a review from a team as a code owner February 15, 2024 14:53
@qiskit-bot
Copy link
Collaborator

One or more of the the following people are requested to review this:

  • @Qiskit/terra-core

@mtreinish mtreinish added this to the 1.1.0 milestone Feb 15, 2024
@mtreinish mtreinish added Changelog: New Feature Include in the "Added" section of the changelog mod: transpiler Issues and PRs related to Transpiler labels Feb 15, 2024
@coveralls
Copy link

coveralls commented Feb 15, 2024

Pull Request Test Coverage Report for Build 8927040115

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 116 of 122 (95.08%) changed or added relevant lines in 2 files are covered.
  • 19 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.01%) to 89.586%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/optimization/optimize_annotated.py 112 118 94.92%
Files with Coverage Reduction New Missed Lines %
qiskit/transpiler/passes/synthesis/unitary_synthesis.py 1 88.02%
crates/qasm2/src/lex.rs 6 92.62%
crates/qasm2/src/parse.rs 12 97.15%
Totals Coverage Status
Change from base Build 8926762316: 0.01%
Covered Lines: 61935
Relevant Lines: 69135

💛 - Coveralls

@alexanderivrii
Copy link
Contributor Author

Update: this PR extends the OptimizeAnnotated transpiler pass, however the pass does not currently appear in the transpiler flow, largely because we don't (yet) commonly create quantum circuits with annotated operations.

When the new optimization is applicable, it can give significant reduction, however the quantum circuit has to be of a somewhat special form for that to happen (see the second form of the controlled-adder example from the PR description).

I do believe that this is a useful reduction, however this needs to be followed up by (1) constructing various standard library circuits using modifiers (annotated operations), (2) extending the applicability of the pass.

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much like this as a pass, thanks for it! I left a couple of comments on legibility of the algorithm, and the main thing I might like to see is a couple more tests of cases where conjugate reduction would have a harder time, such as not being able to reduce the whole block or whether there's disjoint sets of qubits.

Comment on lines 1777 to 1783
def op_successors(self, node):
"""Get the list of "op" successors of a node in the dag."""
return [pred for pred in self.successors(node) if isinstance(pred, DAGOpNode)]

def op_predecessors(self, node):
"""Get the list of "op" predecessors of a node in the dag."""
return [pred for pred in self.predecessors(node) if isinstance(pred, DAGOpNode)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the DAGCircuit methods like these return an iterator (such as the generator expression (pred for pred in ...)) instead of a list - it might be better to do that here too, if not only for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a31b9e6.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess we ought to have a release note (and ideally test) for the new DAGCircuit methods. We can do that in a follow-up if there's not enough time right now.

@@ -51,6 +63,7 @@ def __init__(
equivalence_library: Optional[EquivalenceLibrary] = None,
basis_gates: Optional[List[str]] = None,
recurse: bool = True,
do_conjugate_reduction: bool = True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpicking: conjugate_reduction is a little shorter and communicates (imo) the same information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason to have the longer name (I think) was to more clearly differentiate between the option of whether to apply conjugate reduction (currently called do_conjugate_reduction) and the actual method that does this (currently called _conjugate_reduction). The reason is that conjugate reduction is an option is because it's technically a more expensive optimization and we may want to have a way to switch it off in some cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not a big deal. The internal private attribute could still be called _do_conjugate_reduction if you were worried about naming collisions while still permitting OptimizeAnnotated(conjugate_reduction=False), but it's fine as-is.

Comment on lines 325 to 358
We return the operation ``op_new`` which a new custom instruction with definition
``IP * A * IR``, where ``A`` is a new annotated-operation with modifiers ``M`` and
base gate ``IQ``.
"""
p_dag, q_dag, r_dag = base_decomposition

p_instr = Instruction(
name="ip", num_qubits=op.base_op.num_qubits, num_clbits=op.base_op.num_clbits, params=[]
)
p_instr.definition = dag_to_circuit(p_dag)

q_instr = Instruction(
name="iq", num_qubits=op.base_op.num_qubits, num_clbits=op.base_op.num_clbits, params=[]
)
q_instr.definition = dag_to_circuit(q_dag)

r_instr = Instruction(
name="ir", num_qubits=op.base_op.num_qubits, num_clbits=op.base_op.num_clbits, params=[]
)
r_instr.definition = dag_to_circuit(r_dag)

op_new = Instruction(
"optimized", num_qubits=op.num_qubits, num_clbits=op.num_clbits, params=[]
)
num_control_qubits = op.num_qubits - op.base_op.num_qubits

circ = QuantumCircuit(op.num_qubits, op.num_clbits)
circ.append(p_instr, range(num_control_qubits, op.num_qubits))
circ.append(
AnnotatedOperation(base_op=q_instr, modifiers=op.modifiers), range(op.num_qubits)
)
circ.append(r_instr, range(num_control_qubits, op.num_qubits))
op_new.definition = circ
return op_new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use this nested ip/iq/ir structure to do anything later? If not: perhaps setting the definition of new_op to be the composition ip . Annotated(iq) . ir and removing a level of nesting might be easier and involve fewer magic instruction names. If we do (always) use the structure, then perhaps we could formalise it into a ConjugateReduction(Instruction) special-case class that has explicit fields for the components that we just store directly as DAGs? It might be easier to retrieve them for later inspection, if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great suggestion - done in 8e5858d. I also think that the flatter structure is better, but somehow forgot about compose at the time when initially implementing this PR.

Comment on lines 197 to 203
# optimization to only consider qubits involved in nodes at the front or at the back of the
# circuit
unchecked_qubits = set()

# optimization to keep pairs of nodes for which the inverse check was performed and the nodes
# were found to be not inverse to each other
checked_node_pairs = set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure I understand which qubits go into unchecked_qubits here - maybe it's just the name of the variable throwing me off.

I think we put all qubits that have a zero-predecessor and/or zero-successor gate (accounting for edges we already popped off) in here, because those are the only ones that could have an inverse pair bracketing the circuit. Is that right?

If so: perhaps a comment explaining that part of the algorithm might be a bit clearer than calling it an "optimisation". Referring to it as an "optimisation" implies that we're comparing to some generally known algorithm, but this is the first time that I'm thinking about any algorithm for this pass, so the comment isn't super clear to me. Something like outside_node_qubits might be a little more descriptive for the variable name as well - I did a double-take seeing to_check = unchecked_qubits.copy(), though I can see the logic.

Fwiw: checked_node_pairs is implementing an optimisation called memoisation. We don't necessarily need to call it that by name, I'm just mentioning its name in case you think it's useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am renaming unchecked_qubits to active_qubits and improving the comments. At each "iteration" of looking for an inverse pair of gates, one from the start and one from the back of the circuit, we want to iterate (only) over the currently active qubits, as typically the number of such qubits is significantly smaller than the total number of qubits.

Memoisation - I have definitely heard the term before, but I can never remember the correct terminology.

Comment on lines 189 to 192
# For each qubit, keep the nodes at the start or at the back of the circuit involving this qubit
# (also allowing None and empty values).
q_to_f = {}
q_to_b = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling these front_node_for_qubit and back_node_for_qubit might be a shade easier to read than the abbreviations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks - doing this!

Comment on lines 252 to 277
# in the future we want to include a more precise check whether a pair
# of nodes are inverse
if fn.op == bn.op.inverse():
# update q_to_f, q_to_b maps
for q in fn.qargs:
q_to_f[q] = None
for q in bn.qargs:
q_to_b[q] = None

# see which other nodes become at the front and update information
for node in dag.op_successors(fn):
if node not in processed_nodes:
in_degree[node] -= 1
if in_degree[node] == 0:
for q in node.qargs:
q_to_f[q] = node
unchecked_qubits.add(q)

# see which other nodes become at the back and update information
for node in dag.op_predecessors(bn):
if node not in processed_nodes:
out_degree[node] -= 1
if out_degree[node] == 0:
for q in node.qargs:
q_to_b[q] = node
unchecked_qubits.add(q)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't discard the other qargs in an inverse pair out of to_check, which might cause interesting edge-case interactions if we have a pair A(0, 1) . B(0, 1) . <...> . B(0, 1) . A(0, 1) or something - we might strip off the A(0, 1) pair while examining 0, but then also strip off the B pair while examining 1. I'm not clear that that would actually cause any problems (what if A and B shared some but not all qargs?), but at a minimum it would be good to get some tests of that in.

Fwiw, if we wanted to sidestep it, then having the for q in to_check loop be something like:

while to_check:
    qubit = to_check.pop()
    if fn == bn.inverse():
        for q in fn.qargs:
            to_check.discard(q)

could get around that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this can happen, but should be fine since the nodes get output in the correct order, so in your example we will add A(0, 1) first and B(0, 1) second. In fact, we probably even want this to happen to reduce the total number of main iterations (though probably it does not really matter). I will add more tests.

Comment on lines 230 to 234
if (fn := q_to_f.get(q, None)) is None:
continue
if (bn := q_to_b.get(q, None)) is None:
continue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird to have fn here but front_circuit later. Might be good to call it the full front_node (if not only because then I don't have to double-take about fn being the keyword for functions in Rust lol).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming (and yes I am now also getting confusing thinking this is a function)

Comment on lines 303 to 305

for node in back_block:
back_circuit.apply_operation_back(node.op, node.qargs, node.cargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly it's a bit more efficient to do for node in reversed(back_block) than back_block = back_block[::1]; for node in back_block, but that said - we could also just use DAGCircuit.apply_operation_front.

(Fwiw, calling reversed on a Python list to make an iterator will produce an iterator that's just as fast as the regular forwards one.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, switching to DAGCircuit.apply_operation_front

Comment on lines 236 to 240
# THe pass should simplify the gate
qc_optimized_keys = qc_optimized.count_ops().keys()
self.assertIn("optimized", qc_optimized_keys)
self.assertNotIn("annotated", qc_optimized_keys)
self.assertEqual(Operator(qc), Operator(qc_optimized))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to make some assertions that the pass has collected the gates into the right parts of the decomposition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I have added multiple tests exploring various collection cases.

@alexanderivrii
Copy link
Contributor Author

Jake, many thanks for the review. I have addressed your comments and added more tests.

jakelishman
jakelishman previously approved these changes May 2, 2024
Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really cool optimisation, thanks! I'm feeling ok to merge this and let people start playing with it. Thanks for the fast changes!

@jakelishman jakelishman added this pull request to the merge queue May 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch May 2, 2024
@mtreinish mtreinish added this pull request to the merge queue May 2, 2024
Merged via the queue into Qiskit:main with commit bb1ef16 May 2, 2024
15 checks passed
ElePT pushed a commit to ElePT/qiskit that referenced this pull request May 31, 2024
* initial commit for the conjugate reduction optimization

* do not go into definitions when unnecessary

* release notes

* typo

* minor

* rewriting as list comprehension

* improving release notes

* removing print statement

* changing op_predecessors and op_successor methods to return iterators rather than lists

* and removing explcit iter

* constructing the optimized circuit using compose rather than append

* improving variable names

* adding test

* adding tests exploring which gates get collected

* more renaming

---------

Co-authored-by: Matthew Treinish <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog mod: transpiler Issues and PRs related to Transpiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

optimize circuits with annotated operations using conjugate reduction
5 participants