-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improved fast-path detection in HLS #13070
Conversation
One or more of the following people are relevant to this code:
|
Pull Request Test Coverage Report for Build 10813390132Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Sasha.
I think we're ok to add an extra examination of the DAG, especially if we expect that a decent number of calls to HighLevelSynthesis
will use the fast path, and so only do one iteration. If we don't take the fast path, we already examine the DAG loads of times throughout the transpiler, and I'd expect the extra iteration to be dwarfed by the costs of the synthesis / substitution.
all_skipped = True | ||
for node in dag.topological_op_nodes(): | ||
qubits = tuple(dag.find_bit(q).index for q in node.qargs) | ||
if not ( | ||
dag.has_calibration_for(node) | ||
or len(node.qargs) < self._min_qubits | ||
or node.is_directive() | ||
or self._definitely_skip_node(node, qubits) | ||
): | ||
all_skipped = False | ||
break | ||
if all_skipped: | ||
return dag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to do this, but in case you're interested, Python has a very uncommon control-flow construction that handles this precise case without an auxiliary variable:
for node in dag.topological_op_nodes():
qubits = ...
if not self._should_skip(dag, node, qubits):
break
else:
return dag
(The else
is attached to the for
loop.) Totally up to you if you want to use it - there's plenty of reason not to, since most people don't know what the for/else
pattern in Python means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat, I like this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 0962be4.
if not ( | ||
dag.has_calibration_for(node) | ||
or len(node.qargs) < self._min_qubits | ||
or node.is_directive() | ||
or self._definitely_skip_node(node, qubits) | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This set of checks is the exact same set that we do during the loop below - should they be folded into _definitely_skip_node
so it's shorter to read and everything stays in sync?
Looking quickly, it seems like the internal logic of HLS has changed slightly since I wrote that fast path within the loop, so the logic around having a separate _definitely_skip_node
might be a bit different to when I first wrote it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done in 0962be4.
@@ -280,6 +280,21 @@ def run(self, dag: DAGCircuit) -> DAGCircuit: | |||
return self._run(dag, tracker) | |||
|
|||
def _run(self, dag: DAGCircuit, tracker: QubitTracker) -> DAGCircuit: | |||
# Check if HighLevelSynthesis can be skipped. | |||
all_skipped = True | |||
for node in dag.topological_op_nodes(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We technically don't need to waste time getting a topological order for this function, we can just look at every op node in arbitrary order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done in 0962be4.
# At the moment we don't consider fast-path handling for controlled gates over 3 or | ||
# more controlled qubits. However, we should not abort early for CX/CZ/etc. | ||
and not (node.is_controlled_gate() and node.num_qubits >= 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe this comment has too many negatives in it? I think we do abort early for CX/CZ, right?
Or maybe the comment is referring to "abort _definitely_skip
" and I'm reading it as "abort synthesis for this gate", in which case we might want to tweak the words, since it wasn't clear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reworded the comment. I have indeed meant that the function _definitely_skip
should not immediately return False
when seeing a CX-gate or a CZ-gate, even though these are controlled gates.
all_skipped = True | ||
for node in dag.topological_op_nodes(): | ||
qubits = tuple(dag.find_bit(q).index for q in node.qargs) | ||
if not ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to cache this result so we can just look it up in the second iteration over the DAG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean hashing whether a particular node can be skipped? I briefly tried this, but in the main use-case, where only a single pass is required, this extra hashing makes the runtime a bit slower.
As you said you're already looking at porting parts of the HLS into Rust, we could maybe rewrite this as:
The advantage here is that we can stay in Rust space for 1+2 and only have to move back into Python for step 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the improvement!
As discussed offline, let's include the suggested restructure from above in the Rust improvement to follow 🙂
Summary
This tiny PR improves the performance of the
HighLevelSynthesis
transpiler pass in the case that it does not need to do anything, for instance when every gate in the circuit is either already supported or can be handled by basis translator. And this is actually quite a common use-case in practice. In fact, this "fast-handling-when-there-is-nothing-to-do" was already supported at some point in the past, but got broken due to some recent changes.The first change is that when deciding whether synthesizing a gate can be skipped we should not abort when we see a CX or a CZ gate, even though these are controlled gates.
The second change is that we should not spend time calculating sets of clean/dirty ancillas unless there is something to do.
On the utility-scale circuits, the runtime is improved from
0.2 s
to0.02 s
.Unfortunately, this causes HLS to examine the circuit 3 times: the first time it checks if it may need to do anything at all, the second time it synthesizing objects in a DAG while keeping track of clean/dirty ancillas, and the third time it builds a new DAG performing the substitutions.
Any additional suggestions for improvements are welcome.
P.S. Not sure if this deserves a release note.