-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Oxidize the numeric code in the Isometry gate class #12197
Conversation
One or more of the the following people are requested to review this:
|
This commit ports the numeric portion of the Isometry gate class to rust. While this will likely improve the performance slightly this move is more to make isolate this code from blas/lapack in numpy. We're hitting some stability issues on arm64 mac in CI and moving this code to rust should hopefully fix this issue. As this is more for functional reasons no real performance tuning was done on this port, there are likely several opportunities to improve the runtime performance of the code.
50d411a
to
6dca86e
Compare
Pull Request Test Coverage Report for Build 8900761790Details
💛 - Coveralls |
The UCGate class is used almost exclusively by the Isometry class to build up the definition of the isometry circuit. There were also some linear algebra inside the function which could also be the source of the stability issues we were seeing on arm64. This commit ports this function as part of the larger isometry migration.
This commit removes the use of bit string manipulations that were faithfully ported from the original python logic (but left a bad taste in my mouth) into more efficient bitwise operations (which were possible in the original python too).
The use of intermediate Vec<u8> as proxy bitstrings was originally ported nearly exactly from the python implementation. But since everything is working now this commit switches to use bitwise operations where it makes sense as this will be more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As best as I could tell, this is a totally faithful port. I highlighted a couple of places where we could probably improve the calling conventions, but I mostly just left suggestions at that (and a couple of places where I couldn't resist commenting on stuff), since I think really the whole implementation could do with a rather more complete reworking, and it's not a worthwhile use of time to try that as part of this porting PR.
use qiskit_accelerate::{ | ||
convert_2q_block_matrix::convert_2q_block_matrix, dense_layout::dense_layout, | ||
error_map::error_map, euler_one_qubit_decomposer::euler_one_qubit_decomposer, nlayout::nlayout, | ||
optimize_1q_gates::optimize_1q_gates, pauli_exp_val::pauli_expval, results::results, | ||
sabre::sabre, sampled_exp_val::sampled_exp_val, sparse_pauli_op::sparse_pauli_op, | ||
stochastic_swap::stochastic_swap, two_qubit_decompose::two_qubit_decompose, utils::utils, | ||
error_map::error_map, euler_one_qubit_decomposer::euler_one_qubit_decomposer, | ||
isometry::isometry, nlayout::nlayout, optimize_1q_gates::optimize_1q_gates, | ||
pauli_exp_val::pauli_expval, results::results, sabre::sabre, sampled_exp_val::sampled_exp_val, | ||
sparse_pauli_op::sparse_pauli_op, stochastic_swap::stochastic_swap, | ||
two_qubit_decompose::two_qubit_decompose, uc_gate::uc_gate, utils::utils, | ||
vf2_layout::vf2_layout, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of stuff really makes me appreciate black
's approach to lists as "do what'll minimise git diffs" haha.
#[pyfunction] | ||
pub fn ucg_is_identity_up_to_global_phase( | ||
single_qubit_gates: Vec<PyReadonlyArray2<Complex64>>, | ||
epsilon: f64, | ||
) -> bool { | ||
let global_phase: Complex64 = if single_qubit_gates[0].as_array()[[0, 0]].abs() >= epsilon { | ||
single_qubit_gates[0].as_array()[[0, 0]].finv() | ||
} else { | ||
return false; | ||
}; | ||
for raw_gate in single_qubit_gates { | ||
let gate = raw_gate.as_array(); | ||
if !abs_diff_eq!( | ||
gate.mapv(|x| x * global_phase), | ||
aview2(&ONE_QUBIT_IDENTITY), | ||
epsilon = 1e-8 // Default tolerance from numpy for allclose() | ||
) { | ||
return false; | ||
} | ||
} | ||
true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pre-existing, but global_phase
is not the global phase until its abs
happens to be 1. This test is actually testing whether the gate is close to a scaled identity. Same with diag_is_identity_up_to_global_phase
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe global_phase
should be renamed global_scale
if it can't be constrained always to have magnitude 1
. And likewise, this function should be renamed. But maybe these can wait for a rewrite.
crates/accelerate/src/isometry.rs
Outdated
let free_qubits = num_qubits as usize - control_labels.len() - 1; | ||
if free_qubits == 0 { | ||
let [e1, e2] = construct_basis_states(&[], &control_labels, target_label); | ||
for i in 0..num_col { | ||
let temp: Vec<_> = gate | ||
.dot(&aview2(&[[m[[e1, i]]], [m[[e2, i]]]])) | ||
.into_iter() | ||
.take(2) | ||
.collect(); | ||
m[[e1, i]] = temp[0]; | ||
m[[e2, i]] = temp[1]; | ||
} | ||
return m.into_pyarray_bound(py).into(); | ||
} | ||
for state_free in std::iter::repeat([0_u8, 1_u8]) | ||
.take(free_qubits) | ||
.multi_cartesian_product() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit inconvenient that the product of "empty iterator" doesn't produce the unit tuple in this form - I think Python's itertools
feels more mathematically natural here.
If we really wanted to avoid the large code duplication we could pull out the block of the if into a
fn apply<T: IntoIterator<Item = [usize]>>(...) {}
(or whatever the right iterator item is) and call it twice, but I don't really think it's worth bothering.
crates/accelerate/src/uc_gate.rs
Outdated
let rz_11 = (-Complex64::new(0., 0.5 * PI2)).exp(); | ||
let rz_00 = Complex64::new(0., 0.5 * PI2).exp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These numbers are actually just
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol, oh I didn't even see that I just was in mechanical porting mode. Do you think it's worth making that explicit here, or just leave it for LLVM to optimize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine leaving all the little things like this - larger-scale real performance improvements in this would come from a more complete refactor of the implementation as a whole, and as you say, the compiler can probably constant-fold this anyway.
Co-authored-by: Jake Lishman <[email protected]>
} | ||
|
||
#[inline(always)] | ||
fn l2_norm(vec: &[Complex64]) -> f64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer that this were somehow called the 2-norm
or p-norm
for p=2
("somehow" with a valid identifier). l2
is a space, usually a sequence space. But upon googling for uses of p norm, l2 norm, etc. it seems that, as far as the internet is concerned, especially machine-learning internet, lp, Lp, can mean whatever you want them to mean. [EDIT. looks like this is translated from np.linalg.norm
, which, along with Julia's LinearAlgebra
, calls this the 2-norm)]
We might want a function for p norms with p as a parameter, with 2 as the default, and make sure constants are propagated, or whatever it takes to get the correct norm at compile time. But, unless we have an existing use for p != 2
that can be postponed indefinitely.
In any case, at some point, we will wish that functions like this had been collected in a repo-wide module. (probably not in the ten thousandth "utils.xxx"). And in that case it's worth having somewhat more general trait bounds. Not repeating this function could be important for correctness and performance. For example, the Julia 2-norm function appears to sometimes scale the elements to avoid overflow. However, the function mynorm(v) = sqrt(sum(x -> x^2, v));
is more than 3 times faster than top-level entry point norm
. We'd want an api and implementation that reflects our common uses, which might be almost all 2- and 4-element vectors. Indeed, an optimization, which we don't need to make at the moment, would take advantage the length of the vector, which is known here at compile time to be two.
The main point for now is to consider at least collecting these somewhere where they are visible so that the questions of performance and correctness can be more easily addressed at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to get into a bikeshedding conversation about naming, but l2_norm
or lp_norm
is a pretty standard terminology for this. If you look at the comment in the Julia code below what you linked they use it there too:
# Compute L_p norm ‖x‖ₚ = sum(abs(x).^p)^(1/p)
That being said the response to the main comment here is that we can look at creating a dedicated mathematical functions module, or even a separate crate if it's generic enough for these kind of things in a follow up PR. Ideally we'd contribute this to something like ndarray or faer imo though assuming a generic implementation. I only did it like this because it was so small and simple and the usage was very minimal. So far the only function I think that falls into this category is the 2x2 determinant function which is used in this PR, the two qubit decomposer, and the one qubit decomposer (where it was originally added). But again it's just a single line so I didn't feel like it was worth creating a dedicate module for it.
Sorry. My mistake. I originally did not include the |
This looks good to me. The only thing I would change is these let rz_11 = (-Complex64::new(0., 0.5 * PI2)).exp();
let rz_00 = Complex64::new(0., 0.5 * PI2).exp(); I know we are leaving a lot of things that could be improved, but this is such an easy and reasonable change, I don't see why not. |
I updated this in: fcf587f it turns out we all missed that this was |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks as right to me as I think I'd be able to spot, given the original source. Thanks for doing all this!
(will leave unmerged til tomorrow in case John wants to look too)
On the one hand, oh duh. On the other hand, the fact that we all missed it suggests that |
* Oxidize the numeric code in the Isometry gate class This commit ports the numeric portion of the Isometry gate class to rust. While this will likely improve the performance slightly this move is more to make isolate this code from blas/lapack in numpy. We're hitting some stability issues on arm64 mac in CI and moving this code to rust should hopefully fix this issue. As this is more for functional reasons no real performance tuning was done on this port, there are likely several opportunities to improve the runtime performance of the code. * Remove unused import * Use rust impl for small utility functions too * Oxidize the linalg in UCGate too The UCGate class is used almost exclusively by the Isometry class to build up the definition of the isometry circuit. There were also some linear algebra inside the function which could also be the source of the stability issues we were seeing on arm64. This commit ports this function as part of the larger isometry migration. * Migrate another numeric helper method of UCGate * Remove now unused code paths * Remove bitstring usage with bitwise ops This commit removes the use of bit string manipulations that were faithfully ported from the original python logic (but left a bad taste in my mouth) into more efficient bitwise operations (which were possible in the original python too). * Mostly replace Vec<u8> usage with bitwise operations The use of intermediate Vec<u8> as proxy bitstrings was originally ported nearly exactly from the python implementation. But since everything is working now this commit switches to use bitwise operations where it makes sense as this will be more efficient. * Apply suggestions from code review Co-authored-by: Jake Lishman <[email protected]> * Remove python side call sites * Fix integer typing in uc_gate.rs * Simplify basis state bitshift loop logic * Build set of control labels outside construct_basis_states * Use 2 element array for reverse_qubit_state * Micro optimize reverse_qubit_state * Use 1d numpy arrays for diagonal inputs * Fix lint * Update crates/accelerate/src/isometry.rs Co-authored-by: John Lapeyre <[email protected]> * Add back sqrt() accidentally removed by inline suggestion * Use a constant for rz pi/2 elements --------- Co-authored-by: Jake Lishman <[email protected]> Co-authored-by: John Lapeyre <[email protected]>
Summary
This commit ports the numeric portion of the Isometry gate class to rust. While this will likely improve the performance slightly this move is more to make isolate this code from blas/lapack in numpy. We're hitting some stability issues on arm64 mac in CI and moving this code to rust should hopefully fix this issue. As this is more for functional reasons no real performance tuning was done on this port, there are likely several opportunities to improve the runtime performance of the code.
Details and comments