Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oxidize the numeric code in the Isometry gate class #12197

Merged
merged 25 commits into from
Apr 30, 2024

Conversation

mtreinish
Copy link
Member

Summary

This commit ports the numeric portion of the Isometry gate class to rust. While this will likely improve the performance slightly this move is more to make isolate this code from blas/lapack in numpy. We're hitting some stability issues on arm64 mac in CI and moving this code to rust should hopefully fix this issue. As this is more for functional reasons no real performance tuning was done on this port, there are likely several opportunities to improve the runtime performance of the code.

Details and comments

@mtreinish mtreinish added performance Changelog: None Do not include in changelog Rust This PR or issue is related to Rust code in the repository labels Apr 17, 2024
@mtreinish mtreinish added this to the 1.1.0 milestone Apr 17, 2024
@mtreinish mtreinish requested a review from a team as a code owner April 17, 2024 19:09
@qiskit-bot
Copy link
Collaborator

One or more of the the following people are requested to review this:

  • @Eric-Arellano
  • @Cryoris
  • @Qiskit/terra-core
  • @ajavadia
  • @kevinhartman
  • @levbishop
  • @mtreinish

This commit ports the numeric portion of the Isometry gate class to
rust. While this will likely improve the performance slightly this move
is more to make isolate this code from blas/lapack in numpy. We're
hitting some stability issues on arm64 mac in CI and moving this code to
rust should hopefully fix this issue. As this is more for functional
reasons no real performance tuning was done on this port, there are
likely several opportunities to improve the runtime performance of the
code.
@coveralls
Copy link

coveralls commented Apr 17, 2024

Pull Request Test Coverage Report for Build 8900761790

Details

  • 427 of 428 (99.77%) changed or added relevant lines in 6 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.07%) to 89.525%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/accelerate/src/isometry.rs 288 289 99.65%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/expr.rs 1 94.03%
crates/qasm2/src/lex.rs 3 92.37%
Totals Coverage Status
Change from base Build 8900042661: 0.07%
Covered Lines: 61378
Relevant Lines: 68560

💛 - Coveralls

The UCGate class is used almost exclusively by the Isometry class to
build up the definition of the isometry circuit. There were also some
linear algebra inside the function which could also be the source of the
stability issues we were seeing on arm64. This commit ports this
function as part of the larger isometry migration.
@jakelishman jakelishman self-assigned this Apr 18, 2024
This commit removes the use of bit string manipulations that were
faithfully ported from the original python logic (but left a bad taste
in my mouth) into more efficient bitwise operations (which were
possible in the original python too).
The use of intermediate Vec<u8> as proxy bitstrings was originally
ported nearly exactly from the python implementation. But since
everything is working now this commit switches to use bitwise operations
where it makes sense as this will be more efficient.
Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As best as I could tell, this is a totally faithful port. I highlighted a couple of places where we could probably improve the calling conventions, but I mostly just left suggestions at that (and a couple of places where I couldn't resist commenting on stuff), since I think really the whole implementation could do with a rather more complete reworking, and it's not a worthwhile use of time to try that as part of this porting PR.

Comment on lines 16 to 24
use qiskit_accelerate::{
convert_2q_block_matrix::convert_2q_block_matrix, dense_layout::dense_layout,
error_map::error_map, euler_one_qubit_decomposer::euler_one_qubit_decomposer, nlayout::nlayout,
optimize_1q_gates::optimize_1q_gates, pauli_exp_val::pauli_expval, results::results,
sabre::sabre, sampled_exp_val::sampled_exp_val, sparse_pauli_op::sparse_pauli_op,
stochastic_swap::stochastic_swap, two_qubit_decompose::two_qubit_decompose, utils::utils,
error_map::error_map, euler_one_qubit_decomposer::euler_one_qubit_decomposer,
isometry::isometry, nlayout::nlayout, optimize_1q_gates::optimize_1q_gates,
pauli_exp_val::pauli_expval, results::results, sabre::sabre, sampled_exp_val::sampled_exp_val,
sparse_pauli_op::sparse_pauli_op, stochastic_swap::stochastic_swap,
two_qubit_decompose::two_qubit_decompose, uc_gate::uc_gate, utils::utils,
vf2_layout::vf2_layout,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of stuff really makes me appreciate black's approach to lists as "do what'll minimise git diffs" haha.

Comment on lines +269 to +290
#[pyfunction]
pub fn ucg_is_identity_up_to_global_phase(
single_qubit_gates: Vec<PyReadonlyArray2<Complex64>>,
epsilon: f64,
) -> bool {
let global_phase: Complex64 = if single_qubit_gates[0].as_array()[[0, 0]].abs() >= epsilon {
single_qubit_gates[0].as_array()[[0, 0]].finv()
} else {
return false;
};
for raw_gate in single_qubit_gates {
let gate = raw_gate.as_array();
if !abs_diff_eq!(
gate.mapv(|x| x * global_phase),
aview2(&ONE_QUBIT_IDENTITY),
epsilon = 1e-8 // Default tolerance from numpy for allclose()
) {
return false;
}
}
true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pre-existing, but global_phase is not the global phase until its abs happens to be 1. This test is actually testing whether the gate is close to a scaled identity. Same with diag_is_identity_up_to_global_phase below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe global_phase should be renamed global_scale if it can't be constrained always to have magnitude 1. And likewise, this function should be renamed. But maybe these can wait for a rewrite.

crates/accelerate/src/isometry.rs Outdated Show resolved Hide resolved
crates/accelerate/src/isometry.rs Show resolved Hide resolved
crates/accelerate/src/isometry.rs Outdated Show resolved Hide resolved
crates/accelerate/src/isometry.rs Outdated Show resolved Hide resolved
Comment on lines 237 to 253
let free_qubits = num_qubits as usize - control_labels.len() - 1;
if free_qubits == 0 {
let [e1, e2] = construct_basis_states(&[], &control_labels, target_label);
for i in 0..num_col {
let temp: Vec<_> = gate
.dot(&aview2(&[[m[[e1, i]]], [m[[e2, i]]]]))
.into_iter()
.take(2)
.collect();
m[[e1, i]] = temp[0];
m[[e2, i]] = temp[1];
}
return m.into_pyarray_bound(py).into();
}
for state_free in std::iter::repeat([0_u8, 1_u8])
.take(free_qubits)
.multi_cartesian_product()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit inconvenient that the product of "empty iterator" doesn't produce the unit tuple in this form - I think Python's itertools feels more mathematically natural here.

If we really wanted to avoid the large code duplication we could pull out the block of the if into a

fn apply<T: IntoIterator<Item = [usize]>>(...) {}

(or whatever the right iterator item is) and call it twice, but I don't really think it's worth bothering.

qiskit/circuit/library/generalized_gates/isometry.py Outdated Show resolved Hide resolved
crates/accelerate/src/uc_gate.rs Outdated Show resolved Hide resolved
Comment on lines 117 to 118
let rz_11 = (-Complex64::new(0., 0.5 * PI2)).exp();
let rz_00 = Complex64::new(0., 0.5 * PI2).exp();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These numbers are actually just $-i$ and $i$, which makes it all the funnier to me that the Python-space version defined a full "get the RZ matrix" method lol

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, oh I didn't even see that I just was in mechanical porting mode. Do you think it's worth making that explicit here, or just leave it for LLVM to optimize?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine leaving all the little things like this - larger-scale real performance improvements in this would come from a more complete refactor of the implementation as a whole, and as you say, the compiler can probably constant-fold this anyway.

}

#[inline(always)]
fn l2_norm(vec: &[Complex64]) -> f64 {
Copy link
Contributor

@jlapeyre jlapeyre Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer that this were somehow called the 2-norm or p-norm for p=2 ("somehow" with a valid identifier). l2 is a space, usually a sequence space. But upon googling for uses of p norm, l2 norm, etc. it seems that, as far as the internet is concerned, especially machine-learning internet, lp, Lp, can mean whatever you want them to mean. [EDIT. looks like this is translated from np.linalg.norm, which, along with Julia's LinearAlgebra, calls this the 2-norm)]

We might want a function for p norms with p as a parameter, with 2 as the default, and make sure constants are propagated, or whatever it takes to get the correct norm at compile time. But, unless we have an existing use for p != 2 that can be postponed indefinitely.

In any case, at some point, we will wish that functions like this had been collected in a repo-wide module. (probably not in the ten thousandth "utils.xxx"). And in that case it's worth having somewhat more general trait bounds. Not repeating this function could be important for correctness and performance. For example, the Julia 2-norm function appears to sometimes scale the elements to avoid overflow. However, the function mynorm(v) = sqrt(sum(x -> x^2, v)); is more than 3 times faster than top-level entry point norm. We'd want an api and implementation that reflects our common uses, which might be almost all 2- and 4-element vectors. Indeed, an optimization, which we don't need to make at the moment, would take advantage the length of the vector, which is known here at compile time to be two.

The main point for now is to consider at least collecting these somewhere where they are visible so that the questions of performance and correctness can be more easily addressed at some point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to get into a bikeshedding conversation about naming, but l2_norm or lp_norm is a pretty standard terminology for this. If you look at the comment in the Julia code below what you linked they use it there too:

https://github.com/JuliaLang/julia/blob/68da780b93518204d874410307791702d5200e29/stdlib/LinearAlgebra/src/generic.jl#L496

# Compute L_p norm ‖x‖ₚ = sum(abs(x).^p)^(1/p)

That being said the response to the main comment here is that we can look at creating a dedicated mathematical functions module, or even a separate crate if it's generic enough for these kind of things in a follow up PR. Ideally we'd contribute this to something like ndarray or faer imo though assuming a generic implementation. I only did it like this because it was so small and simple and the usage was very minimal. So far the only function I think that falls into this category is the 2x2 determinant function which is used in this PR, the two qubit decomposer, and the one qubit decomposer (where it was originally added). But again it's just a single line so I didn't feel like it was worth creating a dedicate module for it.

@jlapeyre
Copy link
Contributor

Add back sqrt() accidentally removed by inline suggestion

Sorry. My mistake. I originally did not include the sqrt() in the edit. But then I went back and "corrected" it.

@jlapeyre
Copy link
Contributor

jlapeyre commented Apr 29, 2024

This looks good to me. The only thing I would change is these

                let rz_11 = (-Complex64::new(0., 0.5 * PI2)).exp();
                let rz_00 = Complex64::new(0., 0.5 * PI2).exp();

I know we are leaving a lot of things that could be improved, but this is such an easy and reasonable change, I don't see why not.

@mtreinish
Copy link
Member Author

This looks good to me. The only thing I would change is these

                let rz_11 = (-Complex64::new(0., 0.5 * PI2)).exp();
                let rz_00 = Complex64::new(0., 0.5 * PI2).exp();

I know we are leaving a lot of things that could be improved, but this is such an easy and reasonable change, I don't see why not.

I updated this in: fcf587f it turns out we all missed that this was 0.5 * pi / 2 not 0.5 * pi so it's not just -i and i. But I moved it to a constant for what the actual value is.

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks as right to me as I think I'd be able to spot, given the original source. Thanks for doing all this!

(will leave unmerged til tomorrow in case John wants to look too)

@jlapeyre
Copy link
Contributor

it turns out we all missed that this was 0.5 * pi / 2 not 0.5 * pi

On the one hand, oh duh. On the other hand, the fact that we all missed it suggests that PI2 might not be a great name.
For example FRAC_1_SQRT_2 might look verbose, but it's fairly easy to understand.

@jlapeyre jlapeyre added this pull request to the merge queue Apr 30, 2024
Merged via the queue into Qiskit:main with commit febc16c Apr 30, 2024
13 checks passed
@mtreinish mtreinish deleted the isometry-rust branch April 30, 2024 22:19
ElePT pushed a commit to ElePT/qiskit that referenced this pull request May 31, 2024
* Oxidize the numeric code in the Isometry gate class

This commit ports the numeric portion of the Isometry gate class to
rust. While this will likely improve the performance slightly this move
is more to make isolate this code from blas/lapack in numpy. We're
hitting some stability issues on arm64 mac in CI and moving this code to
rust should hopefully fix this issue. As this is more for functional
reasons no real performance tuning was done on this port, there are
likely several opportunities to improve the runtime performance of the
code.

* Remove unused import

* Use rust impl for small utility functions too

* Oxidize the linalg in UCGate too

The UCGate class is used almost exclusively by the Isometry class to
build up the definition of the isometry circuit. There were also some
linear algebra inside the function which could also be the source of the
stability issues we were seeing on arm64. This commit ports this
function as part of the larger isometry migration.

* Migrate another numeric helper method of UCGate

* Remove now unused code paths

* Remove bitstring usage with bitwise ops

This commit removes the use of bit string manipulations that were
faithfully ported from the original python logic (but left a bad taste
in my mouth) into more efficient bitwise operations (which were
possible in the original python too).

* Mostly replace Vec<u8> usage with bitwise operations

The use of intermediate Vec<u8> as proxy bitstrings was originally
ported nearly exactly from the python implementation. But since
everything is working now this commit switches to use bitwise operations
where it makes sense as this will be more efficient.

* Apply suggestions from code review

Co-authored-by: Jake Lishman <[email protected]>

* Remove python side call sites

* Fix integer typing in uc_gate.rs

* Simplify basis state bitshift loop logic

* Build set of control labels outside construct_basis_states

* Use 2 element array for reverse_qubit_state

* Micro optimize reverse_qubit_state

* Use 1d numpy arrays for diagonal inputs

* Fix lint

* Update crates/accelerate/src/isometry.rs

Co-authored-by: John Lapeyre <[email protected]>

* Add back sqrt() accidentally removed by inline suggestion

* Use a constant for rz pi/2 elements

---------

Co-authored-by: Jake Lishman <[email protected]>
Co-authored-by: John Lapeyre <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog performance Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants