Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use singleton matrices for unparametrised standard gates #10296

Merged
merged 5 commits into from
Jul 18, 2023

Conversation

jakelishman
Copy link
Member

@jakelishman jakelishman commented Jun 15, 2023

Summary

This makes the array form of standard gates with zero parameters singleton class attributes that reject modification. The class-level __array__ methods are updated to return exactly the same instance, except in very unusual circumstances, which means that Gate.to_matrix() and numpy.asarray() calls on the objects will return the same instance. This avoids a decent amount of construction time, and avoids several Python-space list allocations and array allocations.

The dtypes of the static arrays are all standardised to be complex128. Gate matrices are in general unitary, Gate.to_matrix() already enforces a cast to complex128. For gates that allowed their dtypes to be inferred, there were several cases where native ints and floats would be used, meaning that Gate.to_matrix() would also involve an extra matrix allocation to hold the cast, which just wasted time.

For standard controlled gates, we store both the closed- and open-controlled matrices singly controlled gates. For gates with more than one control, we only store the "all ones" controlled case, as a memory/speed trade-off; open controls are much less common than closed controls.

For the most part this won't have an effect on peak memory usage, since all the allocated matrices in standard Qiskit usage would be freed by the garbage collector almost immediately. This will, however, reduce construction costs and garbage-collector pressure, since fewer allocations+frees will occur, and no calculations will need to be done.

Details and comments

I ran some bits of the benchmark suite, not the full thing. One particularly notable speedup was in CommutationAnalysis, where I'm seeing ~15% improvements in the benchmarks:

       before           after         ratio
     [81964e64]       [339f4dca]
     <singleton-matrices~1>       <singleton-matrices>
-      1.86±0.02s       1.65±0.01s     0.89  passes.PassBenchmarks.time_commutation_analysis(20, 1024)
-        563±10ms         488±20ms     0.87  passes.PassBenchmarks.time_commutation_analysis(5, 1024)
-       1.33±0.1s       1.12±0.01s     0.84  passes.PassBenchmarks.time_commutation_analysis(14, 1024)

I'd expect to see some improvements bits of unitary synthesis and the 1q and 2q conversions into unitaries too for real workloads, though I didn't immediately see them in my quick runs - the benchmark suite might not use too many of the gates that see the speedup, though, or I might have missed those benchmarks.

This makes the array form of standard gates with zero parameters
singleton class attributes that reject modification. The class-level
`__array__` methods are updated to return exactly the same instance,
except in very unusual circumstances, which means that
`Gate.to_matrix()` and `numpy.asarray()` calls on the objects will
return the same instance. This avoids a decent amount of construction
time, and avoids several Python-space list allocations and array
allocations.

The dtypes of the static arrays are all standardised to by complex128.
Gate matrices are in general unitary, `Gate.to_matrix()` already
enforces a cast to `complex128`.  For gates that allowed their dtypes to
be inferred, there were several cases where native ints and floats would
be used, meaning that `Gate.to_matrix()` would also involve an extra
matrix allocation to hold the cast, which just wasted time.

For standard controlled gates, we store both the closed- and
open-controlled matrices singly controlled gates.  For gates with more
than one control, we only store the "all ones" controlled case, as a
memory/speed trade-off; open controls are much less common than closed
controls.

For the most part this won't have an effect on peak memory usage, since
all the allocated matrices in standard Qiskit usage would be freed by
the garbage collector almost immediately.  This will, however, reduce
construction costs and garbage-collector pressure, since fewer
allocations+frees will occur, and no calculations will need to be done.
@jakelishman jakelishman added performance Changelog: None Do not include in changelog labels Jun 15, 2023
@jakelishman jakelishman requested a review from a team as a code owner June 15, 2023 19:13
@qiskit-bot
Copy link
Collaborator

One or more of the the following people are requested to review this:

  • @Cryoris
  • @Qiskit/terra-core
  • @ajavadia

@jakelishman
Copy link
Member Author

Oh wait, lol - I wrote the commit message before I actually finished making the code do what I described with regard to the larger controlled gates. I'll push up that change.

@coveralls
Copy link

coveralls commented Jun 15, 2023

Pull Request Test Coverage Report for Build 5581753665

  • 82 of 84 (97.62%) changed or added relevant lines in 13 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.04%) to 86.062%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/circuit/_utils.py 30 32 93.75%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 2 91.65%
qiskit/extensions/quantum_initializer/squ.py 2 80.0%
Totals Coverage Status
Change from base Build 5580403198: 0.04%
Covered Lines: 72297
Relevant Lines: 84006

💛 - Coveralls

@jakelishman jakelishman added this to the 0.25.0 milestone Jun 16, 2023
@jlapeyre
Copy link
Contributor

jlapeyre commented Jun 16, 2023

This is a good change, not just for optimization, but for organization and standardization as well.

We could add some performance and safety by introducing something like this:

_array = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]], dtype=np.complex128)

# Don't use this, just for comparison
def _as_array_plain(_array, dtype=None):
    return np.asarray(_array, dtype=dtype)

# Use this if the unitary has non-zero imaginary part
# I chose TypeError even though the type is not part of the standard Python type apparatus.
def _as_array_complex(_array, dtype=None):
    if dtype is None or dtype == 'complex128':
        return _array
    if dtype == 'float64':
        raise TypeError("Can't convert unitary with non-zero imaginary part to float")
    return np.asarray(_array, dtype=dtype)

# Use this if the unitary is real, although stored as complex128
def _as_array_float(_array, dtype=None):
    if dtype is None or dtype == 'complex128':
        return _array
    if dtype == 'float64':
        return _array.real
    return np.asarray(_array, dtype=dtype)

Here are some times:

In [1]: %timeit _as_array_plain(_array)
102 ns ± 0.906 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [2]: %timeit _as_array_float(_array)
42.2 ns ± 0.362 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [3]: %timeit _as_array_plain(_array, dtype='float64')
/home/lapeyre/programming_play/playpython/numpy/array_convert.py:27: ComplexWarning: Casting complex values to real discards the imaginary part
  return np.asarray(_array, dtype=dtype)
1.11 µs ± 9.25 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit _as_array_float(_array, dtype='float64')
136 ns ± 0.59 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

For example the X gate would call as_array_float and the Y gate would call _as_array_complex

@jakelishman
Copy link
Member Author

That's interesting with the timings. I don't think there's a need to optimise a potential returned float matrix; a prospective caller can easily do call .real, since they'd need to know that the matrix was real for it to make sense anyway, and the logical return value of "unitary matrix" is complex only.

@jlapeyre jlapeyre self-assigned this Jun 16, 2023
@jlapeyre
Copy link
Contributor

I understand that it makes sense for the caller to get the complex matrix and call real themselves. But we are offering them an interface to use dtype. A user has to investigate or otherwise know that real will be more efficient. One essential part of why I suggested this is that we know which are real and which are not. So we can make it more efficient and safer at the same time. The user might make a mistake.

numpy made a design choice to warn and return the real part instead of error when you do asarray(v, dtype='float64') for complex v. I don't think we need to follow that choice here. These are higher level objects that have explicit, static, representations.

@jakelishman
Copy link
Member Author

Personally I think Gate.__array__ should universally be typed as complex; unitary matrices are complex in general, and a user should be aware that they'll need to handle the conversion. I'm not sure it's worth maintaining additional code in Qiskit to accommodate the very edge case that somebody might want a gate matrix in a non-standard dtype, when it's a basic conversion that a user can easily do themselves.

Fwiw, evaluating dtype == "float64" will miss things. The correct comparison would be dtype == np.dtype("float64"), which is a call that takes longer than the existing np.asarray call.

The call that matters for timing within Qiskit is Gate.to_matrix, which does a general hasattr and then explicitly must pass dtype=np.complex128 to handle user-defined custom gates whose __array__ methods might naturally return non-complex types. That means we'll always need to be concerned about the overhead of dtype checking, which wipes out the performance benefit of _as_array_float.

@jlapeyre
Copy link
Contributor

dtype == np.dtype("float64"), which is a call that takes longer than the existing np.asarray call.

I must be missing something. I see this

In [2]: v = np.array([1.,2.,3.])

In [3]: %timeit v.dtype == "float64"
94.6 ns ± 0.778 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [4]: %timeit v.dtype == np.dtype("float64")
205 ns ± 0.895 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [5]: cmptype = np.dtype("float64")

In [6]: %timeit v.dtype == cmptype
37.3 ns ± 0.205 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [7]: dtype2 = np.dtype("float64")

In [8]: %timeit dtype2 == cmptype
23.5 ns ± 0.175 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [9]: %timeit dtype2 == "float64"
76.6 ns ± 0.556 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [10]: %timeit np.dtype("float64") == "float64"
227 ns ± 0.684 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

and

In [37]: xg = XGate()

In [38]: %timeit np.asarray(xg)
1.73 µs ± 13 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [39]: %timeit xg.to_matrix()
988 ns ± 6.61 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

@jlapeyre
Copy link
Contributor

jlapeyre commented Jul 17, 2023

Here are three sort of separate points

  1. My preference would be that there be a standard way to convert a Gate to a numerical matrix that returns np.complex128 and does not support anything else. For example does not take an argument dtype, or raises an error if one is passed. That gives us what we need and doesn't supply extra API to support. (There is always some overlooked workflow in the wild, that hopefully is brought to our attention.)

  2. It's clear that returning a precomputed array would be a great performance boost. EDIT: I missed _ARRAY.setflags(write=False), which answers the following concern. One potential problem is that a user may modify the data. I would expect that some user has done this even though its a bad idea just because it's easy. We'll of course say that modifying the array is not supported. But it would remain a source of bugs. I expect not too common. But, I've seen this bug in other contexts: ie an array whose data is meant to be immutable, but some caller eventually modifies it. I support just trying it anyway.

  3. There is probably a way to factor the code in this PR. To me it seems like the obvious thing to do. But I suppose one could argue that it's preferable to introducing any structure. I think a small amount of structure is worth the trouble (I'm thinking more of a function or two rather than classes). I imagine that the implementation may be changed. Eg, how many control states do we support and how do we do it. This would be easier if it were done in one place.

wipes out the performance benefit of _as_array_float

I think we may be able to avoid calling that on user-defined gates. But in any case, I don't think supporting this for efficiency is important.

@jlapeyre
Copy link
Contributor

I'm thinking of something like this:

import numpy
from qiskit.circuit.library import XGate, YGate, CXGate, CZGate, CCXGate
from qiskit.circuit.gate import Gate

def setmatrix(matrix_def):
    matrix = numpy.array(matrix_def, dtype=numpy.complex128)
    matrix.setflags(write=False)
    return matrix

def _to_matrix(gate):
    if hasattr(gate, "_ARRAY"):
        return gate._ARRAY
    if hasattr(gate, "_ARRAY_0"):
        return gate._ARRAY_1 if gate.ctrl_state == 1 else gate._ARRAY_0
    if hasattr(gate, "_ARRAY_3") and gate.ctrl_state == 3:
        return gate._ARRAY_3
    if hasattr(gate, "__array__"):
        return gate.__array__(dtype=complex)
    raise CircuitError(f"to_matrix not defined for this {type(gate)}")

Gate.to_matrix = _to_matrix

Used like this

XGate._ARRAY = setmatrix([[1, 0], [0, 1]])

CXGate._ARRAY_1 = setmatrix([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]])
CXGate._ARRAY_0 = setmatrix([[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])

array3 = numpy.eye(8)
array3[[3, 7], :] = array3[[7, 3], :]
CCXGate._ARRAY_3 = setmatrix(array3)

@jakelishman
Copy link
Member Author

jakelishman commented Jul 17, 2023

  1. We have that: it's Gate.to_matrix. It's beyond the scope of this PR to change how that works by calling Gate.__array__, because that would change subclass interactions.
  2. We're agreed here.
  3. I can add a little bit of metaprogrammatic generation of the __array__ method and class variable if you think it'll improve legibility a lot - I'd do it as two decorators _add_array_method(base_array: np.ndarray) or _add_controlled_array_method(base_gate_array: np.ndarray, precomputed_control_states: Iterable[int]).

On the dtype comparisons, I was mistaken because I thought np.float64 was an instance of np.dtype, but it isn't. I was seeing that np.float64 != "float64", but that's not actually a dtype comparison like I thought it was.

Instead of defining the array functions manually for each class, this
adds a small amount of metaprogramming that adds them in with the
correct `ndarray` properties set, including for controlled gates.
@jakelishman
Copy link
Member Author

Decorators added in 02f5623.

@jlapeyre
Copy link
Contributor

jlapeyre commented Jul 18, 2023

I don't know if the decorators improve legibility. But they do what I was mostly concerned with, which is to factor out repeated code. It's certainly more concise now.
It looks like you want to make the decorators available to users, right?

@jakelishman
Copy link
Member Author

Not to users, no, I put them in a private module. We can always export them later if we think they're API surface we'd want to support.

@jlapeyre
Copy link
Contributor

Not to users, no, I put them in a private module

Good. It would be nice to see how this works for a while without running the risk of the labor involved in deprecation.

Copy link
Contributor

@jlapeyre jlapeyre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@jlapeyre jlapeyre added this pull request to the merge queue Jul 18, 2023
Merged via the queue into Qiskit:main with commit d62f780 Jul 18, 2023
13 checks passed
@jakelishman jakelishman deleted the singleton-matrices branch July 19, 2023 21:19
jakelishman added a commit to jakelishman/qiskit-terra that referenced this pull request Sep 21, 2023
This was left in only as a mistake during the writing of Qiskit#10296;
originally the `with_gate_array` decorator didn't exist and all classes
had manual `_ARRAY` specifiers, but `SXGate`'s got left in accidentally
when all the rest were removed in favour of the decorator.
github-merge-queue bot pushed a commit that referenced this pull request Sep 22, 2023
* Remove spurious `SXGate._ARRAY`

This was left in only as a mistake during the writing of #10296;
originally the `with_gate_array` decorator didn't exist and all classes
had manual `_ARRAY` specifiers, but `SXGate`'s got left in accidentally
when all the rest were removed in favour of the decorator.

* Remove unused import
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants