Use OnceCell to cache Python operations #12942

Merged: 2 commits into Qiskit:main on Aug 14, 2024

Conversation

@jakelishman (Member)

Summary

We were previously using RefCell to handle the interior mutability for the Python-operation caches. This works fine, but has two problems:

  • RefCell<Py<PyAny>> is two pointers wide
  • we had to do heavier dynamic runtime checks for the borrow status on every access, including simple gets.

This commit moves the caching to instead use OnceCell. The internal tracking for a OnceCell<T> is just whether an internal Option<T> is in the Some variant, and since Option<Py<PyAny>> will use the null-pointer optimisation for space, this means that OnceCell<Py<PyAny>> is only a single pointer wide. Since it's not permissible to take out a mutable reference to the interior, there is also no dynamic runtime checking---just the null-pointer check that the cell is initialised---so access is faster as well.
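For a concrete picture of the size argument, here is a standalone check (not code from this PR) that uses NonNull<u8> as a stand-in for Py<PyAny>, since both are non-null pointers and so benefit from the null-pointer optimisation inside Option:

```rust
use std::cell::{OnceCell, RefCell};
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // `NonNull<u8>` stands in for `Py<PyAny>`: both are non-null pointers, so
    // `Option` can use the null pointer as its `None` representation.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<NonNull<u8>>());

    // `RefCell` keeps a separate borrow-flag word next to the value, so the
    // cached pointer costs two words.
    assert_eq!(size_of::<RefCell<NonNull<u8>>>(), 2 * size_of::<usize>());

    // `OnceCell` is just an `Option<T>` internally, so with the null-pointer
    // optimisation the cache stays a single word.
    assert_eq!(size_of::<OnceCell<NonNull<u8>>>(), size_of::<usize>());
}
```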

We can still clear the cache if we hold a mutable reference to the OnceCell, and since every method that can invalidate the cache also necessarily changes Rust-space data, those methods already require a mutable reference to the cell, making this safe.
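As a minimal sketch (hypothetical names, not the actual CircuitData methods), clearing a std OnceCell already demands exclusive access, so the compiler enforces the invariant described above:

```rust
use std::cell::OnceCell;

// Hypothetical helper: any method that mutates the Rust-space data and thereby
// invalidates the cache already holds `&mut`, which is exactly what
// `OnceCell::take` requires.
fn invalidate_cache(cache: &mut OnceCell<u64>) {
    // `take` needs `&mut self`, so only code with exclusive access to the
    // surrounding data can wipe the cache.
    let _stale = cache.take();
}

fn main() {
    let mut cache = OnceCell::new();
    cache.set(42).unwrap();
    invalidate_cache(&mut cache);
    assert!(cache.get().is_none());
}
```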

There is a risk here, though, in OnceCell::get_or_init. This function is not re-entrant; one cannot attempt the initialisation while another thread is already initialising it. Typically this is not an issue, since the type isn't Sync; however, PyO3 allows types that are only Send and not Sync to be put in pyclasses, which Python can then share between threads. This is usually safe because the GIL mediates access to Python-space objects, so there are no data races. However, if the initialiser passed to get_or_init yields control to the Python interpreter at any point (such as by calling a method on a Python object), the interpreter gets a chance to suspend the thread and let another Python thread run. The other thread could then attempt to enter the same cache initialiser, which is a violation of its contract.

PyO3's GILOnceCell can protect against this failure mode, but it is inconvenient for us to use because it requires the GIL to be held for any access to the value, which is problematic during Clone operations. Instead, we make ourselves data-race safe by manually checking whether the cell is initialised, calculating the cache value ourselves, and only then calling set or get_or_init with the already-created object. While the initialiser can still yield to the Python interpreter, the actual setting of the cell is now atomic in this sense, and thus safe. The downside is that more than one thread might do the same initialisation work, but given how comparatively rare Python threads are, it's better to prioritise memory use and single-Python-threaded access times over one-off cache population in multiple Python threads.
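Here is a sketch of that pattern with hypothetical field and method names (the real CircuitInstruction internals may differ): the Python object is built before the cell is touched, so the closure handed to get_or_init never calls back into the interpreter:

```rust
use pyo3::prelude::*;
use std::cell::OnceCell;

struct CachedOp {
    py_op: OnceCell<Py<PyAny>>,
}

impl CachedOp {
    /// Placeholder for the real construction, which may call into Python and
    /// therefore yield control to another Python thread.
    fn build_py_op(&self, py: Python<'_>) -> PyResult<Py<PyAny>> {
        Ok(py.None())
    }

    fn py_op(&self, py: Python<'_>) -> PyResult<&Py<PyAny>> {
        if let Some(op) = self.py_op.get() {
            return Ok(op);
        }
        // Build the object *before* entering the cell; this is the only part
        // that can yield to the interpreter.
        let op = self.build_py_op(py)?;
        // The closure passed here does not call into Python, so another Python
        // thread cannot re-enter the initialisation half-way through.  If a
        // racing thread populated the cell first, its value wins and ours is
        // dropped; doing the work twice is the accepted trade-off.
        Ok(self.py_op.get_or_init(|| op))
    }
}
```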

Details and comments

GILOnceCell would likely be better here, but it doesn't derive Clone, nor would that make sense, since it uses the GIL to mediate even getter access. When we're able to move to PyO3 0.22 and disable its py-clone feature, we might be able to reorganise this a little to avoid handling the GIL-atomic initialisation manually.

I measured this as saving 165MB (around 10%) in resident-set size from this benchmark:

from qiskit.circuit import QuantumCircuit

qc = QuantumCircuit(1_000)
for _ in range(3_000):
    for q in qc.qubits:
        qc.rz(0.0, q)
    for q in qc.qubits:
        qc.rx(0.0, q)
    for q in qc.qubits:
        qc.rz(0.0, q)
    for a, b in zip(qc.qubits[:-1], qc.qubits[1:]):
        qc.cx(a, b)

over its parent commit. That's the same microbenchmark we used in #12730. This should also have a (small) positive impact on copy and circuit-build performance, since there's less memory to move around, and access times on the Python-operation cache should be lower too, though that shouldn't be on the critical path.

By far the biggest memory problem remaining in the CircuitData model at the moment is the parameter storage: in this particular benchmark, we spend around 559MB storing the parameters of the 9,000,000 parametric gates, of which roughly 15% is malloc overhead (on macOS at least, though probably similar on most Unix-likes and not dissimilar on Windows), and within the actual allocations about 70% is just padding bytes.

@jakelishman jakelishman added performance Changelog: None Do not include in changelog Rust This PR or issue is related to Rust code in the repository mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library labels Aug 12, 2024
@jakelishman jakelishman added this to the 1.3 beta milestone Aug 12, 2024
@jakelishman jakelishman requested a review from a team as a code owner August 12, 2024 08:44
@qiskit-bot (Collaborator)

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @kevinhartman
  • @mtreinish

@coveralls commented Aug 12, 2024

Pull Request Test Coverage Report for Build 10392333478

Details

  • 38 of 40 (95.0%) changed or added relevant lines in 4 files are covered.
  • 7 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.01%) to 89.707%

Changes missing coverage (covered / changed or added lines):
  • crates/circuit/src/circuit_data.rs: 8 / 9 (88.89%)
  • crates/circuit/src/circuit_instruction.rs: 10 / 11 (90.91%)

Files with coverage reduction (new missed lines):
  • crates/qasm2/src/expr.rs: 1 (94.02%)
  • crates/accelerate/src/two_qubit_decompose.rs: 1 (90.69%)
  • crates/qasm2/src/lex.rs: 5 (92.73%)

Totals (Coverage Status):
  • Change from base Build 10391255051: 0.01%
  • Covered Lines: 67532
  • Relevant Lines: 75281

💛 - Coveralls

@raynelfss (Contributor) previously approved these changes Aug 13, 2024 and left a comment:

This is sensible and, putting aside all the memory saving benefits, it makes the code more readable as we don't need to constantly call borrow or borrow_mut. Hopefully we can bump to PyO3 0.22 and reorganize things soon.

@raynelfss raynelfss added this pull request to the merge queue Aug 13, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 13, 2024
@raynelfss (Contributor)

I had previously approved this PR, but it seems to be failing CI now. Any clue as to why, @jakelishman?

@jakelishman (Member, Author)

Ray: it's ok, it's just that #12943 merged in the meantime and had a logical (but not syntactic) conflict with this branch. I've rebased this PR to account for the additional cache interaction added in that PR.

@raynelfss (Contributor) left a comment:

LGTM, thank you for the quick fixes!

@jakelishman jakelishman added this pull request to the merge queue Aug 14, 2024
Merged via the queue into Qiskit:main with commit 48cca36 Aug 14, 2024
15 checks passed
@jakelishman jakelishman deleted the oncecell-cache branch August 14, 2024 20:35