Use OnceCell to cache Python operations #12942

Merged
2 commits merged on Aug 14, 2024

Commits on Aug 14, 2024

  1. Use OnceCell to cache Python operations

    We were previously using `RefCell` to handle the interior mutability for
    the Python-operation caches.  This works fine, but has two problems:
    
    - `RefCell<Py<PyAny>>` is two pointers wide
    - we had to do heavier dynamic runtime checks for the borrow status on
      every access, including simple gets.
    
    This commit moves the caching to instead use `OnceCell`.  The internal
    tracking for a `OnceCell<T>` is just whether an internal `Option<T>` is
    in the `Some` variant, and since `Option<Py<PyAny>>` will use the
    null-pointer optimisation for space, this means that
    `OnceCell<Py<PyAny>>` is only a single pointer wide.  Since it's not
    permissible to take out a mutable reference to the interior, there is
    also no dynamic runtime checking---just the null-pointer check that the
    cell is initialised---so access is faster as well.
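    The width claim can be checked with `std::mem::size_of`.  A minimal sketch, using `Box<u8>` as a stand-in for `Py<PyAny>` (both are non-null pointers, so the same null-pointer optimisation applies):

    ```rust
    use std::cell::{OnceCell, RefCell};
    use std::mem::size_of;

    // `Option<Box<u8>>` enjoys the null-pointer optimisation: `None` is the
    // null pointer, so no separate discriminant is needed.  `OnceCell<T>` is
    // just an `UnsafeCell<Option<T>>`, so the cell is one pointer wide and
    // the "is it initialised?" check is a plain null test.
    pub fn oncecell_width() -> usize {
        size_of::<OnceCell<Box<u8>>>()
    }

    // `RefCell<T>` carries an extra `isize` borrow flag that is checked and
    // updated dynamically on every `borrow`/`borrow_mut`, so it is two
    // pointers wide for a pointer-sized `T`.
    pub fn refcell_width() -> usize {
        size_of::<RefCell<Box<u8>>>()
    }
    ```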
    
    We can still clear the cache if we hold a mutable reference to the
    `OnceCell`, and since every method that can invalidate the cache also
    necessarily changes Rust-space, they certainly require mutable
    references to the cell, making it safe.
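    A minimal sketch of that invalidation pattern, with hypothetical names (the real cache holds a `Py<PyAny>`, not a `String`):

    ```rust
    use std::cell::OnceCell;

    // Hypothetical stand-in for a pyclass whose Python-space view is cached.
    struct Instruction {
        params: Vec<f64>,
        cache: OnceCell<String>, // the real cell caches a `Py<PyAny>`
    }

    impl Instruction {
        fn cached_repr(&self) -> &String {
            self.cache.get_or_init(|| format!("{:?}", self.params))
        }

        // Any mutation of the Rust-space data requires `&mut self`, which in
        // turn gives the `&mut OnceCell` needed to clear the stale cache.
        fn set_params(&mut self, params: Vec<f64>) {
            self.params = params;
            self.cache.take(); // drop the cached value; next access rebuilds it
        }
    }
    ```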
    
    There is a risk here, though, in `OnceCell::get_or_init`.  This function
    is not re-entrant; one cannot attempt the initialisation while another
    thread is initialising it.  Typically this is not an issue, since the
    type isn't `Sync`.  PyO3, however, allows types that are only `Send` and
    _not_ `Sync` to be put in `pyclass`es, which Python can then share
    between threads.  This is usually safe due to the GIL mediating access
    to Python-space objects, so there are no data races.  However, if the
    initialiser passed to `get_or_init` yields control at any point to the
    Python interpreter (such as by calling a method on a Python object),
    this gives it a chance to block the thread and allow another Python
    thread to run.  The other thread could then attempt to enter the same
    cache initialiser, which is a violation of its contract.
    
    PyO3's `GILOnceCell` can protect against this failure mode, but this is
    inconvenient to use for us, because it requires the GIL to be held to
    access the value at all, which is problematic during `Clone` operations.
    Instead, we make ourselves data-race safe by manually checking for
    initialisation, calculating the cache value ourselves, and then calling
    `set` or `get_or_init` passing the already created object.  While the
    initialiser can still yield to the Python interpreter, the actual
    setting of the cell is now atomic in this sense, and thus safe.  The
    downside is that more than one thread might do the same initialisation
    work, but given how comparatively rare the use of Python threads is,
    it's better to prioritise memory use and single-Python-threaded access
    times than one-off cache population in multiple Python threads.
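    The check-compute-set pattern described above can be sketched like this (hypothetical names; `expensive_build` stands in for the initialiser that may yield to the Python interpreter):

    ```rust
    use std::cell::OnceCell;

    struct Cached {
        cache: OnceCell<String>, // the real cell holds a `Py<PyAny>`
    }

    impl Cached {
        fn get_or_build(&self) -> &String {
            // Fast path: a plain initialised check, no borrow-flag bookkeeping.
            if let Some(value) = self.cache.get() {
                return value;
            }
            // Slow path: build the value *outside* the cell's initialiser.
            // In the real code this step may call into Python and yield the
            // GIL, letting another Python thread reach this same point.
            let value = expensive_build();
            // Passing the already-built value means the closure handed to
            // `get_or_init` cannot yield or re-enter.  If another thread won
            // the race, our duplicate work is discarded and its value is
            // returned instead.
            self.cache.get_or_init(|| value)
        }
    }

    fn expensive_build() -> String {
        "built".to_owned()
    }
    ```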
    jakelishman committed Aug 14, 2024 (commit a19d92f)
  2. (commit fc315af)