We were previously using `RefCell` to handle the interior mutability for
the Python-operation caches. This works fine, but has two problems:
- `RefCell<Py<PyAny>>` is two pointers wide
- every access, including simple gets, had to perform a dynamic runtime
check of the borrow status.
This commit moves the caching to instead use `OnceCell`. The internal
tracking for a `OnceCell<T>` is just whether an internal `Option<T>` is
in the `Some` variant, and since `Option<Py<PyAny>>` will use the
null-pointer optimisation for space, this means that
`OnceCell<Py<PyAny>>` is only a single pointer wide. Since a shared
reference to the cell can never hand out a mutable reference to the
interior, there is also no dynamic runtime checking, just the
null-pointer check that the cell is initialised, so access is faster as
well.
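As a rough illustration of the size difference (a self-contained
sketch, using `NonNull<u8>` as a stand-in for `Py<PyAny>`, which is
likewise a non-null pointer wrapper):

```rust
use std::cell::{OnceCell, RefCell};
use std::mem::size_of;
use std::ptr::NonNull;

// Stand-in for `Py<PyAny>`: a non-null pointer, so `Option<Handle>`
// gets the null-pointer optimisation and stays one word wide.
type Handle = NonNull<u8>;

fn main() {
    // `RefCell` stores a borrow-flag word next to the value: two words.
    assert_eq!(size_of::<RefCell<Handle>>(), 2 * size_of::<usize>());
    // `OnceCell` stores only `Option<Handle>`: a single word.
    assert_eq!(size_of::<OnceCell<Handle>>(), size_of::<usize>());
}
```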
We can still clear the cache if we hold a mutable reference to the
`OnceCell`, and since every method that can invalidate the cache also
necessarily modifies Rust-space data, each such method already requires
a mutable reference to the cell, making this safe.
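The shape of that pattern is roughly as follows (a sketch with
hypothetical names, caching a `String` in place of a `Py<PyAny>`):

```rust
use std::cell::OnceCell;

// Hypothetical container: the Python-object cache sits next to the
// Rust-space data it is derived from.
struct Operation {
    params: Vec<f64>,
    py_cache: OnceCell<String>, // stand-in for the cached `Py<PyAny>`
}

impl Operation {
    // Reads only need `&self`; the cell hands back a shared reference.
    fn cached(&self) -> &str {
        self.py_cache.get_or_init(|| format!("{:?}", self.params))
    }

    // Anything that invalidates the cache already has to mutate the
    // Rust-space data, so it necessarily holds `&mut self` and can
    // clear the cell at the same time.
    fn set_params(&mut self, params: Vec<f64>) {
        self.params = params;
        self.py_cache.take();
    }
}
```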
There is a risk here, though, in `OnceCell::get_or_init`. This function
is not re-entrant: initialisation must not be attempted while another
initialisation of the same cell is already in progress. Typically this
is not an issue, since the type isn't `Sync`; however, PyO3 allows
types that are only `Send` and _not_ `Sync` to be put in `pyclass`es,
which Python can then share
between threads. This is usually safe due to the GIL mediating access
to Python-space objects, so there are no data races. However, if the
initialiser passed to `get_or_init` yields control at any point to the
Python interpreter (such as by calling a method on a Python object),
this gives it a chance to block the thread and allow another Python
thread to run. The other thread could then attempt to enter the same
cache initialiser, which is a violation of its contract.
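For illustration, the problematic shape looks something like this,
where `call_into_python` is a hypothetical stand-in for any call that
may yield control to the interpreter:

```rust
use std::cell::OnceCell;

fn risky<'a>(
    cell: &'a OnceCell<String>,
    call_into_python: impl FnOnce() -> String,
) -> &'a String {
    cell.get_or_init(|| {
        // The cell is mid-initialisation while this closure runs.  If
        // the call below yields to the Python interpreter, another
        // Python thread holding the same object can re-enter
        // `get_or_init`, violating its contract.
        call_into_python()
    })
}
```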
PyO3's `GILOnceCell` can protect against this failure mode, but it is
inconvenient for us to use, because it requires the GIL to be held even
just to read the value, which is problematic during `Clone` operations.
Instead, we make ourselves data-race safe by manually checking whether
the cell is initialised, calculating the cache value ourselves, and
then calling `set` or `get_or_init` with the already-created object.
While the
initialiser can still yield to the Python interpreter, the actual
setting of the cell is now atomic in this sense, and thus safe. The
downside is that more than one thread might do the same initialisation
work, but given how comparatively rare the use of Python threading is,
it is better to prioritise memory use and single-Python-threaded access
times over one-off cache population in multiple Python threads.
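A minimal sketch of the resulting access pattern, again with a
hypothetical `call_into_python` for the part of the initialiser that
may yield:

```rust
use std::cell::OnceCell;

fn cached<'a>(
    cell: &'a OnceCell<String>,
    call_into_python: impl FnOnce() -> String,
) -> &'a String {
    if let Some(value) = cell.get() {
        return value;
    }
    // This may yield to the interpreter, so another Python thread can
    // race us to this point and build its own copy of the value...
    let value = call_into_python();
    // ...but the closure given to `get_or_init` only moves an
    // already-built value into the cell, so it cannot yield
    // mid-initialisation; whichever thread gets here first wins and
    // the duplicate work is simply discarded.
    cell.get_or_init(|| value)
}
```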