Use `OnceCell` to cache Python operations #12942

Conversation
This is sensible and, putting aside all the memory-saving benefits, it makes the code more readable as we don't need to constantly call `borrow` or `borrow_mut`. Hopefully we can bump to PyO3 0.22 and reorganize things soon.

I had previously approved this PR but it seems to be failing CI now. Any clue as to why, @jakelishman?
force-pushed from 3c765fd to 53d225f
We were previously using `RefCell` to handle the interior mutability for the Python-operation caches. This works fine, but has two problems:

- `RefCell<Py<PyAny>>` is two pointers wide
- we had to do heavier dynamic runtime checks for the borrow status on every access, including simple gets.

This commit moves the caching to instead use `OnceCell`. The internal tracking for a `OnceCell<T>` is just whether an internal `Option<T>` is in the `Some` variant, and since `Option<Py<PyAny>>` will use the null-pointer optimisation for space, this means that `OnceCell<Py<PyAny>>` is only a single pointer wide. Since it's not permissible to take out a mutable reference to the interior, there is also no dynamic runtime checking---just the null-pointer check that the cell is initialised---so access is faster as well.

We can still clear the cache if we hold a mutable reference to the `OnceCell`, and since every method that can invalidate the cache also necessarily changes Rust-space, they certainly require mutable references to the cell, making it safe.

There is a risk here, though, in `OnceCell::get_or_init`. This function is not re-entrant; one cannot attempt the initialisation while another thread is initialising it. Typically this is not an issue, since the type isn't `Sync`; however, PyO3 allows types that are only `Send` and _not_ `Sync` to be put in `pyclass`es, which Python can then share between threads. This is usually safe due to the GIL mediating access to Python-space objects, so there are no data races. However, if the initialiser passed to `get_or_init` yields control at any point to the Python interpreter (such as by calling a method on a Python object), this gives it a chance to block the thread and allow another Python thread to run. The other thread could then attempt to enter the same cache initialiser, which is a violation of its contract.

PyO3's `GILOnceCell` can protect against this failure mode, but it is inconvenient for us to use because it requires the GIL to be held to access the value at all, which is problematic during `Clone` operations. Instead, we make ourselves data-race safe by manually checking for initialisation, calculating the cache value ourselves, and then calling `set` or `get_or_init` passing the already-created object. While the initialiser can still yield to the Python interpreter, the actual setting of the cell is now atomic in this sense, and thus safe. The downside is that more than one thread might do the same initialisation work, but given how comparatively rare the use of Python threads is, it's better to prioritise memory use and single-Python-threaded access times than one-off cache population in multiple Python threads.
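As an aside on the width claims above, a quick `std::mem` check illustrates why `OnceCell<Py<PyAny>>` is one pointer wide while `RefCell<Py<PyAny>>` is two. This is only a sketch: `NonNull<u8>` stands in for `Py<PyAny>` (which wraps a non-null object pointer and so gets the same layout treatment), and `cell_widths` is a hypothetical helper, not anything in the PR.

```rust
use std::cell::{OnceCell, RefCell};
use std::mem::size_of;
use std::ptr::NonNull;

// Stand-in for `Py<PyAny>`: a guaranteed-non-null pointer, so `Option<_>`
// can use the null-pointer optimisation (`None` is encoded as null).
pub type FakePyAny = NonNull<u8>;

// Returns (Option width, OnceCell width, RefCell width) in bytes.
pub fn cell_widths() -> (usize, usize, usize) {
    (
        // `Option<FakePyAny>` needs no separate tag byte...
        size_of::<Option<FakePyAny>>(),
        // ...so `OnceCell<FakePyAny>` is a single pointer wide...
        size_of::<OnceCell<FakePyAny>>(),
        // ...while `RefCell<FakePyAny>` also carries a borrow-state word.
        size_of::<RefCell<FakePyAny>>(),
    )
}
```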
force-pushed from 53d225f to a19d92f
Ray: it's ok, it's just that #12943 merged in the meantime, which had a logical but not syntactic conflict. I've rebased this PR to account for the additional cache interaction added in that PR.
LGTM, thank you for the quick fixes!
Summary

We were previously using `RefCell` to handle the interior mutability for the Python-operation caches. This works fine, but has two problems:

- `RefCell<Py<PyAny>>` is two pointers wide
- we had to do heavier dynamic runtime checks for the borrow status on every access, including simple gets.

This commit moves the caching to instead use `OnceCell`. The internal tracking for a `OnceCell<T>` is just whether an internal `Option<T>` is in the `Some` variant, and since `Option<Py<PyAny>>` will use the null-pointer optimisation for space, this means that `OnceCell<Py<PyAny>>` is only a single pointer wide. Since it's not permissible to take out a mutable reference to the interior, there is also no dynamic runtime checking---just the null-pointer check that the cell is initialised---so access is faster as well.

We can still clear the cache if we hold a mutable reference to the `OnceCell`, and since every method that can invalidate the cache also necessarily changes Rust-space, they certainly require mutable references to the cell, making it safe.

There is a risk here, though, in `OnceCell::get_or_init`. This function is not re-entrant; one cannot attempt the initialisation while another thread is initialising it. Typically this is not an issue, since the type isn't `Sync`; however, PyO3 allows types that are only `Send` and not `Sync` to be put in `pyclass`es, which Python can then share between threads. This is usually safe due to the GIL mediating access to Python-space objects, so there are no data races. However, if the initialiser passed to `get_or_init` yields control at any point to the Python interpreter (such as by calling a method on a Python object), this gives it a chance to block the thread and allow another Python thread to run. The other thread could then attempt to enter the same cache initialiser, which is a violation of its contract.

PyO3's `GILOnceCell` can protect against this failure mode, but it is inconvenient for us to use because it requires the GIL to be held to access the value at all, which is problematic during `Clone` operations. Instead, we make ourselves data-race safe by manually checking for initialisation, calculating the cache value ourselves, and then calling `set` or `get_or_init` passing the already-created object. While the initialiser can still yield to the Python interpreter, the actual setting of the cell is now atomic in this sense, and thus safe. The downside is that more than one thread might do the same initialisation work, but given how comparatively rare the use of Python threads is, it's better to prioritise memory use and single-Python-threaded access times than one-off cache population in multiple Python threads.

Details and comments

`GILOnceCell` would likely be better here, but it doesn't derive `Clone`, nor would that make sense, since it uses the GIL to mediate even getter access. When we're able to move to PyO3 0.22 and disable the `py-clone` feature of it, we might be able to reorganise this a little bit to avoid needing to handle the GIL-atomic initialisation manually.

I measured this as saving 165MB (around 10%) in resident-set size from this benchmark over its parent commit. That's the same microbenchmark we used in #12730. That should have a (small) impact on copy and circuit-build performance due to needing to move around less memory, and access times on the Python operation cache should be smaller too, though that shouldn't be in the critical path.
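The check-then-set discipline described above can be sketched with `std::cell::OnceCell`. This is an illustrative pattern only, not the PR's actual code: `OperationCache`, `build_cache_value`, and the `String` payload (standing in for `Py<PyAny>`) are all hypothetical, and the real concern is the Python-thread interleaving that `std`'s single-threaded cell cannot itself exhibit.

```rust
use std::cell::OnceCell;

// Hypothetical stand-in for building the cached Python object. In the real
// code this step may yield to the Python interpreter, which is exactly why
// it must run *before* we touch the cell, not inside `get_or_init`'s closure.
pub fn build_cache_value() -> String {
    "expensive result".to_string()
}

pub struct OperationCache {
    pub cached: OnceCell<String>,
}

impl OperationCache {
    pub fn get_cached(&self) -> &str {
        // Fast path: just a null-pointer-style initialised check.
        if let Some(value) = self.cached.get() {
            return value;
        }
        // Compute the value first, so the closure passed to `get_or_init`
        // cannot re-enter the initialiser. If another thread won the race
        // in the meantime, our freshly built copy is simply dropped.
        let value = build_cache_value();
        self.cached.get_or_init(|| value)
    }

    // Any cache-invalidating method takes `&mut self`, which is enough to
    // clear the cell without dynamic borrow checks.
    pub fn invalidate(&mut self) {
        self.cached.take();
    }
}
```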
The biggest memory problem remaining in the `CircuitData` model (by far) is currently the parameter storage: in this particular benchmark, we spend something around 559MB storing the parameters of the 9,000,000 parametric gates, of which 15% is malloc overhead (on macOS, at least, but probably most Unix-likes, and not dissimilar on Windows) and, within the actual allocation, about 70% is just padding bytes.
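The padding point can be made concrete with a toy enum; `ParamValue` here is hypothetical, not Qiskit's actual parameter type, and the real storage wastes a different fraction, but the alignment mechanism is the same.

```rust
use std::mem::size_of;

// A one-byte discriminant next to an 8-byte payload: alignment rounds the
// whole enum up to 16 bytes, so 7 of every 16 bytes are padding. (f64 has
// no spare bit patterns, so the tag cannot be hidden in a niche.)
pub enum ParamValue {
    Float(f64),
    Missing,
}

pub fn param_value_size() -> usize {
    size_of::<ParamValue>()
}
```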