-
Notifications
You must be signed in to change notification settings - Fork 783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoiding struct method call overhead of extract_pyclass_ref_mut
#3843
Comments
Generally speaking, some amount of overhead is unavoidable due pervasive shared mutability applied in Python which requires us to use interior mutability patterns to enable safe Rust code from obtaining a You might be able to reduce that overhead by opting out of dynamic borrow check by marking your pyclass as frozen using let mut state = self.state.get();
let value = self.next_state();
self.state.set(state); |
Thanks for the reply @adamreichold. Unfortunately that doesn't seem to have negated the "extract" cost, although it is now The image cut off the struct declaration: #[pyclass(frozen)]
pub struct XoshiroStruct {
state: Cell<Xoshiro256Plus>,
} note: if anyone else tries to do |
So from your flamegraphs it looks like the time is dominated by retrieval of the type object, which in general is a necessary part of the type check. I've got a couple of thoughts here:
@samuelcolvin might have hit the same thing when observing "methods overhead" in #3827 |
@davidhewitt It's set already. |
Firstly, thanks for PyO3, it's great!
This issue is being opened following a [chat post on gitter].(https://matrix.to/#/!AAhjIWoaKSExrkkhlG:gitter.im/$9gOL75NMgSzKVrKugcQUMBUFcy5QMpLvxuJafAHLieU?via=gitter.im&via=matrix.org&via=nitro.chat)
In short, consider these performance ordered cargo bench results:
and contrast that to these timeit results:
See something odd there? 🤓 Hey structs! What is going on!?!? Sure we have overhead but all of these test functions are returning a simple type to python and, yeah, the structs have to be mutable but for xoshiro_struct to go from being about twice as fast as the lazy to about 14% slower is unexpected, and to see both of the 'struct' versions leading in the Rust benches to trailing in the Python timeits is unexpected.
Running some long pytest loops on these to get enough samples to see what's going on reveals that our two structs are getting penalized for being structs:
The question is can this be avoided through some hint/annotation to short-circuit that extract/type-info stack?
Here's one of the minimal structs:
Here's the profile: pytest 2024-02-15 22.59 profile.json.gz
And here's a repo with these examples: https://github.com/Thell/struct_perf
The text was updated successfully, but these errors were encountered: