Slow matrix-vector multiplication #334
Judging by a diff between the outputs of the two versions, the difference is rather significant.
Thanks for tracking down the difference. I'm surprised to see `Matrix` implementing `Copy`. Isn't `Copy` intended only for types of trivial size/complexity?
That's an interesting observation @jswrenn. I'm actually surprised that the optimizer does not remove the copies. I agree that removing `Copy` is worth exploring.

@robsmith11 I'm not sure there is a consensus regarding which types should implement `Copy`.
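To make the `Copy` concern above concrete, here is a minimal self-contained sketch (a hypothetical `Mat4` newtype, not nalgebra's actual type) showing why `Copy` on a large type is a performance footgun: every by-value use silently duplicates the whole payload instead of producing a "cannot move" error.

```rust
// Hypothetical stand-in for a large Copy type like nalgebra's Matrix.
#[derive(Clone, Copy)]
struct Mat4([f64; 16]);

// Takes its operand by value. Because Mat4 is Copy, each call silently
// duplicates 128 bytes rather than moving or borrowing the matrix.
fn scale(m: Mat4, k: f64) -> Mat4 {
    let mut out = m;
    for x in out.0.iter_mut() {
        *x *= k;
    }
    out
}

fn main() {
    let m = Mat4([1.0; 16]);
    let a = scale(m, 2.0); // implicit copy of `m`; it remains usable
    let b = scale(m, 3.0); // another implicit copy, no compiler warning
    assert_eq!(a.0[0], 2.0);
    assert_eq!(b.0[0], 3.0);
    println!("ok");
}
```

Without `Copy`, the second call to `scale` would be a use-after-move compile error, nudging the caller toward passing a reference instead.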
I also started a branch yesterday that completely removes the `Copy` implementation.
The Rust documentation is clear about when implementing `Copy` is appropriate. However, the fact remains that we want to guide users towards writing fast code, and we want their code to be fast by default. One option would be to only implement binops over references (which would present users with a "did you mean to pass a reference?" error), but nalgebra already does some really clever things with re-using memory when operands are passed by move, and it would be a shame to get rid of that. On the other hand, nalgebra's users are themselves inadvertently getting rid of that benefit by not realizing that their operands are being implicitly copied.
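The "binops over references only" option mentioned above can be sketched as follows (hypothetical `Mat2` type, not nalgebra's API): `Mul` is implemented only for `&Mat2`, so `a * b` on owned values fails to compile and users are steered toward the non-copying form.

```rust
use std::ops::Mul;

// Hypothetical 2x2 row-major matrix used only to illustrate the design.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Mat2([f64; 4]);

// Mul is implemented for references ONLY; there is no by-value impl.
impl Mul for &Mat2 {
    type Output = Mat2;
    fn mul(self, rhs: &Mat2) -> Mat2 {
        let (a, b) = (&self.0, &rhs.0);
        Mat2([
            a[0] * b[0] + a[1] * b[2],
            a[0] * b[1] + a[1] * b[3],
            a[2] * b[0] + a[3] * b[2],
            a[2] * b[1] + a[3] * b[3],
        ])
    }
}

fn main() {
    let id = Mat2([1.0, 0.0, 0.0, 1.0]);
    let m = Mat2([1.0, 2.0, 3.0, 4.0]);
    // let p = id * m; // compile error: `Mul` is not implemented for `Mat2`
    let p = &id * &m; // OK: operands are borrowed, nothing is copied
    assert_eq!(p, m);
    println!("ok");
}
```

The trade-off discussed in the thread is that this forbids the by-move forms nalgebra exploits to re-use an operand's storage for the result.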
The multiplication isn't actually being inlined. In the asm from @robsmith11's example:
I just tried replacing all instances of the by-value multiplications. The un-elided asm confirms the difference.
I only just ran into this too. I didn't realise that nalgebra implements the arithmetic operators for owned matrices, relying on `Copy`. In the following minimal example, I'm pretty sure most users wouldn't write functions like `bar`:

```rust
pub struct S {
    m: nalgebra::Matrix4<f64>,
}

// version where asm shows data being copied onto the stack
pub fn foo(a: &S, b: &S) -> S {
    S { m: a.m * b.m }
}

// version with direct multiplication
pub fn bar(a: &S, b: &S) -> S {
    S { m: &a.m * &b.m }
}
```
I was trying to speed up my neural network calculations (which involve many matrix-vector multiplications) and found that simply writing the multiplication with a reference to the matrix rather than the matrix itself produced a 36% reduction in run-time.

Is there any reason why `m * v` should be so much slower than `&m * v`? If not, would it be possible to make them both fast?

Example:
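The original example is elided above, but the kind of comparison described can be sketched without nalgebra using a plain `Copy` matrix type (hypothetical `Mat4`, `mul_by_value`, `mul_by_ref` names; absolute timings are machine-dependent):

```rust
use std::hint::black_box;
use std::time::Instant;

#[derive(Clone, Copy)]
struct Mat4([f64; 16]);

// Matrix taken by value: each call copies the whole 128-byte matrix.
fn mul_by_value(m: Mat4, v: [f64; 4]) -> [f64; 4] {
    let mut out = [0.0; 4];
    for i in 0..4 {
        for j in 0..4 {
            out[i] += m.0[i * 4 + j] * v[j];
        }
    }
    out
}

// Matrix taken by reference: no copy of the operand.
fn mul_by_ref(m: &Mat4, v: &[f64; 4]) -> [f64; 4] {
    let mut out = [0.0; 4];
    for i in 0..4 {
        for j in 0..4 {
            out[i] += m.0[i * 4 + j] * v[j];
        }
    }
    out
}

fn main() {
    let m = Mat4(std::array::from_fn(|i| i as f64));
    let v = [1.0, 2.0, 3.0, 4.0];

    // Both forms must agree on the result.
    assert_eq!(mul_by_value(m, v), mul_by_ref(&m, &v));

    // Crude timing; black_box keeps the calls from being optimized away.
    let t = Instant::now();
    for _ in 0..1_000_000 {
        black_box(mul_by_value(black_box(m), black_box(v)));
    }
    println!("by value: {:?}", t.elapsed());

    let t = Instant::now();
    for _ in 0..1_000_000 {
        black_box(mul_by_ref(black_box(&m), black_box(&v)));
    }
    println!("by ref:   {:?}", t.elapsed());
}
```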