-
Notifications
You must be signed in to change notification settings - Fork 598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(simd.h): Make all-architecture matrix44::inverse() #4076
Conversation
We had an awkward situation of the simd matrix44::inverse() only being defined for Intel SSE, and in all other cases falling back on Imath::M44f::inverse. The problem with that is that it made simd.h depend either directly on ImathMatrix.h, or assume that it would be included prior to inclusion of simd.h. This was not good. This patch rewrites inverse() to eliminate all of the direct SSE intrinsics in terms of other simd wrapper functions, which means that they will be defined one way or another on all architectures. Along the way I added a new vfloat4 shuffle that combines two vectors (corresponding to _mm_shuffle_ps in SSE land), and a vfloat4 constructor from 2 float*', where the first two elements come from the first ptr and the second two elements come from the second ptr. Ultimately, the final implementation I settled on for the new inverse didn't use these (an earlier version did), but I want to keep them around anyway in case they are useful in the future. I did add a test for them. Signed-off-by: Larry Gritz <[email protected]>
Does anybody think this is a bad idea? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code is nicer to read. Since this can be an expensive and important function, has this been benchmarked (or the assembly examined) to see if it introduces a slowdown on x86_64 or ARM? Assuming it's not slower, I'm all for merging in this PR.
simd_test benchmarks these, so I know there is no slowdown (though I haven't tested on ARM). It's not especially as much faster than the Imath one as I would have hoped, so to the extent that it's ever performance critical, I'm sure it could use some improvement. But for now, I just wanted to get some implementation so we could remove the Imath header inclusion from simd.h. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to hear at least the x86_64 was benchmarked. I also like that ARM and x86_64 both (mostly) run the same code, which means they'll have roughly the same numerical precision behavior. I think this is safe to merge.
…reFoundation#4076) We had an awkward situation of the simd matrix44::inverse() only being defined for Intel SSE, and in all other cases falling back on Imath::M44f::inverse. The problem with that is that it made simd.h depend either directly on ImathMatrix.h, or assume that it would be included prior to inclusion of simd.h. This was not good. This patch rewrites inverse() to eliminate all of the direct SSE intrinsics in terms of other simd wrapper functions, which means that they will be defined one way or another on all architectures. Along the way I added a new vfloat4 shuffle that combines two vectors (corresponding to `_mm_shuffle_ps` in SSE land), and a vfloat4 constructor from 2 float*', where the first two elements come from the first ptr and the second two elements come from the second ptr. Ultimately, the final implementation I settled on for the new inverse didn't use these (an earlier version did), but I want to keep them around anyway in case they are useful in the future. I did add a test for them. Signed-off-by: Larry Gritz <[email protected]>
…reFoundation#4076) We had an awkward situation of the simd matrix44::inverse() only being defined for Intel SSE, and in all other cases falling back on Imath::M44f::inverse. The problem with that is that it made simd.h depend either directly on ImathMatrix.h, or assume that it would be included prior to inclusion of simd.h. This was not good. This patch rewrites inverse() to eliminate all of the direct SSE intrinsics in terms of other simd wrapper functions, which means that they will be defined one way or another on all architectures. Along the way I added a new vfloat4 shuffle that combines two vectors (corresponding to `_mm_shuffle_ps` in SSE land), and a vfloat4 constructor from 2 float*', where the first two elements come from the first ptr and the second two elements come from the second ptr. Ultimately, the final implementation I settled on for the new inverse didn't use these (an earlier version did), but I want to keep them around anyway in case they are useful in the future. I did add a test for them. Signed-off-by: Larry Gritz <[email protected]> Signed-off-by: Peter Kovář <[email protected]>
@lgritz do you know why lgritz@4cb956b did not remove this: // Without SSE, we need to fall back on Imath for matrix44 invert
#if !OIIO_SIMD_SSE
# include <OpenImageIO/Imath.h>
#endif while the 2.6 branch does have it removed? It's a shame 2.5 still depends on Imath.h because of that, especially since the comment says this include is for matrix44 invert which is what this ticket was for. |
I don't remember now. We could try removing it to see if anything breaks? |
lgritz@4cb956b lost this removal when resolving a merge conflict. This properly removes it. #4076 implemented matrix44::inverse() so that we could stop including Imath.h. It removed the Imath.h include in the master/dev-2.6 branch but the dev-2.5 branch kept the include, likely because of an improperly resolved merge conflict. This PR removes the include like in master. No tests are needed. As long as it compiles, this should be good. I verified this builds on arm64. Signed-off-by: Thiago Ize <[email protected]> Co-authored-by: Thiago Ize <[email protected]>
We had an awkward situation of the simd matrix44::inverse() only being defined for Intel SSE, and in all other cases falling back on Imath::M44f::inverse. The problem with that is that it made simd.h depend either directly on ImathMatrix.h, or assume that it would be included prior to inclusion of simd.h. This was not good.
This patch rewrites inverse() to eliminate all of the direct SSE intrinsics in terms of other simd wrapper functions, which means that they will be defined one way or another on all architectures.
Along the way I added a new vfloat4 shuffle that combines two vectors (corresponding to
_mm_shuffle_ps
in SSE land), and a vfloat4 constructor from 2 float*', where the first two elements come from the first ptr and the second two elements come from the second ptr. Ultimately, the final implementation I settled on for the new inverse didn't use these (an earlier version did), but I want to keep them around anyway in case they are useful in the future. I did add a test for them.