Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(simd.h): Make all-architecture matrix44::inverse() #4076

Merged
merged 1 commit into from
Jan 2, 2024

Conversation

lgritz
Copy link
Collaborator

@lgritz lgritz commented Dec 14, 2023

We had an awkward situation of the simd matrix44::inverse() only being defined for Intel SSE, and in all other cases falling back on Imath::M44f::inverse. The problem with that is that it made simd.h depend either directly on ImathMatrix.h, or assume that it would be included prior to inclusion of simd.h. This was not good.

This patch rewrites inverse() to eliminate all of the direct SSE intrinsics in terms of other simd wrapper functions, which means that they will be defined one way or another on all architectures.

Along the way I added a new vfloat4 shuffle that combines two vectors (corresponding to _mm_shuffle_ps in SSE land), and a vfloat4 constructor from 2 float*', where the first two elements come from the first ptr and the second two elements come from the second ptr. Ultimately, the final implementation I settled on for the new inverse didn't use these (an earlier version did), but I want to keep them around anyway in case they are useful in the future. I did add a test for them.

We had an awkward situation of the simd matrix44::inverse() only being
defined for Intel SSE, and in all other cases falling back on
Imath::M44f::inverse. The problem with that is that it made simd.h
depend either directly on ImathMatrix.h, or assume that it would be
included prior to inclusion of simd.h. This was not good.

This patch rewrites inverse() to eliminate all of the direct SSE
intrinsics in terms of other simd wrapper functions, which means that
they will be defined one way or another on all architectures.

Along the way I added a new vfloat4 shuffle that combines two vectors
(corresponding to _mm_shuffle_ps in SSE land), and a vfloat4
constructor from 2 float*', where the first two elements come from the
first ptr and the second two elements come from the second
ptr. Ultimately, the final implementation I settled on for the new
inverse didn't use these (an earlier version did), but I want to keep
them around anyway in case they are useful in the future. I did add a
test for them.

Signed-off-by: Larry Gritz <[email protected]>
@lgritz
Copy link
Collaborator Author

lgritz commented Dec 30, 2023

Does anybody think this is a bad idea?

Copy link
Collaborator

@ThiagoIze ThiagoIze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code is nicer to read. Since this can be an expensive and important function, has this been benchmarked (or the assembly examined) to see if it introduces a slowdown on x86_64 or ARM? Assuming it's not slower, I'm all for merging in this PR.

@lgritz
Copy link
Collaborator Author

lgritz commented Jan 1, 2024

simd_test benchmarks these, so I know there is no slowdown (though I haven't tested on ARM). It's not especially as much faster than the Imath one as I would have hoped, so to the extent that it's ever performance critical, I'm sure it could use some improvement. But for now, I just wanted to get some implementation so we could remove the Imath header inclusion from simd.h.

Copy link
Collaborator

@ThiagoIze ThiagoIze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to hear at least the x86_64 was benchmarked. I also like that ARM and x86_64 both (mostly) run the same code, which means they'll have roughly the same numerical precision behavior. I think this is safe to merge.

@lgritz lgritz merged commit a885ca3 into AcademySoftwareFoundation:master Jan 2, 2024
25 checks passed
@lgritz lgritz deleted the lg-invert branch January 2, 2024 06:22
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Jan 2, 2024
…reFoundation#4076)

We had an awkward situation of the simd matrix44::inverse() only being
defined for Intel SSE, and in all other cases falling back on
Imath::M44f::inverse. The problem with that is that it made simd.h
depend either directly on ImathMatrix.h, or assume that it would be
included prior to inclusion of simd.h. This was not good.

This patch rewrites inverse() to eliminate all of the direct SSE
intrinsics in terms of other simd wrapper functions, which means that
they will be defined one way or another on all architectures.

Along the way I added a new vfloat4 shuffle that combines two vectors
(corresponding to `_mm_shuffle_ps` in SSE land), and a vfloat4
constructor from 2 float*', where the first two elements come from the
first ptr and the second two elements come from the second ptr.
Ultimately, the final implementation I settled on for the new inverse
didn't use these (an earlier version did), but I want to keep them
around anyway in case they are useful in the future. I did add a test
for them.

Signed-off-by: Larry Gritz <[email protected]>
1div0 pushed a commit to 1div0/OpenImageIO that referenced this pull request Feb 24, 2024
…reFoundation#4076)

We had an awkward situation of the simd matrix44::inverse() only being
defined for Intel SSE, and in all other cases falling back on
Imath::M44f::inverse. The problem with that is that it made simd.h
depend either directly on ImathMatrix.h, or assume that it would be
included prior to inclusion of simd.h. This was not good.

This patch rewrites inverse() to eliminate all of the direct SSE
intrinsics in terms of other simd wrapper functions, which means that
they will be defined one way or another on all architectures.

Along the way I added a new vfloat4 shuffle that combines two vectors
(corresponding to `_mm_shuffle_ps` in SSE land), and a vfloat4
constructor from 2 float*', where the first two elements come from the
first ptr and the second two elements come from the second ptr.
Ultimately, the final implementation I settled on for the new inverse
didn't use these (an earlier version did), but I want to keep them
around anyway in case they are useful in the future. I did add a test
for them.

Signed-off-by: Larry Gritz <[email protected]>
Signed-off-by: Peter Kovář <[email protected]>
@ThiagoIze
Copy link
Collaborator

ThiagoIze commented Aug 9, 2024

@lgritz do you know why lgritz@4cb956b did not remove this:

// Without SSE, we need to fall back on Imath for matrix44 invert
#if !OIIO_SIMD_SSE
#   include <OpenImageIO/Imath.h>
#endif

while the 2.6 branch does have it removed? It's a shame 2.5 still depends on Imath.h because of that, especially since the comment says this include is for matrix44 invert which is what this ticket was for.

@lgritz
Copy link
Collaborator Author

lgritz commented Aug 9, 2024

I don't remember now. We could try removing it to see if anything breaks?

@ThiagoIze ThiagoIze mentioned this pull request Aug 11, 2024
5 tasks
lgritz pushed a commit that referenced this pull request Aug 11, 2024
lgritz@4cb956b
lost this removal when resolving a merge conflict. This properly removes
it.

#4076
implemented matrix44::inverse() so that we could stop including Imath.h.
It removed the Imath.h include in the master/dev-2.6 branch but the
dev-2.5 branch kept the include, likely because of an improperly
resolved merge conflict. This PR removes the include like in master.

No tests are needed. As long as it compiles, this should be good. I
verified this builds on arm64.

Signed-off-by: Thiago Ize <[email protected]>
Co-authored-by: Thiago Ize <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants