#pragma float_control(precise, on) doesn't work for SSE intrinsics #55713

Open
obfuscated opened this issue May 26, 2022 · 1 comment
Labels
clang:headers Headers provided by Clang, e.g. for intrinsics

Comments

obfuscated commented May 26, 2022

This is the link to godbolt with the full reproducer: https://godbolt.org/z/qYczcba39

The problem is that the pragma has no effect when the intrinsics are called directly, but it works when the operators on the __m128 type are used.

I originally discovered this in clang 14.0.1.

The following code shows the problem (compiled with -Ofast -msse4.2 -mrecip=none):

#include <xmmintrin.h>

__m128 func(__m128 d, float oldLen, float newLen) {
	#pragma float_control(precise, on)
	return _mm_div_ps(
		_mm_mul_ps(d, _mm_set1_ps(oldLen)),
		_mm_set1_ps(newLen)
	);
}

__m128 func1(__m128 d, float oldLen, float newLen) {
	#pragma float_control(precise, on)
	return d*oldLen/newLen;
}

And it leads to this assembly:

.LCPI1_0:
        .long   0x3f800000                      # float 1
func(float __vector(4), float, float):                         # @func(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
        divss   xmm1, xmm2
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        ret
func1(float __vector(4), float, float):                        # @func1(float __vector(4), float, float)
        shufps  xmm1, xmm1, 0                   # xmm1 = xmm1[0,0,0,0]
        mulps   xmm0, xmm1
        shufps  xmm2, xmm2, 0                   # xmm2 = xmm2[0,0,0,0]
        divps   xmm0, xmm2
        ret

Generally, the use of the *(1/a) optimization here seems questionable, and clang doesn't do it for scalars, only for vector/SIMD types. Is this another bug that needs to be reported separately?
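
To illustrate why the rewrite matters numerically, here is a minimal sketch comparing a direct division with the multiply-by-reciprocal form that appears in the assembly for func. It uses scalar floats rather than SSE lanes for portability; each lane of divps/mulps rounds the same way as a scalar float operation. The helper names div_direct, div_via_reciprocal and count_mismatches are hypothetical, and the sketch assumes it is built without -Ofast or -ffast-math, since otherwise the compiler may make both forms identical.

```cpp
// Direct IEEE-754 single-precision division: one correctly rounded operation.
float div_direct(float a, float b) {
    return a / b;
}

// Multiply-by-reciprocal form: 1/b is rounded first, then the product is
// rounded again, so the result can differ from a / b in the last bit.
float div_via_reciprocal(float a, float b) {
    float r = 1.0f / b;
    return a * r;
}

// Counts the integer pairs (a, b) in [1, limit] for which the two forms
// produce different float results.
int count_mismatches(int limit) {
    int mismatches = 0;
    for (int a = 1; a <= limit; ++a) {
        for (int b = 1; b <= limit; ++b) {
            float fa = static_cast<float>(a);
            float fb = static_cast<float>(b);
            if (div_direct(fa, fb) != div_via_reciprocal(fa, fb)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

For example, div_direct(5.0f, 3.0f) and div_via_reciprocal(5.0f, 3.0f) differ by one unit in the last place, while power-of-two divisors such as 2.0f give identical results because the reciprocal is exact.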

@EugeneZelenko EugeneZelenko added clang:headers Headers provided by Clang, e.g. for intrinsics and removed new issue labels May 26, 2022
walbourn commented Feb 2, 2024

I'm still seeing this problem in clang v18.1.0 RC with the DirectXMath library. The only way to get my library to work on clang in Release mode is to NOT use /fp:fast.


3 participants