WIP: clang build fixes and workarounds #16
base: master
experimental/bits/simd_x86.h
Outdated
const auto __d = _mm512_cvtepi8_epi32(__a);
return _mm512_test_epi32_mask(__b, __b)
  | (_mm512_test_epi32_mask(__c, __c) << 16)
  | (_ULLong(_mm512_test_epi32_mask(__d, __d)) << 32);
}
else
{
-__builtin_memcpy(&__a, __mem + 16, 32);
+__builtin_memcpy(&__a, static_cast<const char*>(__mem) + 16, 32);
My compiler flags this copy as always overflowing, since __m128i __a is only 16 bytes wide.
This looks broken, yes. I'll take a look.
Yes, this needs to be 16. Seems I have no test coverage for this case 😉
There is already some value from clang then :)
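To illustrate the overflow discussed above: the destination is a 16-byte object, so the copy has to be 16 bytes as well. This is a minimal sketch, not the library's code. Chunk16 is a hypothetical stand-in for __m128i so that it builds without x86 intrinsics, and load_second_chunk is an assumed helper name.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for __m128i: a 16-byte, 16-byte-aligned object.
struct alignas(16) Chunk16 {
  unsigned char bytes[16];
};
static_assert(sizeof(Chunk16) == 16, "destination is 16 bytes wide");

// Copies bytes 16..31 of `mem` into a Chunk16.
// Using sizeof(dst) instead of a literal keeps the copy size in sync
// with the destination, so the copy cannot overrun it.
inline Chunk16 load_second_chunk(const void* mem) {
  Chunk16 dst;
  // Arithmetic on void* is a GNU extension, hence the cast to const char*.
  std::memcpy(&dst, static_cast<const char*>(mem) + 16, sizeof(dst));
  return dst;
}
```

Tying the size to sizeof(dst) rather than writing 16 by hand is a design choice of this sketch; the review only establishes that the value must be 16.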
@@ -178,7 +184,7 @@ using __value_type_or_identity_t
// }}}
// __is_vectorizable {{{
template <typename _Tp>
-struct __is_vectorizable : public std::is_arithmetic<_Tp>
+struct __is_vectorizable : public std::is_arithmetic<std::remove_reference_t<_Tp>>
Why is this needed? _Tp should never be a reference, and references are not vectorizable. (Pointers might be, but that needs a proposal.)
For some reason, clang returns long long&& from operator[] in my example:
https://godbolt.org/z/WJN7_M
Then whatever calls __is_vectorizable<decltype(x[0])> is incorrect.
Yes, so a number of remove_reference calls will be needed at other call sites to work around this.
My hello-world example, which I use to get an initial build working, has only 3 remaining errors:
Are there ideas on how to address that?
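The reviewer's point, that the caller rather than the trait should drop the reference, can be sketched as follows. This is an illustration under assumptions: is_vectorizable and element_t are hypothetical names, and std::vector stands in for the vector builtin whose operator[] yields a reference type under clang. C++17 is assumed.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's __is_vectorizable trait:
// true only for (non-reference) arithmetic types.
template <typename T>
struct is_vectorizable : std::is_arithmetic<T> {};

// Hypothetical call-site helper: the element type of a container, with
// the reference and cv-qualifiers coming from operator[] removed.
template <typename C>
using element_t =
    std::remove_cv_t<std::remove_reference_t<decltype(std::declval<C&>()[0])>>;

using Vec = std::vector<long long>;

// decltype(v[0]) is `long long&`, which std::is_arithmetic rejects ...
static_assert(!is_vectorizable<decltype(std::declval<Vec&>()[0])>::value);
// ... while the stripped element type is accepted.
static_assert(is_vectorizable<element_t<Vec>>::value);
```

Keeping the trait strict and stripping at the call site leaves the trait's meaning ("references are not vectorizable") intact.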
@@ -1285,7 +1291,7 @@ struct __vector_type_n<_Tp, _Np,
static constexpr size_t _Bytes = _Np * sizeof(_Tp) < __min_vector_size<_Tp>
                                   ? __min_vector_size<_Tp>
                                   : __next_power_of_2(_Np * sizeof(_Tp));
-using type [[__gnu__::__vector_size__(_Bytes)]] = _Tp;
+using type [[__gnu__::__vector_size__(_Bytes)]] = std::remove_reference_t<_Tp>;
Again, _Tp should never be a reference.
@@ -3559,8 +3565,7 @@ split(const simd_mask<typename _V::simd_type::value_type, _Ap>& __x)

// }}}
// split<_Sizes...>(simd) {{{
-template <size_t... _Sizes, typename _Tp, typename _Ap,
-          typename = enable_if_t<((_Sizes + ...) == simd<_Tp, _Ap>::size())>>
+template <size_t... _Sizes, typename _Tp, typename _Ap, typename>
removing the SFINAE condition breaks the spec
The SFINAE condition is already in the declaration. This is the definition, and clang reports a redefinition of the default argument.
Declaration is at line 3314
OK thanks. I'll take a look
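The declaration/definition split discussed above can be sketched like this: the default template argument carrying the SFINAE condition appears once, on the declaration, and the definition repeats the parameter without a default. count_parts is a hypothetical function used only for illustration, and the condition on sizeof(T) is an assumption standing in for the simd size check. C++17 is assumed for the fold expression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Declaration: the SFINAE condition lives here, as a default template
// argument.
template <std::size_t... Sizes, typename T,
          typename = std::enable_if_t<((Sizes + ...) == sizeof(T))>>
std::size_t count_parts(const T& value);

// Definition: the third parameter is repeated without its default.
// Giving the default a second time would be a redefinition of the
// default template argument.
template <std::size_t... Sizes, typename T, typename>
std::size_t count_parts(const T&) {
  return sizeof...(Sizes);
}
```

The constraint is still enforced at every call, because the default from the declaration applies to the definition as well.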
experimental/bits/simd_builtin.h
Outdated
@@ -624,7 +624,7 @@ __convert_all(_From __v)
  return __vector_bitcast<_FromT, decltype(__n)::value>(__vv);
};
[[maybe_unused]] const auto __vi = __to_intrin(__v);
-auto&& __make_array = [](std::initializer_list<auto> __xs) {
+auto&& __make_array = [](auto __xs) {
This can't work. An initializer list argument cannot be deduced as one. But I have a fix coming up for this. Erich mentioned it yesterday.
Yes, we talked with Erich yesterday about that.
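One portable way to avoid the non-deducible initializer-list parameter is a variadic generic lambda. The sketch below is only an assumption about what such a replacement could look like; it is not the fix the maintainer mentions, and the name make_array is used here without the reserved prefix. It requires at least one argument and C++17.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Each argument is deduced separately, so no braced-init-list has to be
// deduced. The element type is the common type of all arguments.
inline constexpr auto make_array = [](auto... xs) {
  using T = std::common_type_t<decltype(xs)...>;
  return std::array<T, sizeof...(xs)>{static_cast<T>(xs)...};
};
```

With this form a call site changes from make_array({a, b, c}) to make_array(a, b, c).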
https://godbolt.org/z/NeTaNs Apparently only Clang rejects this pattern. That doesn't prove Clang is wrong. Not sure where to start in the standard 😉
Try this:
@@ -405,7 +405,12 @@ __is_neon_abi()

// }}}
// __make_dependent_t {{{
-template <typename, typename _Up> using __make_dependent_t = _Up;
+template <typename, typename _Up> struct __make_dependent
+{
+  using type = _Up;
+};
+template <typename _Tp, typename _Up>
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;

// }}}
// ^^^ ---- type traits ---- ^^^
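A small sketch of what the struct-based form buys: because the alias now names a member of a class template specialization, the resulting type is dependent on the first parameter, and its use is only checked at instantiation. The names below drop the reserved prefix, and Incomplete and dependent_size are hypothetical, chosen only to make the delay observable; how the library actually uses the alias is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Struct-based form from the patch above, with non-reserved names.
template <typename, typename U>
struct make_dependent {
  using type = U;
};
template <typename T, typename U>
using make_dependent_t = typename make_dependent<T, U>::type;

// Hypothetical use: `Incomplete` is only declared at this point.
struct Incomplete;

// sizeof is applied to a dependent type, so it is not evaluated until
// the template is instantiated.
template <typename T>
constexpr std::size_t dependent_size() {
  return sizeof(make_dependent_t<T, Incomplete>);
}

// The type is completed before any instantiation of dependent_size.
struct Incomplete {
  int a;
  int b;
};
```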
That made this thing compile:
Basic arithmetic ops appear to work. The next broken thing is here:
// _S_signmask, _S_absmask{{{
template <typename _V, typename = _VectorTraits<_V>>
static inline constexpr _V _S_signmask = __xor(_V() + 1, _V() - 1);
Clang does not consider it a constant expression:
https://godbolt.org/z/RpWrqw
That's an unfortunate restriction in Clang. The better solution would be if
Clang were to support constant expressions involving [[gnu::vector_size(N)]]
objects. (Note that I want to propose constexpr simd for the C++23 inclusion
[1].) Until that happens we should define a macro to turn this kind of
constexpr into a const.
[1] mattkretz/std-simd-feedback#14
experimental/bits/simd_detail.h
Outdated
#if __clang__
#define _GLIBCXX_CONSTEXPR_SIMD
#else
#define _GLIBCXX_CONSTEXPR_SIMD constexpr
#endif
My naming convention has been to prefix everything with _GLIBCXX_SIMD_. I understand naming the macro "constexpr simd" is trying to explain the purpose of the macro. SIMD_CONSTEXPR_SIMD is just confusing again, though.
<bits/c++config> has _GLIBCXX_USE_CONSTEXPR, which seems like a good fit here. I.e. name it _GLIBCXX_SIMD_USE_CONSTEXPR.
And define the __clang__ branch to const.
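The suggestion above could look roughly like the following sketch. SIMD_USE_CONSTEXPR is a hypothetical, non-reserved stand-in for the proposed _GLIBCXX_SIMD_USE_CONSTEXPR, and the variable template ones is an assumption used only to show the macro in use; it relies on the GNU vector extension and on C++17 inline variables.

```cpp
#include <cassert>

// Hypothetical macro following the naming suggested in the review:
// constexpr where the compiler can evaluate vector expressions at
// compile time, plain const on Clang.
#if defined(__clang__)
#define SIMD_USE_CONSTEXPR const
#else
#define SIMD_USE_CONSTEXPR constexpr
#endif

// A 16-byte vector of four ints (GNU vector extension).
using V4i = int __attribute__((vector_size(16)));

// Every lane holds 1. On Clang this is initialized at run time or
// folded by the optimizer; elsewhere it is a constant expression.
template <typename V>
inline SIMD_USE_CONSTEXPR V ones = V() + 1;
```

Defining the Clang branch as const, rather than as nothing, keeps the variable immutable on both compilers.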
Interesting. Maybe you can check whether the _S_absmask constants are correct?
This is, IIRC, still just a loop over
__andnot(_S_signmask<_V>, _S_allbits<_V>); // does not work, I have no idea why
auto a = _S_signmask<_V>;
auto b = _S_allbits<_V>;
__andnot(a, b); // does work
Last unresolved error before stdx::sin() compiles:
/std-simd/experimental/bits/simd_builtin.h:1663:12: error: call to '__and' is ambiguous
return __and(__x._M_data, __y._M_data);
^~~~~
/std-simd/experimental/bits/simd.h:4569:14: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplBuiltin<std::experimental::parallelism_v2::simd_abi::_VecBuiltin<8> >::__bit_and<int, 2>' requested here
_Impl::__bit_and(__data(__x), __data(__y)));
^
/std-simd/experimental/bits/simd_math.h:576:48: note: in instantiation of member function 'std::experimental::parallelism_v2::operator&' requested here
const auto __need_sin = (__f._M_quadrant & 1) == 0;
^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
_GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<4>::__sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
^
/std-simd/experimental/bits/simd_math.h:564:6: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_Fixed<4> >' requested here
sin(static_simd_cast<rebind_simd_t<double, _V>>(__x)));
^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
_GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<32>::__sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
^
my_test.cpp:69:39: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_Fixed<32> >' requested here
print_simd("stdx::sin( fa )", stdx::sin( fa ) );
^
/std-simd/experimental/bits/simd.h:1615:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(2 * sizeof(int)))) int, void>, _Dummy = <>]
__and(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
^
/std-simd/experimental/bits/simd.h:1626:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, $1 = __attribute__((__vector_size__(2 * sizeof(int)))) int]
__and(_Tp __a, _Tp __b) noexcept
^
This smells like a compiler bug: https://godbolt.org/z/SYU0Fp
Interesting. Clang and GCC disagree on how overload resolution works here.
OK, my stdx::sin example works and produces reasonable numbers. I got a number of similar warnings, though:
In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:2452:25: warning: '__builtin_is_constant_evaluated' will always evaluate to 'true' in a manifestly constant-evaluated expression [-Wconstant-evaluated]
else if constexpr (!__builtin_is_constant_evaluated() && sizeof(__x) == 8) // {{{
                   ^
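The warning above points at the builtin being used inside the condition of `if constexpr`, which is itself a constant expression. A minimal sketch of the usual alternative, a plain `if`, is shown below. The function twice is hypothetical, and both branches deliberately compute the same value so that the result does not depend on which path is taken.

```cpp
#include <cassert>

// Plain `if`: during constant evaluation the builtin yields true, at
// run time it yields false, so both branches stay reachable.
constexpr int twice(int x) {
  if (__builtin_is_constant_evaluated())
    return x * 2;  // path taken in constant expressions
  else
    return x + x;  // path taken at run time (could use intrinsics)
}

// By contrast, the condition of `if constexpr` is always
// constant-evaluated, so the builtin is always true there and a branch
// guarded by `!__builtin_is_constant_evaluated()` is never selected.

static_assert(twice(21) == 42, "constant-evaluated path");
```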
Tried to compile the tests:
for testfile in *.cpp; do
  $CXX -std=c++17 -Ivirtest -D__remove_cvref_t=std::remove_cvref_t \
    -Wno-everything \
    -D_GLIBCXX_SIMD_ABI=__sse $testfile
done
These tests compiled:
These tests require the __make_array workaround:
Internal compiler error:
Other reasons:
2020.05.15: Updated based on the __andnot fix
BTW, just so you're aware: I'm currently working on integrating this repo into libstdc++ and the relevant repo would be mattkretz/gcc with the mkretz/simd2 branch. However, I regularly force-push into that branch, so don't switch to working there.
Then I'll continue trying to build it here. Here is an error I get from the tests:
I have reproduced it with the following example:
Update: resolved.
I am observing strange behavior with Here ABI is deduced as
void foo(stdx::simd_mask<long double> a);
But that thing is deduced to
void foo() {
  stdx::simd_mask<long double>();
}
Looks like that:
The error you see from constructing an object of type
At the moment I see the weirdest behavior. In the same compilation unit
Ok,
@mattkretz, do you happen to think of
The next thing I have is the absence of a unary not operator on vector types in clang. This fires in the operators test.
template <typename _Tp, size_t _Np>
_GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
__negate(_SimdWrapper<_Tp, _Np> __x) noexcept
{
  return __vector_bitcast<_Tp>(!__x._M_data);
}
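A possible workaround, sketched under assumptions: GCC documents `!v` on vector types as equivalent to `v == 0`, so a lane-wise comparison against zero gives the same mask without needing `operator!`. The names V4i and lanewise_not are hypothetical, and this is not necessarily the change that was applied in the PR.

```cpp
#include <cassert>

// A 16-byte vector of four ints (GNU vector extension, GCC and Clang).
using V4i = int __attribute__((vector_size(16)));

// Lane-wise logical not without `operator!`: comparing against zero
// yields -1 (all bits set) in lanes that are zero and 0 elsewhere.
inline V4i lanewise_not(V4i x) {
  const V4i zero = {0, 0, 0, 0};
  return x == zero;
}
```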
I feel like I've fixed or worked around all known compilation problems up to Haswell. Now I need to run the tests. Direct compilation of the tests does not work well: the compiler fails on instantiation of huge test template functions:
virtest/vir/test.h:1000:12: instantiating function definition {and here is a function declaration 200KB long}
Is there a way to run the test system for only one instruction set (e.g. SSE) and/or one given data type (e.g. double)?
Sorry, I dropped out for a few days because of other work + a public holiday here. First thing I'll do is to get my gcc branch merged back here for convenience. That'll be a patch drop, but that should help to duplicate less work.
This is an initial list of changes needed to make the library buildable with clang.
The list of fixes is not complete; this PR is here to share early findings, which may be useful anyway.