Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: clang build fixes and workarounds #16

Draft
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

iburyl
Copy link

@iburyl iburyl commented May 7, 2020

This is an initial list of changes needed to make it build-able by clang.
The list of fixes is not full and PR is here to make a notification about early findings, which may be useful anyway.

const auto __d = _mm512_cvtepi8_epi32(__a);
return _mm512_test_epi32_mask(__b, __b)
| (_mm512_test_epi32_mask(__c, __c) << 16)
| (_ULLong(_mm512_test_epi32_mask(__d, __d)) << 32);
}
else
{
__builtin_memcpy(&__a, __mem + 16, 32);
__builtin_memcpy(&__a, static_cast<const char*>(__mem) + 16, 32);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my compiler flags this operation to always overflow, since __m128i __a is 16 bytes thing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks broken, yes. I'll take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this needs to be 16. Seems I have no test coverage for this case 😉

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already some value from clang then :)

@@ -178,7 +184,7 @@ using __value_type_or_identity_t
// }}}
// __is_vectorizable {{{
template <typename _Tp>
struct __is_vectorizable : public std::is_arithmetic<_Tp>
struct __is_vectorizable : public std::is_arithmetic<std::remove_reference_t<_Tp>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? _Tp should never be a reference. And references are not vectorizable. (Pointers might be - needs a proposal)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang by some reason returns long long&& out of operator[] in my example:
https://godbolt.org/z/WJN7_M

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then whatever calls __is_vectorizable<decltype(x[0])> is incorrect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thus a number of remove_references will be required to workaround that thing across other places.

@iburyl
Copy link
Author

iburyl commented May 7, 2020

My hello world example, which I use to make initial build possible have only 3 remaining errors:

In file included from /std-simd/experimental/simd:59:
/std-simd/experimental/bits/simd_builtin.h:1144:9: error: incomplete type 'std::experimental::parallelism_v2::_CommonImplX86' named in nested name specifier
        return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_builtin.h:953:8: note: forward declaration of 'std::experimental::parallelism_v2::_CommonImplX86'
struct _CommonImplX86;
       ^
/std-simd/experimental/bits/simd_builtin.h:1171:11: error: incomplete type 'std::experimental::parallelism_v2::_CommonImplX86' named in nested name specifier
          return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_builtin.h:953:8: note: forward declaration of 'std::experimental::parallelism_v2::_CommonImplX86'
struct _CommonImplX86;
       ^
In file included from my_test.cpp:16:
In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:710:4: error: incomplete type 'std::experimental::parallelism_v2::_MaskImplX86Mixin' named in nested name specifier
                        __make_dependent_t<_Tp, _MaskImplX86Mixin>::__to_bits(
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_x86.h:347:8: note: forward declaration of 'std::experimental::parallelism_v2::_MaskImplX86Mixin'
struct _MaskImplX86Mixin;
       ^
/std-simd/experimental/bits/simd_x86.h:4250:3: warning: 'memcpy' will always overflow; destination buffer has size 16, but size argument is 32 [-Wfortify-source]
                __builtin_memcpy(&__a, static_cast<const char*>(__mem) + 16, 32);
                ^
1 warning and 3 errors generated.

Are there ideas on how to address that?

@@ -1285,7 +1291,7 @@ struct __vector_type_n<_Tp, _Np,
static constexpr size_t _Bytes = _Np * sizeof(_Tp) < __min_vector_size<_Tp>
? __min_vector_size<_Tp>
: __next_power_of_2(_Np * sizeof(_Tp));
using type [[__gnu__::__vector_size__(_Bytes)]] = _Tp;
using type [[__gnu__::__vector_size__(_Bytes)]] = std::remove_reference_t<_Tp>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, _Tp should never be a reference

@@ -3559,8 +3565,7 @@ split(const simd_mask<typename _V::simd_type::value_type, _Ap>& __x)

// }}}
// split<_Sizes...>(simd) {{{
template <size_t... _Sizes, typename _Tp, typename _Ap,
typename = enable_if_t<((_Sizes + ...) == simd<_Tp, _Ap>::size())>>
template <size_t... _Sizes, typename _Tp, typename _Ap, typename>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing the SFINAE condition breaks the spec

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SFINAE is there in declaration. Here it is definition. clang says it is redefinition of default argument

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declaration is at line 3314

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks. I'll take a look

@@ -624,7 +624,7 @@ __convert_all(_From __v)
return __vector_bitcast<_FromT, decltype(__n)::value>(__vv);
};
[[maybe_unused]] const auto __vi = __to_intrin(__v);
auto&& __make_array = [](std::initializer_list<auto> __xs) {
auto&& __make_array = [](auto __xs) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't work. An initializer list argument cannot be deduced as one. But I have a fix coming up for this. Erich mentioned it yesterday.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we talked with Erich yesterday about that.

@mattkretz
Copy link
Member

My hello world example, which I use to make initial build possible have only 3 remaining errors:

__make_dependent_t was a clever hack to make the second template argument a dependent type in this context. Maybe I should find a better fix. But that probably requires a lot of code motion and less architecture abstraction into different files. So I'd like to avoid that. Let me find out whether clang is actually correct to reject this code.

@mattkretz
Copy link
Member

https://godbolt.org/z/NeTaNs apparently only Clang rejects this pattern. That doesn't prove Clang is wrong. Not sure where to start in the standard 😉

@mattkretz
Copy link
Member

Try this:

@@ -405,7 +405,12 @@ __is_neon_abi()                                             
⸱                                                                                
 // }}}                                                                          
 // __make_dependent_t {{{                                                       
-template <typename, typename _Up> using __make_dependent_t = _Up;               
+template <typename, typename _Up> struct __make_dependent                       
+{                                                                               
+  using type = _Up;                                                             
+};                                                                              
+template <typename _Tp, typename _Up>                                           
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;           
⸱                                                                                
 // }}}                                                                          
 // ^^^ ---- type traits ---- ^^^

@iburyl
Copy link
Author

iburyl commented May 7, 2020

Try this:

@@ -405,7 +405,12 @@ __is_neon_abi()                                             
⸱                                                                                
 // }}}                                                                          
 // __make_dependent_t {{{                                                       
-template <typename, typename _Up> using __make_dependent_t = _Up;               
+template <typename, typename _Up> struct __make_dependent                       
+{                                                                               
+  using type = _Up;                                                             
+};                                                                              
+template <typename _Tp, typename _Up>                                           
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;           
⸱                                                                                
 // }}}                                                                          
 // ^^^ ---- type traits ---- ^^^

That made this thing compile:

#include <experimental/simd>
int main()
{
  std::experimental::fixed_size_simd<int, 5> a;
  std::experimental::fixed_size_simd<float, 32> b;

  return 0;
}

@iburyl
Copy link
Author

iburyl commented May 8, 2020

Basic arithmetic ops look working.

Next broken thing is here:

// _S_signmask, _S_absmask{{{
template <typename _V, typename = _VectorTraits<_V>>
static inline constexpr _V _S_signmask = __xor(_V() + 1, _V() - 1);

clang does not consider it a constant expression
https://godbolt.org/z/RpWrqw

@mattkretz
Copy link
Member

mattkretz commented May 8, 2020 via email

@iburyl
Copy link
Author

iburyl commented May 8, 2020

stdx::abs is now working for integers, but is returning zeros for FP for unknown reason.

stdx::fabs works as expected with FP.

stdx::sin has compilation issues

Comment on lines 311 to 315
#if __clang__
#define _GLIBCXX_CONSTEXPR_SIMD
#else
#define _GLIBCXX_CONSTEXPR_SIMD constexpr
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My naming convention has been to prefix everything with _GLIBCXX_SIMD_. I understand naming the macro "constexpr simd" is trying to explain the purpose of the macro. SIMD_CONSTEXPR_SIMD is just confusing again, though.
<bits/c++config> has _GLIBCXX_USE_CONSTEXPR which seems like a good fit here. I.e. name it _GLIBCXX_SIMD_USE_CONSTEXPR.
And define the __clang__ branch to const.

@mattkretz
Copy link
Member

stdx::abs is now working for integers, but is returning zeros for FP for unknown reason.

Interesting. Maybe you can check whether the _S_absmask constants are correct?

stdx::fabs works as expected with FP.

This is, IIRC, still just a loop over std::fabs.

@iburyl
Copy link
Author

iburyl commented May 8, 2020

_S_allbits and _S_signmask are fine, but _S_absmask is not

__andnot(_S_signmask<_V>, _S_allbits<_V>); // does not work, I have no idea why
auto a =_S_signmask<_V>;
auto b = _S_allbits<_V>;
__andnot(a, b); // does work

@iburyl
Copy link
Author

iburyl commented May 8, 2020

Last unresolved error for stdx::sin() to compile:

/std-simd/experimental/bits/simd_builtin.h:1663:12: error: call to '__and' is ambiguous
    return __and(__x._M_data, __y._M_data);
           ^~~~~
/std-simd/experimental/bits/simd.h:4569:14: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplBuiltin<std::experimental::parallelism_v2::simd_abi::_VecBuiltin<8> >::__bit_and<int, 2>' requested here
      _Impl::__bit_and(__data(__x), __data(__y)));
             ^
/std-simd/experimental/bits/simd_math.h:576:48: note: in instantiation of member function 'std::experimental::parallelism_v2::operator&' requested here
      const auto __need_sin = (__f._M_quadrant & 1) == 0;
                                               ^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
  _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
                                    ^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<4>::__sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
    return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
                                             ^
/std-simd/experimental/bits/simd_math.h:564:6: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_Fixed<4> >' requested here
            sin(static_simd_cast<rebind_simd_t<double, _V>>(__x)));
            ^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
  _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
                                    ^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<32>::__sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
    return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
                                             ^
my_test.cpp:69:39: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_Fixed<32> >' requested here
  print_simd("stdx::sin( fa )", stdx::sin( fa ) );
                                      ^
/std-simd/experimental/bits/simd.h:1615:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(2 * sizeof(int)))) int, void>, _Dummy = <>]
__and(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
^
/std-simd/experimental/bits/simd.h:1626:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, $1 = __attribute__((__vector_size__(2 * sizeof(int)))) int]
__and(_Tp __a, _Tp __b) noexcept
^

@mattkretz
Copy link
Member

_S_allbits and _S_signmask are fine, but _S_absmask is not

This smells like a compiler bug: https://godbolt.org/z/SYU0Fp

@mattkretz
Copy link
Member

Last unresolved error for stdx::sin() to compile:

Interesting. Clang and GCC disagree on how overload resolution works here.

@iburyl
Copy link
Author

iburyl commented May 13, 2020

Ok, my stdx::sin example works, and provides reasonable numbers.

Got a number of similar warnings though:

In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:2452:25: warning: '__builtin_is_constant_evaluated' will always evaluate to 'true' in a manifestly constant-evaluated expresnstant-evaluated]
    else if constexpr (!__builtin_is_constant_evaluated() && sizeof(__x) == 8) // {{{
                        ^

@iburyl
Copy link
Author

iburyl commented May 13, 2020

Tried to compile tests.
Running via regular make quickly breaks with errors, thus I've applied direct approach to see wider picture for time-being.

for testfile in *.cpp; do
    $CXX -std=c++17 -Ivirtest -D__remove_cvref_t=std::remove_cvref_t \
        -Wno-everything \
        -D_GLIBCXX_SIMD_ABI=__sse $testfile
done

Those tests compiled:

  • abi.cpp
  • abi_fixed_size.cpp
  • bitset_conversions.cpp
  • mask.cpp
  • mask_conversions.cpp
  • proposed.cpp
  • sfinae.cpp
  • simd.cpp
  • specialmath.cpp
  • uninit.cpp

Those tests require __make_array workaround:

  • where.cpp

Internal compiler error

  • casts.cpp - internal compiler error
  • integer_operators.cpp - internal compiler error
  • loadstore.cpp - internal compiler error
  • math.cpp - internal compiler error

Other reasons:

  • experimental.cpp - no member named 'apply' in 'where_expression' (tests compilation issue?)
  • operators.cpp - some issues with __vector_bitcast<_Tp>(!__x._M_data);

2020.05.15: Updated based on __andnot fix
2020.05.19: Updated based on __make_array workaround and long double issue resolved

@mattkretz
Copy link
Member

BTW, just so you're aware: I'm currently working on integrating this repo into libstdc++ and the relevant repo would be mattkretz/gcc with the mkretz/simd2 branch. However, I regularly force-push into that branch, so don't switch to working there.
But at some point I'll make a patch drop back into this repo. It's fine if we continue working like this. I just want to let you know that I probably won't be able to merge this PR as is, since I'll obviously try to get as many bug fixes as possible into the GCC branch ASAP.

@iburyl
Copy link
Author

iburyl commented May 14, 2020

Then I'll continue trying to build it here.
But I need some insightful help.

Here is an error I get from tests

./../experimental/bits/simd_builtin.h:2724:14: error: no matching function for call to '__andnot'
      return __andnot(__x._M_data, _Abi::template __implicit_mask<_Tp>());
             ^~~~~~~~
...
./../experimental/bits/simd.h:1743:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('__attribute__((__vector_size__(4 * sizeof(int)))) int' (vector of 4 'int' values) vs. 'std::experimental::parallelism_v2::_SimdWrapper<int, 3, void>')
__andnot(_Tp __a, _Tp __b) noexcept
^
./../experimental/bits/simd.h:1717:1: note: candidate template ignored: requirement '!integral_constant<bool, true>::value' was not satisfied [with _Tp = __attribute__((__vector_size__(4 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(4 * sizeof(int)))) int, void>]
__andnot(_Tp __a, typename _TVT::type __b) noexcept
^

I have reproduced it with the following example:

  using fixed_i32_3 = stdx::simd<int, stdx::simd_abi::_VecBuiltin<12>>;
  fixed_i32_3 i3 = 1;
  where(!(i3 > 0), i3) = 2;

Update: resolved.

@iburyl
Copy link
Author

iburyl commented May 15, 2020

I am observing strange behavior with long double so far.

Here ABI is deduced as scalar:

void foo(stdx::simd_mask<long double> a);

But that thing is deduced to _VecBuiltin(16) and it fails :

void foo() {
    stdx::simd_mask<long double>();
}

Looks like that:

my_test_2.cpp:19:10: error: call to implicitly-deleted default constructor of 'stdx::simd_mask<long double>'
    stdx::simd_mask<long double>();
         ^
/std-simd/experimental/bits/simd.h:4061:3: note: explicitly defaulted function was implicitly deleted here
  simd_mask() = default;
  ^
/std-simd/experimental/bits/simd.h:4038:19: note: default constructor of 'simd_mask<long double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>>' is implicitly deleted because base class '_SimdTraits<long double, _VecBuiltin<16>>::_MaskBase' (aka 'std::experimental::parallelism_v2::_UnsupportedBase') has a deleted default constructor
class simd_mask : public _SimdTraits<_Tp, _Abi>::_MaskBase
                  ^
/std-simd/experimental/bits/simd.h:546:3: note: '_UnsupportedBase' has been explicitly marked deleted here
  _UnsupportedBase() = delete;
  ^
1 error generated.

@mattkretz
Copy link
Member

I am observing strange behavior with long double so far.

The error you see from constructing an object of type simd<long double, _VecBuiltin<16>> is expected. That's the result from using an unsupported ABI tag. The question is really why simd_abi::compatible<long double> would not be scalar.
https://github.com/VcDevel/std-simd/blob/master/experimental/bits/simd.h#L2304-L2307 would choose _VecBuiltin<16> if sizeof(long double) <= 8. That'd be strange for Linux.

@iburyl
Copy link
Author

iburyl commented May 16, 2020

At the moment I see a weirdest behavior. In the same compilation unit sizeof(long double) is some times evaluated to 8 and some times to 16, and this mess destroys the logic for long double. Checking with Erich, if it is a known compiler bug.

@iburyl
Copy link
Author

iburyl commented May 18, 2020

Ok, long double issue turned to be a problem of my custom compiler build. Resolved that on my side.

@iburyl
Copy link
Author

iburyl commented May 18, 2020

@mattkretz, do you happen to think of __make_array workaround?

@iburyl
Copy link
Author

iburyl commented May 19, 2020

Next thing I have, is absence of unary not operator in clang.
https://godbolt.org/z/r6eeZr

This fires in operators test.
Goes down to: simd_builtin.h

  template <typename _Tp, size_t _Np>
  _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
  __negate(_SimdWrapper<_Tp, _Np> __x) noexcept
  {
    return __vector_bitcast<_Tp>(!__x._M_data);
  }

@iburyl
Copy link
Author

iburyl commented May 20, 2020

I feel like I've fixed/workarounded all known compilation problems up to Haswell.
Skylake have some missing builtins, e.g. __builtin_ia32_blendmb_512_mask

Now I need to run tests.

Direct compilation of tests does not work well - compiler fails on instantiation of huge test template functions: virtest/vir/test.h:1000:12: instantiating function definition {and here is a function declarartion 200KB long}

Is there a way to run the test system only for one instruction set, e.g. SSE and/or one given data type e.g. double?

@mattkretz
Copy link
Member

Sorry, I dropped out for a few days because of other work + a public holiday here. First thing I'll do is to get my gcc branch merged back here for convenience. That'll be a patch drop, but that should help do duplicate less work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants