Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcc-9 + power9: x86/sse4.1.h:1025:60: error: ‘SIMDE_SHUFFLE_VECTOR_’ was not declared in this scope #691

Closed
mr-c opened this issue Dec 31, 2020 · 5 comments

Comments

@nemequ
Copy link
Member

nemequ commented Jan 2, 2021

I can't reproduce this without knowing which flags were used; GCC 9 + POWER 9 works fine here. The basic fix for this is pretty straightforward:

diff --git a/simde/x86/sse4.1.h b/simde/x86/sse4.1.h
index 21c9c0b..cd413e8 100644
--- a/simde/x86/sse4.1.h
+++ b/simde/x86/sse4.1.h
@@ -1020,7 +1020,7 @@ simde_mm_cvtepu32_epi64 (simde__m128i a) {
 
     #if defined(SIMDE_ARM_NEON_A32V7_NATIVE)
       r_.neon_u64 = vmovl_u32(vget_low_u32(a_.neon_u32));
-    #elif defined(SIMDE_VECTOR_SCALAR) && (SIMDE_ENDIAN_ORDER == SIMDE_ENDIAN_LITTLE)
+    #elif defined(SIMDE_VECTOR_SCALAR) && defined(SIMDE_SHUFFLE_VECTOR_) && (SIMDE_ENDIAN_ORDER == SIMDE_ENDIAN_LITTLE)
       __typeof__(r_.u32) z = { 0, };
       r_.i64 = HEDLEY_REINTERPRET_CAST(__typeof__(r_.i64), SIMDE_SHUFFLE_VECTOR_(32, 16, a_.u32, z, 0, 4, 1, 6));
     #elif defined(SIMDE_CONVERT_VECTOR_)

That said, if SIMDE_SHUFFLE_VECTOR_ is not defined something is wrong, and I'd like to figure out what… it should work on GCC 4.7+, and the code should be significantly better than the portable fallback. Is it possible to get a build with the flags included (it looks like cmake, so make VERBOSE=1 instead of just make)?

@mr-c
Copy link
Collaborator Author

mr-c commented Jan 3, 2021

@nemequ Okay, I manually updated the simde-no-tests repo for the post v0.7.0 commits and redid the PR to MMSeqs2 with a commit that enabled verbose builds

The new power9 cross build is running now at https://dev.azure.com/themartinsteinegger/mmseqs2/_build/results?buildId=814&view=logs&j=b6ebde74-a63b-50c4-676d-2e6c6772f234

@mr-c
Copy link
Collaborator Author

mr-c commented Jan 3, 2021

Here is the compile line:

/usr/bin/powerpc64le-linux-gnu-g++ -DHAVE_BZLIB=1 -DHAVE_POSIX_FADVISE=1 -DHAVE_POSIX_MADVISE=1 -DHAVE_ZLIB=1 -DOPENMP=1 -I/home/vsts/work/1/s/lib/zstd/lib -I/home/vsts/work/1/s/lib/tinyexpr -I/home/vsts/work/1/s/lib/microtar -I/home/vsts/work/1/s/lib/simde -I/home/vsts/work/1/s/lib -I/home/vsts/work/1/s/lib/simd -I/home/vsts/work/1/s/lib/gzstream -I/home/vsts/work/1/s/lib/alp -I/home/vsts/work/1/s/lib/cacode -I/home/vsts/work/1/s/lib/ksw2 -I/home/vsts/work/1/s/lib/xxhash -I/home/vsts/work/1/s/build/generated -I/home/vsts/work/1/s/src/alignment -I/home/vsts/work/1/s/src/clustering -I/home/vsts/work/1/s/src/commons -I/home/vsts/work/1/s/src/multihit -I/home/vsts/work/1/s/src/prefiltering -I/home/vsts/work/1/s/src/linclust -I/home/vsts/work/1/s/src/taxonomy -I/home/vsts/work/1/s/src/util -I/home/vsts/work/1/s/src/. -O3 -DNDEBUG -fsigned-char -mcpu=power9 -mvsx -D_GNU_SOURCE=1 -std=c++1y -pedantic -Wall -Wextra -Wdisabled-optimization -fno-exceptions -Werror -fopenmp -o CMakeFiles/mmseqs-framework.dir/alignment/MsaFilter.cpp.o -c /home/vsts/work/1/s/src/alignment/MsaFilter.cpp

@nemequ
Copy link
Member

nemequ commented Jan 3, 2021

Thanks! I still can't reproduce it by just adding those flags to a SIMDe build, but I can by building with MMseqs2. That should be enough to identify the issue.

BTW, all those other warnings can be cleaned up by passing -Wno-psabi. Unfortunately it's not something we can fix in SIMDe since it is emitted on the caller side :(

@nemequ
Copy link
Member

nemequ commented Jan 3, 2021

CC @milot-mirdita

Okay, I figured it out. It's because of this: https://github.com/soedinglab/MMseqs2/blob/bad16c765aac60d84a8fde3548adbb06b34980bd/src/commons/Util.h#L50-L52

MMseqs2 is defining __has_bulitin(x) to 0. However, when SIMDe wants to check for availability of the __builtin_shuffle function, in concept it does something like this:

#if defined(__has_builtin)
  #if __has_builtin(__builtin_shuffle)
    #define HAVE_BUILTIN_SHUFFLE 1
  #endif
#elif defined(__GNUC__) && (__GNUC__ > 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
  #define HAVE_BUILTIN_SHUFFLE 1
#endif

The most straightforward solution is to use a namespaced version, for example:

#if defined(__has_builtin)
  #define MMSEQS2_HAS_BUILTIN(x) __has_builtin(x)
#else
  #define MMSEQS2_HAS_BUILTIN(x) (0)
#endif

Note that MMSeqs2 also does something similar for __has_attribute, so you'd want to do the same thing there, too.

I'm obviously biased, but IMHO a better solution would be to just use Hedley. Hedley already has HEDLEY_HAS_BUILTIN, HEDLEY_HAS_ATTRIBUTE, etc., macros for this reason. There are also HEDLEY_LIKELY and HEDLEY_UNLIKELY macros which are more robust than the ones in that file, plus HEDLEY_PREDICT_TRUE / HEDLEY_PREDICT_FALSE which you can provide a probability to (they'll fall back on the likely/unlikely stuff, but can use __builtin_expect_with_probability if it's available).

Also, for MAYBE_UNUSED I would suggest just using (void) var; in the code instead of adding an attribute. Casting to void is portable to other compilers (e.g., MSVC), __attribute__((__maybe_unused__)) isn't (which is why there is no HEDLEY_UNUSED macro in Hedley).

I'm going to go ahead and fix the bug in SIMDe which revealed this problem (we should be checking if SIMDE_SHUFFLE_VECTOR_ is defined prior to using it), but all that will really do is hide the problem. Since SIMDE_SHUFFLE_VECTOR_ isn't defined __builtin_shuffle won't be used anywhere in SIMDe, which is a pretty big performance hit.

Avoiding this in the future is an interesting problem. Clang still uses code like this in their manual; I filed a bug about it a while back, but they don't seem to care, so I wouldn't be surprised to see this more in the future… The only thing I can think of would be to check defined(__has_builtin) when we know it shouldn't be defined. For example,

#if defined(__has_builtin) && defined(HEDLEY_GCC_VERSION) && !HEDLEY_GCC_VERSION_CHECK(11,0,0)
HEDLEY_WARNING("__has_builtin has been improperly defined by user code")
#endif

But that has a few problems. First off, defined(HEDLEY_GCC_VERSION) isn't 100% reliable. It uses __GNUC__ to detect GCC, but a lot of compilers (including clang) also define that macro. Hedley fixes this by checking a list of macros defined by other compilers to attempt to detect the real GCC as opposed to another compiler masquerading as GCC, but there may be some compiler(s) I'm not aware of which define __GNUC__ but Hedley doesn't check for. That means a potential (though admittedly unlikely) false positive.

We would also have to keep bumping that version every so often as new versions of GCC which don't support __has_builtin are released.

Finally, this obviously relies on maintaining a list of compilers which we know don't support __has_builtin, which is a rather annoynig maintenance version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants