Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing compilation with optimisation: Set optimisation flags when building shuffle.c #436

Merged
merged 3 commits into from
Jan 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 23 additions & 1 deletion blosc/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -172,11 +172,17 @@ if(COMPILER_SUPPORT_SSE2)
set_source_files_properties(
shuffle-sse2.c bitshuffle-sse2.c blosclz.c fastcopy.c
PROPERTIES COMPILE_FLAGS "/arch:SSE2")
set_property(
SOURCE shuffle.c
APPEND PROPERTY COMPILE_OPTIONS "/arch:SSE2")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using COMPILE_OPTIONS here since COMPILE_FLAGS is deprecated (see https://cmake.org/cmake/help/latest/prop_tgt/COMPILE_FLAGS.html).

Also COMPILE_OPTIONS is a ;-separated list which fits set_property as opposed to COMPILE_FLAGS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After your new commits, is really necessary to add SSE2 / AVX2 support to shuffle.c. This was not necessary before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is needed or it needs other modifications:

shuffle.c is using SHUFFLE_USE_AVX2 which is defined as:

c-blosc2/blosc/shuffle.h

Lines 28 to 30 in e204359

#if defined(SHUFFLE_AVX2_ENABLED) && defined(__AVX2__)
#define SHUFFLE_USE_AVX2
#endif

So the definition needs -mavx2 flag to have __AVX2__ defined. Same applies for SSE2/NEON/ALTIVEC.

To avoid this, it looks to require to change the conditional include of [bit]shuffle-*.h and the way get_shuffle_implementation is implemented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with that. I just want to make sure we don't abuse of SIMD flags.

endif()
else()
set_source_files_properties(
shuffle-sse2.c bitshuffle-sse2.c blosclz.c fastcopy.c
PROPERTIES COMPILE_FLAGS -msse2)
set_property(
SOURCE shuffle.c
APPEND PROPERTY COMPILE_OPTIONS -msse2)
endif()

# Define a symbol for the shuffle-dispatch implementation
Expand All @@ -191,10 +197,16 @@ if(COMPILER_SUPPORT_AVX2)
set_source_files_properties(
shuffle-avx2.c bitshuffle-avx2.c
PROPERTIES COMPILE_FLAGS "/arch:AVX2")
set_property(
SOURCE shuffle.c
APPEND PROPERTY COMPILE_OPTIONS "/arch:AVX2")
else()
set_source_files_properties(
shuffle-avx2.c bitshuffle-avx2.c
PROPERTIES COMPILE_FLAGS -mavx2)
set_property(
SOURCE shuffle.c
APPEND PROPERTY COMPILE_OPTIONS -mavx2)
endif()

# Define a symbol for the shuffle-dispatch implementation
Expand All @@ -208,11 +220,17 @@ if(COMPILER_SUPPORT_NEON)
set_source_files_properties(
shuffle-neon.c bitshuffle-neon.c
PROPERTIES COMPILE_FLAGS "-flax-vector-conversions")
set_property(
SOURCE shuffle.c
APPEND PROPERTY COMPILE_OPTIONS "-flax-vector-conversions")
if(CMAKE_SYSTEM_PROCESSOR STREQUAL armv7l)
# Only armv7l needs special -mfpu=neon flag; aarch64 doesn't.
set_source_files_properties(
shuffle-neon.c bitshuffle-neon.c
PROPERTIES COMPILE_FLAGS "-mfpu=neon -flax-vector-conversions")
set_property(
SOURCE shuffle.c
APPEND PROPERTY COMPILE_OPTIONS "-mfpu=neon -flax-vector-conversions")
endif()
# Define a symbol for the shuffle-dispatch implementation
# so it knows NEON is supported even though that file is
Expand All @@ -222,8 +240,12 @@ if(COMPILER_SUPPORT_NEON)
APPEND PROPERTY COMPILE_DEFINITIONS SHUFFLE_NEON_ENABLED)
endif()
if(COMPILER_SUPPORT_ALTIVEC)
set_source_files_properties(shuffle-altivec.c bitshuffle-altivec.c
set_source_files_properties(
shuffle-altivec.c bitshuffle-altivec.c
PROPERTIES COMPILE_FLAGS -DNO_WARN_X86_INTRINSICS)
set_property(
SOURCE shuffle.c
APPEND PROPERTY COMPILE_OPTIONS -DNO_WARN_X86_INTRINSICS)

# Define a symbol for the shuffle-dispatch implementation
# so it knows ALTIVEC is supported even though that file is
Expand Down
8 changes: 4 additions & 4 deletions blosc/shuffle.c
Original file line number Diff line number Diff line change
Expand Up @@ -141,7 +141,7 @@ __cpuidex(int32_t cpuInfo[4], int32_t function_id, int32_t subfunction_id) {

// GCC folks added _xgetbv in immintrin.h starting in GCC 9
// See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71659
#if defined(__i386__) && !(defined(_IMMINTRIN_H_INCLUDED) && (BLOSC_GCC_VERSION >= 900))
#if !(defined(_IMMINTRIN_H_INCLUDED) && (BLOSC_GCC_VERSION >= 900)) && !defined(__IMMINTRIN_H)
/* Reads the content of an extended control register.
https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
*/
Expand All @@ -160,7 +160,7 @@ _xgetbv(uint32_t xcr) {
);
return ((uint64_t)edx << 32) | eax;
}
#endif // defined(__i386__) && !(defined(_IMMINTRIN_H_INCLUDED) && (BLOSC_GCC_VERSION >= 900))
#endif // !(defined(_IMMINTRIN_H_INCLUDED) && (BLOSC_GCC_VERSION >= 900)) && !defined(__IMMINTRIN_H)
#endif /* defined(_MSC_VER) */

#ifndef _XCR_XFEATURE_ENABLED_MASK
Expand Down Expand Up @@ -204,7 +204,7 @@ static blosc_cpu_features blosc_get_cpu_features(void) {
bool ymm_state_enabled = false;
//bool zmm_state_enabled = false; // commented this out for avoiding an 'unused variable' warning

#if defined(__i386__) && defined(_XCR_XFEATURE_ENABLED_MASK)
#if defined(_XCR_XFEATURE_ENABLED_MASK)
if (xsave_available && xsave_enabled_by_os && (
sse2_available || sse3_available || ssse3_available
|| sse41_available || sse42_available
Expand All @@ -219,7 +219,7 @@ static blosc_cpu_features blosc_get_cpu_features(void) {
restored as well as all of zmm16-zmm31 and the opmask registers. */
//zmm_state_enabled = (xcr0_contents & 0x70) == 0x70;
}
#endif /* defined(__i386__) && defined(_XCR_XFEATURE_ENABLED_MASK) */
#endif /* defined(_XCR_XFEATURE_ENABLED_MASK) */

#if defined(BLOSC_DUMP_CPU_INFO)
printf("Shuffle CPU Information:\n");
Expand Down