Releases: simd-everywhere/simde
v0.8.2
SIMDe 0.8.2
Summary
- Start of RISCV64 optimized implementation using the RVV1.0 vector extension! Thank you @eric900115 @howjmay @zengdage
- 62 of the ARM Neon intrinsics added in SIMDe 0.8.0 had to be removed for not exactly matching the specs and real hardware
(from the FCVTZS/FCVTMS/FCVTPS/FCVTNS families). This brings us down from 100% coverage of the NEON functions to 99.07%.
For the entire project: 126 files changed, 5522 insertions(+), 2772 deletions(-)
For just the simde folder: 89 files changed, 4330 insertions(+), 2199 deletions(-)
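The coverage figure in the summary can be reproduced with simple arithmetic. The sketch below assumes the NEON total is still the 6670 functions reported for 0.8.0 (the 0.8.2 notes do not restate it); `coverage_percent` and `format_coverage` are hypothetical helper names used only for illustration.

```c
#include <stdio.h>

/* Percentage of implemented functions out of a known total. */
static double coverage_percent(unsigned implemented, unsigned total) {
  return total == 0 ? 0.0 : 100.0 * (double) implemented / (double) total;
}

/* Hypothetical helper: writes "N of M (P%)" into buf, mirroring the
 * style used for the per-family statistics in these notes. */
static int format_coverage(char *buf, size_t len,
                           unsigned implemented, unsigned total) {
  return snprintf(buf, len, "%u of %u (%.2f%%)", implemented, total,
                  coverage_percent(implemented, total));
}
```

Calling `format_coverage(buf, sizeof buf, 6670 - 62, 6670)` yields the 99.07% quoted above, under the stated assumption about the total.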
Details
Implementation of Arm intrinsics
NEON
- arm neon: disable some FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics 339ffe4 @mr-c
- arm neon sm3: check constant range 3d34fcd @mr-c
- arm 32 bits: native def fixes; workarounds for gcc 22900e6 @Cuda-Chen
- x86 implementations: allow _m128 access from SSE 114c3cd @mr-c
WASM intrinsics
x86 intrinsics
SVML
XOP
Arch support
arm / arm64
- arm platform: cleanup feature detection. 08c21f3 @mr-c
- arm: enable more intrinsic function for armv7 416091e @zengdage
RISCV64
- Initial Support for the RISC-V Vector Extension (RVV1.0) in ARM NEON (#1130) b4e805a @eric900115
- arm: fix some neon2rvv intrinsic function error 2a548e5 @zengdage
- arm: Add neon2rvv support in vand series intrinsics dac67f3 @howjmay
- arm: improve performance in vabd_xxx for risc-v b63ba04 @zengdage
- arm: improve performance in vhadd_xxx for risc-v a68fa90 @zengdage
Compiler Specific
Clang
- detect clang versions 18 & 19 ed4a5cd @mr-c
- arm neon clang: skip vrnd native before clang v18 e647f10 @mr-c
- apple clang arm64: ignore SHA2 be48ef8 @mr-c
Emscripten
MSVC
- x86 test msvc: really disable warning 4799,4730 487507d @mr-c
- sse2 MSVC _mm_pause implementation for x86 8d95f83 @mr-c
- SSE is good enough for native m128i and m128d types & functions 9982b27 @mr-c
Testing with Docker/Podman & CI
Cirrus CI
GitHub Actions
- test Mac arm64 0080b28 @mr-c
- macos: report log if there is a configuration failure. df3e930 @mr-c
- build(deps): bump actions/checkout from 3 to 4 (#1149) 9605608 @dependabot[bot]
- build(deps): bump codecov/codecov-action from 3 to 4 25382c1 @dependabot[bot]
- codecov: use token 2c45dd4 @mr-c
- Add gcc arm 32bit armv8-a test in CI 72bde75 @Cuda-Chen
- build for AMD Bulldozer version 2 9746537 @mr-c
Packit CI
Semaphore CI
Misc
- update list of fully implemented instruction sets (#1152) b568fcd @mr-c
- typo fixes from codespell 8639fef @mr-c
- README.md - move CLMUL to partial, list more of the CI.yml architectures 285b50d @Torinde
- Update README.md - link to VPCLMULQDQ; mention MSA (#1157) 517da84 @Torinde
- Update README.md (#1156) b88a66d @mr-c
- README: two more related projects 7429dff @mr-c
New Contributors
- @eric900115 made their first contribution in #1130
- @Cuda-Chen made their first contribution in #1116
- @Torinde made their first contribution in #1157
- @zengdage made their first contribution in #1172
- @howjmay made their first contribution in #1174
Full Changelog: v0.8.0...v0.8.2
v0.8.2-rc1
See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes for changes since 0.8.0
Full Changelog: v0.8.0...v0.8.2-rc1
v0.8.0
SIMDe 0.8.0
Summary
- All NEON intrinsics are now implemented, up from 56.46% in the previous release! (@yyctw @wewe5215)
- SIMDe PRs are tested using Fedora Rawhide (@junaruga)
For the entire project: 656 files changed, 202635 insertions(+), 1724 deletions(-)
For just the simde folder: 295 files changed, 47053 insertions(+), 896 deletions(-)
X86
There are a total of 6876 SIMD functions on x86, 2930 (43.17%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5160 functions currently in AVX-512, SIMDe implements 1510 (29.26%).
Note: Intel has removed the intrinsics that were unique to Intel Xeon Phi (ER, PF, 4MAPS, and 4VNNIW) from their intrinsic list. SIMDe will retain those few implementations we already had, but this changes how our completeness statistics are calculated.
Newly added function families
- AES: 5 of 6 (83.33%)
Newly added AVX512 function families
- castph: 1 of 9 (11.11%) implemented.
- cvtus_storeu: 1 of 18 (5.56%) implemented.
- fpclass: 3 of 24 (12.50%) implemented.
- i32gather: 1 of 8 (12.50%) implemented.
- i64gather: 8 of 8 💯
- permutex: 3 of 12 (25.00%) implemented.
- rcp14: 1 of 24 (4.17%) implemented.
- reduce
  - reduce_max: 7 of 31 (22.58%) implemented.
  - reduce_min: 7 of 31 (22.58%) implemented.
- shufflehi: 1 of 7 (14.29%) implemented.
- shufflelo: 1 of 7 (14.29%) implemented.
Additions to existing families
- AVX512BW: 7 additional, 337 total of 790 (42.66%)
- AVX512DQ: 5 additional, 112 total of 376 (29.79%)
- AVX512F: 48 additional, 1087 total of 2812 (38.66%)
- AVX512_FP16: 15 additional, 17 total of 1105 (1.54%)
Neon
SIMDe currently implements 6670 out of 6670 (100.00%) NEON functions; up from 56.46% in the previous release!
Newly added families
- abal
- abal_high
- abd
- abdh
- abdl_high
- addhn_high
- aes
- bfdot
- bfdot_lane
- cadd_rot
- cale
- calt
- cmla_lane
- cmla_rot_lane
- copy_lane
- cvt_high
- cvt_n
- cvta
- cvtn
- cvtp
- cvtx
- cvtx_high
- div
- dupb_lane
- duph_lane
- eor3
- fmlal
- fms
- fms_lane
- fms_n
- ld2_dup
- ld2_lane
- ld3_dup
- ld3_lane
- ld4_dup
- maxnmv
- minnmv
- mla_lane
- mla_high_lane
- mls_lane
- mlsl_high_lane
- mmla
- mull_high_lane
- mull_high_n
- mulx
- mulx_lane
- pmaxnm
- pminnm
- qdmlal
- qdmlal_high
- qdmlal_high_lane
- qdmlal_high_n
- qdmlal_lane
- qdmlal_n
- qdmlsl
- qdmlsl_high
- qdmlsl_high_lane
- qdmlsl_high_n
- qdmlsl_lane
- qdmlsl_n
- qdmlslh
- qdmlslh_lane
- qdmulhh
- qdmulhh_lane
- qdmull_high
- qdmull_high_lane
- qdmull_high_n
- qdmull_lane
- qdmull_n
- qdmullh_lane
- qmovun_high
- qrdmlah
- qrdmlah_lane
- qrdmlahh
- qrdmlahh_lane
- qrdmlsh
- qrdmlsh_lane
- qrdmlshh
- qrdmlshh_lane
- qrdmulhh_lane
- qrshl
- qrshlh
- qrshrn_high_n
- qrshrnh_n
- qrshrun_high_n
- qrshrunh_n
- qshl_n
- qshlh_n
- qshluh_n
- qshrn_high_n
- qshrnh_n
- qshrun_high_n
- qshrunh_n
- raddhn
- raddhn_high
- rax
- recp
- rnd32x
- rnd32z
- rnd64x
- rnd64z
- rnda
- rndx
- rshrn_high_n
- rsubhn
- rsubhn_high
- set_lane
- sha1
- sha1h
- sha256
- sha512
- shll_high_n
- shrn_high_n
- sli_n
- sm3
- sm4
- sqrt
- st1_x2
- st1_x3
- st1_x4
- st1q_x2
- st1q_x3
- st1q_x4
- subhn_high
- sudot_lane
- usdot
- usdot_lane
Finally complete families
- cvtn
- mla_lane
Details
- simde-f16: improve _Float16 usage; better INFHF/NANHF defs 8910057 @mr-c
- simde_float16: prefer __fp16 if available aba26f6 @mr-c
Implementation of Arm intrinsics
NEON
- cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations e134cc7 @mr-c
- cvtn: vcvtnq_u32_f32 is a V8 function 8432c70 @mr-c
- min: Remove non-working MMX specialization from simde_vmin_s16 6858b92 @M-HT
- shll: Extend constant range in simde_vshll_n_XXX intrinsics (#1064) beb1c61 @M-HT
- various: Implement some f16XN types and f16 related intrinsics. (#1071) aae2245 @yyctw
- qtbl/qtbx polyfills for A32V7 a2fef9e @easyaspi314
- arm: use SIMDE_ARCH_ARM_FMA 7198d6d @mr-c
- arm neon: Complex operations from Armv8.3-a (#1077) d08d67c @wewe5215
- more fp16 using intrinsics supported by architecture v7 (skip version) (#1081) 5e7c4d4 @yyctw
- st1{,q}_*_x{2,3,4}: initial implementation (#1082) 879d1a0 @yyctw
- part 1 of implement all intrinsics supported by architecture A64 (#1090) 2eedece @yyctw
- Add AES instructions. 23adcd2 805ccd2 @yyctw
- Modified simde_float16 to simde_float16_t (#1100) 8a05dc6 @yyctw
- implement all intrinsics supported by architecture A64-remaining part (#1093) 018ba24 @yyctw
- add enable vmlaq_laneq_f32 and vcvtq_n_f64_u64 c7d314b @yyctw
- implement all bf16-related intrinsics (#1110) c59db7c @yyctw
- arm/neon abs: negating INT_MIN is undefined behavior in C/C++ c200c16 @mr-c
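The last item refers to a general C/C++ pitfall: evaluating `-a` on a signed integer that holds the minimum value overflows, which is undefined behavior. A minimal sketch of one common way to avoid it is to do the negation on the unsigned type. This is not taken from the SIMDe commit, and `abs_magnitude_u32` is a hypothetical name.

```c
#include <stdint.h>

/* Returns |a| as an unsigned value without ever negating a signed integer.
 * Illustrative only; not SIMDe's actual implementation. */
static uint32_t abs_magnitude_u32(int32_t a) {
  uint32_t ua = (uint32_t) a;               /* modular conversion, well defined */
  return (a < 0) ? (uint32_t) (0u - ua) : ua; /* unsigned negation wraps, no UB */
}
```

For `INT32_MIN` this returns 2147483648, a value that does not fit in `int32_t`. That is why a signed `abs` has to choose between wrapping and saturating.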
SVE Intrinsics
WASM intrinsics
- simd128: fix altivec_p7 version of wasm_f64x2_pmin 96d6e53 @mr-c
- simd128: add missing unsigned functions ea5e283 @mr-c
- simd128 f{32x4,64x2}_min: add workaround for a gcc<6 issue d5d6d10 @mr-c
- detect support for Relaxed SIMD mode 2e66dd4 @mr-c
- simd128/relaxed: begin MIPS implementations db8ad84 @mr-c
- relaxed: add f{32x4,64x2}_relaxed_{min,max} 9d1a34e @mr-c
- relaxed: updated names; reordered FMA operations 8cc8874 @mr-c
x86 intrinsics
SSE*
- sse: Fix issues related to MXCSR register (#1060) 653aba8 @M-HT
- sse: implement _mm_movelh_ps for Arm64 514564e @mr-c
- sse _mm_movemask_ps: remove unused code fba97e4 @mr-c
- sse2 mm_pause: more archs, add a basic test 692a2e8 @mr-
- sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128 edd4678 @mr-c
- sse4.1 _mm_testz_si128: fix backwards short circuit logic f132275 @mr-c
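For readers unfamiliar with the two SSE4.1 test intrinsics named above, the following scalar model sketches their documented semantics: `_mm_testz_si128` returns 1 when `a & b` is all zeros, and `_mm_testnzc_si128` returns 1 when both `a & b` and `~a & b` are non-zero. The `u128_model` type and the `model_*` functions are hypothetical stand-ins, not SIMDe code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector: two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_model;

/* ZF: set when (a AND b) has no bits set. */
static int model_zf(u128_model a, u128_model b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* CF: set when ((NOT a) AND b) has no bits set. */
static int model_cf(u128_model a, u128_model b) {
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* Scalar model of _mm_testz_si128: returns ZF. */
static int model_testz(u128_model a, u128_model b) {
  return model_zf(a, b);
}

/* Scalar model of _mm_testnzc_si128: 1 only if ZF and CF are both clear.
 * The flags are truth values, so they are combined with logical operators. */
static int model_testnzc(u128_model a, u128_model b) {
  return !model_zf(a, b) && !model_cf(a, b);
}
```

A reference model like this can be useful when checking a hand-written NEON or WASM fallback against expected results.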
AVX
- run test from #926 ce9708c @mr-c
- simde_mm256_shuffle_pd fix for natural vector size < 128 1594d7c @mr-c
AVX2
- correction of simde_mm256_sign_epi{8,16,32} (#1123) c376610 @Proudsalsa
AVX512
- fpclass: naive implementation 353bf5f @mr-c
- loadu: fix native detection 305f434 @mr-c
- set: add simde_x_mm512_set_m256{,d} 67e0c50 @mr-c
- gather: add MSVC native fallbacks 7b7e3f6 @mr-c
- AVX512FP16 / m512h initial support e97691c @mr-c
- fix many native aliases 75014b9 @mr-c
CLMUL
SVML
AES
MIPS MSA intrinsics
Arch support
x86(-64)
arm64
Altivec
Power
- sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ implementations on PowerPC 4de999a @mr-c
- wasm simd128: more powerpc fixes 7cb5691 @mr-c
Compiler Specific
GCC
- GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+ 3fa89c5 @mr-c
- GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3 edde42e @mr-c
- GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+ 43d86a3 @mr-c
- Add workaround for GCC bug 111609 fdafd8e @M-HT
- arm neon ld2: silence warnings at -O3 on gcc risc-v 8f56628 @mr-c
- avx512 abs: refine GCC compiler checks for _mm512{,_mask}_abs_pd (#1118) 5405bbd @thomas-schlichter
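Entries such as "fixed in 8.5, 9.4, 10+" describe a fix that was backported to several release series, so a single "version >= X" comparison is not enough. The sketch below shows the per-series logic as a plain function. The name `gcc_predates_fix` is hypothetical, the thresholds are copied from the SIMDE_BUG_GCC_94482 entry, and treating every series older than 8 as unfixed is an assumption; SIMDe's own macros may be written differently.

```c
#include <stdbool.h>

/* Hypothetical helper: true when a GCC major.minor release predates a fix
 * that landed in 8.5, 9.4 and every release from 10 onward. */
static bool gcc_predates_fix(int major, int minor) {
  if (major >= 10) return false;     /* fixed in all of 10+ */
  if (major == 9)  return minor < 4; /* 9.x series: fixed in 9.4 */
  if (major == 8)  return minor < 5; /* 8.x series: fixed in 8.5 */
  return true; /* assumption: older series never received the backport */
}
```

In a header the same decision would normally be made by the preprocessor from `__GNUC__` and `__GNUC_MINOR__`; the function form is used here only so the logic is easy to test.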
Clang
- clang powerpc: vec_bperm bug was fixed in clang-14 6feb28a @mr-c
- clmul: aarch64 clang has difficulties with poly64x1_t 1e1bd76 @mr-c
- aarch64: optimization bug 45541 was fixed in clang-15 7ca5712 @mr-c
- A32V7: Don't trust clang for load multiple on A32V7 927f141 @easyaspi314
- wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release 25cebbe @mr-c
- simde-detect-clang.h: add clang 17 detection 923f8ac 684baa1 50d98c1 @Coeur
ClangCL
v0.8.0-rc2
See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes for changes since 0.7.6
What's Changed since RC1
- WASM Relaxed SIMD updates by @mr-c in #1112
- emcc tot: set -Wno-switch-default by @mr-c in #1115
- avx512 abs: refine GCC compiler checks for _mm512{,_mask}_abs_pd by @thomas-schlichter in #1118
- correction of simde_mm256_sign_epi16() by @Proudsalsa in #1123
- apply arm64 windows workaround only on older version msvc by @Changqing-JING in #1121
- gh-actions: add clang-17 by @mr-c in #1127
- Improve performance of simde_mm512_add_epi32 by @AymenQ in #1126
- typo: XCode -> Xcode by @Coeur in #1129
- Update simde-detect-clang.h for clang 13 detection by @Coeur in #1131
- Update simde-detect-clang.h for clang 17 detection by @Coeur in #1132
- build(deps): bump ad-m/github-push-action from 0.6.0 to 0.8.0 by @dependabot in #1134
- build(deps): bump actions/setup-dotnet from 3 to 4 by @dependabot in #1135
- build(deps): bump actions/setup-python from 4 to 5 by @dependabot in #1137
- build(deps): bump github/codeql-action from 2 to 3 by @dependabot in #1138
- GitHub Actions emscripten: use older release for now by @mr-c in #1133
- build(deps): bump actions/checkout from 3 to 4 by @dependabot in #1139
- docs: explain how to target a single test by @mr-c in #1140
- arm/neon abs: negating INT_MIN is undefined behavior by @mr-c in #1141
New Contributors
- @thomas-schlichter made their first contribution in #1118
- @Proudsalsa made their first contribution in #1123
- @Changqing-JING made their first contribution in #1121
- @AymenQ made their first contribution in #1126
- @Coeur made their first contribution in #1129
- @dependabot made their first contribution in #1134
Full Changelog: v0.8.0-rc1...v0.8.0-rc2
v0.8.0-rc1
See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes
New Contributors
- @cbielow made their first contribution in #1055
- @M-HT made their first contribution in #1060
- @yyctw made their first contribution in #1071
- @Vineg made their first contribution in #1072
- @wewe5215 made their first contribution in #1077
Full Changelog: v0.7.6...v0.8.0-rc1
v0.7.6
Summary
See, I knew we should release more often!
Details
Implementation of Arm intrinsics
NEON
neon/abd,ext,cmla{,_rot{180,270,90}}: additional wasm128 implementations 3a18dff @mr-c
neon/cvtn: basic implementation of a few functions fefc785 @mr-c
neon/mla_lane: initial implementation using mla+dup 554ab18 @ngzhian
neon/shl,rshl: fix avx include to unbreak amalgamated headers 3748a9f @mr-c
neon/shll_n: make vshll_n_u32 test operational 356db0c @mr-c
neon/qabs: restore SSE2 impl for vqabsq_s8 f614843 @mr-c
x86 intrinsics
mmx: loongson impl promotions over SIMDE_SHUFFLE_VECTOR_ 51bf6f2 @mr-c
x86/sse*,avx: add additional SIMD128 implementations e28a87e @mr-c
SSE*
sse{,2,3,4.1},avx: more WASM shuffle implementations 097dd12 @mr-c
sse*,avx: add additional SIMD128 implementations e28a87e @mr-c
sse: allow native _mm_loadh_pi on MSVC x64 314452b @mr-c
AVX512
avx512: typo fix for typedef of __mmask64 e8390a3 4a9f01a @mr-c
avx512/madd: fix native alias arguments for _mm512_madd_epi16 bcf4adb @mr-c
Arch support
simde-arch: #include Hedley for setting F16C for MSVC 2022+ with AVX2 f9cf467 @mr-c
Testing with Docker/Podman & CI
tests: simde_assert_equal_{v,}f funcs were silently failing 395efd9 @mr-c
tests: Quiet another Clang < v5 warning that resurfaced d9d2b45 @mr-c
tests: audit use of HEDLEY_DIAGNOSTIC_PUSH and _POP 284c88a @mr-c
test: ignore -Wc99-extensions e264ff5 @mr-c
neon/aba: vaba_s32 test was not being run f86346a @mr-c
sve/and: the svand_n_s8_m test is incomplete, mark it as such b962f07 @mr-c
tests: combine declarations in test functions 76c7d37 @mr-c
Local testing with Docker/Podman
docker: add wasm64 target 29db539 @mr-c
Drone.io
GitHub Actions
gh-actions: confirm that all header files are installed 8d5e05a @mr-c
gh-actions: put wasm64 under CI 6702820 @mr-c
Netlify
netlify: disable for now caa0929 @mr-c
Misc
meson install: arm/neon/ld1 & x86/avx512.h 27836b1 @mr-c
Update clang version detection for 14..16 and add link 4957a9e @jan-wassenberg
v0.7.4
SIMDe 0.7.4
Summary
- Minimum meson version is now 0.54
- 40 new NEON families implemented, SVE API implementation started (14 families)
- Initial support for x86 F16C API
- Initial support for MIPS MSA API
- Initial support for Arm Scalable Vector Extensions (SVE) API
- Initial support for WASM SIMD128 API
- Initial support for the E2K (Elbrus) architecture
- MSVC has many fixes, now compiled in CI using /ARCH:AVX, /ARCH:AVX2, and /ARCH:AVX512
X86
There are a total of 7470 SIMD functions on x86, 2971 (39.77%) of which have been implemented in SIMDe so far.
Specifically for AVX-512, of the 5270 functions currently in AVX-512, SIMDe implements 1439 (27.31%)
Newly added function families
- AVX512CD: 21 of 42 (50.00%)
- AVX512VPOPCNTDQ: 18 of 18 💯
- AVX512_4VNNIW: 6 of 6 (100.00%)
- AVX512_BF16: 9 of 38 (23.68%)
- AVX512_BITALG: 24 of 24 💯
- AVX512_FP16: 2 of 1105 (0.18%)
- AVX512_VBMI2: 3 of 150 (2.00%)
- AVX512_VNNI: 36 of 36 💯
- AVX_VNNI: 8 of 16 (50.00%)
Additions to existing families
- AVX512F: 579 additional, 856 total of 2660 (31.80%)
- AVX512BW: 178 additional, 335 total of 828 (40.46%)
- AVX512DQ: 77 additional, 111 total of 399 (27.82%)
- AVX512_VBMI: 9 additional, 30 total of 30 💯
- KNCNI: 113 additional, 114 total of 595 (19.16%)
- VPCLMULQDQ: 1 additional, 2 total of 2 💯
Neon
SIMDe currently implements 3745 out of 6670 (56.15%) NEON functions. If you don't count 16-bit floats and poly types, it's 3745 / 4969 (75.37%).
Newly added families
- addhn
- bcax
- cage
- cmla
- cmla_rot90
- cmla_rot180
- cmla_rot270
- fma
- fma_lane
- fma_n
- ld2
- ld4_lane
- mlal_high_n
- mlal_lane
- mls_n
- mlsl_high_n
- mlsl_lane
- mull_lane
- qdmulh_lane
- qdmulh_n
- qrdmulh_lane
- qrshrn_n
- qrshrun_n
- qshlu_n
- qshrn_n
- qshrun_n
- recpe
- recps
- rshrn_n
- rsqrte
- rsqrts
- shll_n
- shrn_n
- sqadd
- sri_n
- st2
- st2_lane
- st3_lane
- st4_lane
- subhn
- subl_high
- xar
MSA
Overall, SIMDe implements 40 of 533 (7.50%) functions from MSA.
Details
Implementation of Arm intrinsics
NEON
- aarch64 + clang-1[345] fix for "implicit conversion changes signedness" a22c3cc @mr-c
- neon: Implement f16 types 21496f6 @Glitch18
- neon: port additional code to new style 1c744fd @nemequ
- neon: replace some more abs/labs/llabs usage with simde_math_* versions c59853a @nemequ
- neon: refactor to use different types on all targets c17957a @nemequ
- neon: test for MMX/SSE instead of x86 when choosing implementation 0366dab @nemequ
- neon/abd: add much better implementations c3ddbbe @nemequ 220db33 @ngzhian
- neon/abs: add SSE2 integer abs implementations 6396dc8 @aqrit
- neon/addhn: initial implementation e9ee066 @nemequ
- neon/add: Implement f16 functions e69239c @Glitch18
- neon/add{l,}v: SSE2/SSSE3 opts _vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8} 8b4e375 dfffdde @mr-c
- neon/{add,sub}w_high: use vmovl_high instead of vmovl + get_high b897331 @nemequ
- neon/bcax: initial implementation 96ce481 0ed3dea @Glitch18
- neon/bsl: Implement f16 functions edb75b5 @Glitch18
- neon/cage: Initial f16 implementations 20df81d @Glitch18
- neon/cagt: Implement f16 functions 452a6d3 @Glitch18
- neon/ceq: Implement f16 functions f24ab3d @Glitch18
- neon/ceqz: Implement f16 functions dd2ebf2 de301cd @Glitch18
- neon/cge: Implement f16 functions a512986 f3ad0d4 647dc12 @Glitch18
- neon/cgez: complete implementation of CGEZ family 6d86a20 @Glitch18
- neon/cgt: Add implementation of remaining functions 9930c43 @Glitch18
- neon/cgt, simd128: improve some unsigned comparisons on x86 ae6702a @nemequ
- neon/cgtz: Add implementations of remaining functions 4d749b5 @Glitch18
- neon/cle: add some x86 implementations 5906cc9 d81c7e7 @nemequ 7894c7d @Glitch18
- neon/clez: Add implementations of scalar functions bc72880 @Glitch18
- neon/clt: Add implementations of scalar functions & SSE/AVX512 fallbacks bc636e1 6a19637 @Glitch18
- neon/cltz: Add scalar functions and natural vector fallbacks 2960ef0 @Glitch18
- neon/cmla, neon/cmla_rot{90,180,270}: check compiler versions e98152f @nemequ
- neon/cmla, neon/cmla_rot{90,180,270}: CMLA requires armv8.3+ 280faae @nemequ
- neon/cmla, neon/cmla_rot{90,180,270}, neon/fma: initial implementation 2aff4f9 @Glitch18
- neon/cnt: add x86 implementations of vcntq_s8 a558d6d @nemequ
- neon/cvt: add __builtin_convertvector implementations d06ea5b @nemequ
- neon/cvt: add out-of-range and NaN tests 7d0e2ac @nemequ
- neon/cvt: add some faster x86 float->int/uint conversions ceaaf13 @nemequ
- neon/cvt: Add vcvt_f32_f64 and vcvt_f64_f32 implementations 8398f73 @Glitch18
- neon/cvt: cast result of float/double comparison dc215cd @ngzhian
- neon/cvt: disable some code on 32-bit x86 which uses _mm_cvttsd_si64 48edfa9 @nemequ
- neon/cvt: don't use vec_ctsl on POWER 8f9582a @nemequ
- neon/cvt: fix a couple of s390x implementations' NaN handling a8bd33d @nemequ
- neon/cvt: fix compilation with -ffast-math d1d070d @nemequ
- neon/cvt: Implement f16 functions b6a9882 @Glitch18
- neon/cvt, relaxed-simd: add work-around for GCC bug #101614 11aa006 @nemequ
- neon/cvt, simd128: fix compiler errors on PPC 965e68e @nemequ
- neon/cvt: clang bug 46844 was fixed in clang 12.0 71e03a6 @mr-c
- neon/dot_lane: add remaining implementation 3f1c1fa 4a9ca8a @Glitch18
- neon/dup_lane: Complete implementation of function family 12fb731 df320d1 @Glitch18 014ee00 9461557 @nemequ
- neon/dup_lane: use dup_n 2b4a009 @ngzhian
- neon/dup_n: Implement f16 functions 14fdf88 @Glitch18
- neon/dup_n: replace remaining functions with dup_n implementations 27a13b0 @nemequ
- neon/dupq_lane: native and portable 893db57 @ngzhian
- neon/ext: add __builtin_shufflevector implementation de8fe89 @ngzhian
- neon/ext: add _mm_alignr_{,e}pi8 implementations 6d28f04 @nemequ
- neon/ext: clean up shuffle-based implementation f1de709 @nemequ
- neon/ext: simde_*{to,from}_m64 reqs MMX_NATIVE 13ee902 @mr-c
- neon/ext: unroll SIMDE_CONSTIFY for testing macro implemented functions 62834fa @mr-c
- neon/fma: add a couple x86 and PPC implementations 7a2860b @nemequ
- neon/fma: add more extensive feature checking e541dd1 @nemequ
- neon/fma_lane: Implement fmaq_lane functions a77e6ad 555ef3e @Glitch18
- neon/fma_n: initial implementation 06d5a62 @nemequ dab4342 @nemequ
- neon/get_high: add __builtin_shufflevector optimizations 4003afa @ngzhian
- neon/get_low: use __builtin_shufflevector if available ea3f75e @ngzhian
- neon/hadd,hsub: optimization for Wasm ebe09d8 @ngzhian
- neon/ld1: add Wasm SIMD implementation a79bc15 @ngzhian
- neon/ld1_dup: native and portable (64-bit vectors), f64 debb3c8 @ngzhian 6c71aac @Glitch18
- neon/ld1_dup: split from ld1, dup_n fallbacks, WASM implementations 4c586e0 @nemequ
- neon/ld1: Implement f16 functions 6e89a9c f26f775 @Glitch18
- neon/ld1_lane: Implement remaining functions de2de8d @Glitch18 9051a51 @ngzhian
- neon/ld1q: u8_x2, u8_x3, u8_x4 341006c @ngzhian
- neon/ld1[q]_*_x2: initial implementation cd14634 @dgazzoni
- neon/ld{2,3,4}: disable -Wmaybe-uninitialized on all recent GCC e142a59 @nemequ
- neon/ld{2,3,4}: silence false positive diagnostic on GCC 7 3f737a3 @nemequ
- neon/ld2: Implement remaining functions e68f728 @Glitch18 3b3014f @ngzhian 078bb00 @nemequ 041b1bd @mr-c
- neon/ld4_lane: native and portable implementations a973cab @ngzhian 179fb79 @Glitch18 0d1ab79 @nemequ
- neon/ld4: use conformant array parameters 723a8a8 @nemequ
- neon/ld4: work around spurious warning on clang < 10 64e9db0 @nemequ
- neon/min: add SSE2 vminq_u32 & vqsubq_u32 implementation 2cf165e 117de35 @nemequ
- neon/{min,max}nm: add some headers for -ffast-math ebe5c7d @nemequ
- neon/{min,max}nm: use simde_math_* prefixed min/max functions c1607d2 @nemequ
- neon/mlal_high_n: initial implementation d6f75fa @dgazzoni
- neon/mlal_lane: initial implementation 82e36ed 2168ca0 @nemequ
- neon/mls: add _mm_fnmadd_* implementations of vmls*_f* 70e0c20 @nemequ
- neon/mlsl_high_n: initial implementation ca1a4c3 @dgazzoni
- neon/mlsl_lane: initial implementation de78ae9 @nemequ
- neon/mls_n: initial implementation 042c6eb @nemequ
- neon/movl: improve WASM...
v0.7.4-rc3
Full Changelog: v0.7.4-rc2...v0.7.4-rc3
v0.7.4-rc2
Full Changelog: v0.7.4-rc1...v0.7.4-rc2
SIMDe 0.7.4-RC1
v0.7.4-rc1 prepare to release 0.7.4