Only use the ARM NEON 32-way unrolled rANS on AArch64.
__ARM_NEON alone isn't a sufficient guard, as AArch32 also has some
limited NEON capabilities.  While we could no doubt write a 32-bit
alternative, for now the simple fix is to let AArch32 use the scalar
implementation.

Writing a 32-bit NEON version is a complex task, and without access to
the hardware it's pretty much impossible.  I also wouldn't have high
hopes for any significant speed gains over scalar with only half the
lanes available.
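
The guard described above can be sketched in isolation as follows. This is
a minimal illustration, not the htscodecs code: the macro USE_NEON_32X16 and
the function rans_backend() are hypothetical names invented here. Only the
preprocessor condition itself is taken from this commit.

```c
/* Hypothetical macro, for illustration only: 1 when both the NEON feature
 * macro and the AArch64 architecture macro are defined, 0 otherwise. */
#if defined(__ARM_NEON) && defined(__aarch64__)
#  define USE_NEON_32X16 1
#else
#  define USE_NEON_32X16 0
#endif

/* Hypothetical helper: names the code path this translation unit would
 * compile.  On AArch32 with NEON enabled, __ARM_NEON is defined but
 * __aarch64__ is not, so the scalar path is chosen. */
static const char *rans_backend(void) {
    return USE_NEON_32X16 ? "neon" : "scalar";
}
```

Calling rans_backend() should report "neon" only on AArch64 builds with NEON
enabled; every other target, including 32-bit ARM with NEON, falls through
to "scalar".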

Fixes #81
jkbonfield committed Apr 18, 2023
1 parent 5aecc6e commit b176488
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion htscodecs/rANS_static32x16pr_neon.c
@@ -32,7 +32,7 @@
 */

 #include "config.h"
-#ifdef __ARM_NEON
+#if defined(__ARM_NEON) && defined(__aarch64__)
 #include <arm_neon.h>

 #include <limits.h>
2 changes: 1 addition & 1 deletion htscodecs/rANS_static4x16pr.c
@@ -1006,7 +1006,7 @@ unsigned char *(*rans_dec_func(int do_simd, int order))
 }
 }

-#elif defined(__ARM_NEON)
+#elif defined(__ARM_NEON) && defined(__aarch64__)

 #if defined(__linux__) || defined(__FreeBSD__)
 #include <sys/auxv.h>
