Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only use the ARM NEON 32-way unrolled rANS on AArch64. #82

Merged
merged 1 commit into from
Apr 20, 2023

Conversation

jkbonfield
Copy link
Collaborator

NEON alone isn't a sufficient guard as AArch32 also has some limited Neon capabilities. While we could no doubt have a 32-bit alternative, for now this is the simple fix and let aarch32 use the scalar implementation.

Doing a 32-bit neon is a complex task and without having access to the hardware it's pretty much impossible. I also wouldn't have high hopes for any significant speed gains over scalar with only half the lanes available.

Fixes #81

@clausecker
Copy link

Check if you can also adapt the NEON configure test in the configure script to only say NEON is supported if the AArch64 intrinsics are present. You can use those shown in the error messages in bug #81.

NEON alone isn't a sufficient guard as AArch32 also has some limited
Neon capabilities.  While we could no doubt have a 32-bit alternative,
for now this is the simple fix and let aarch32 use the scalar
implementation.

Doing a 32-bit neon is a complex task and without having access to the
hardware it's pretty much impossible.  I also wouldn't have high hopes
for any significant speed gains over scalar with only half the lanes
available.

Fixes samtools#81
@clausecker
Copy link

Note that “half the lanes” isn't true. Even AArch32 NEON has support for 128 bit vectors. You can test the code on affordable hardware such as a Raspberry Pi if you so desire.

@daviesrob daviesrob merged commit 479b5be into samtools:master Apr 20, 2023
@daviesrob
Copy link
Member

I was able to verify that this fixed the problem on qemu-system-arm -machine virt -cpu cortex-a15.

As a side note, testing this would be easier if the freebsd documentation covered booting arm images via qemu. There's a wiki page for arm64/QEMU, but nothing for 32 bit arm. Also, the handbook doesn't really have much to say about what you do with VM or SD card images.

For the record, I used (on a Debian bullseye host):

wget 'https://download.freebsd.org/releases/arm/armv7/ISO-IMAGES/13.2/FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img.xz'
xz -d FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img.xz
qemu-img create -f qcow2 -b FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img FreeBSD-13.2-RELEASE-arm-armv7-working.qc2
qemu-system-arm -machine virt -cpu cortex-a15 -bios /usr/lib/u-boot/qemu_arm/u-boot.bin -m 2G -nographic -display none -drive file=FreeBSD-13.2-RELEASE-arm-armv7-working.qc2,if=virtio -nic user

@clausecker
Copy link

Thank you for your work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rANS_static32x16pr_neon.c doesn't build on AArch32
3 participants