SIMD code selection and configury is incorrect for multi-arch compilation #76
Comments
Over in pysam I have been experimenting with fixing this as follows:
This pretty much fixes the issue for this multi-arch build. However it also means that e.g. a SPARC build will also say Another possibility would be to have configure.ac grep
That's annoying. It reminds me of the old x86-64/PPC fat-binary days which had much the same problem.
Annoying indeed. Assuming the linker can deal with a mixture of universal and single-arch object files, maybe this is a better approach:
@daviesrob Given I don't have a Mac and our local policy of not providing a viable remote login to a Mac for software development (despite repeated requests), this is simply something I am unable to experiment with and verify. Is it something you could test? Either that or I can start exploring AWS. I could only find x86 macs, but maybe the arm ones are in another AWS region as I'm sure they do exist somewhere. Either that or helpdesk tickets to get loan machines (and probably many more to get them to install all the required software for me).
We have plenty of suitable macs as a group. Ideally I'd like to avoid extracting
Argghhhh ... it's all coming back to me now. These multiarch binaries don't play well with autoconf because I think wrapping the tests and SIMD usage in We'll also need to get rid of the conditional compilation, which will mean finding another solution for the empty translation unit problem. That shouldn't be too difficult to solve though, apart from the minor annoyance of possibly leaving a few dummy symbols around.
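A minimal sketch of the "compile every file in every slice" idea discussed above, under stated assumptions: the file name, the function `sum_bytes_neon`, the macro `HAVE_SUM_BYTES_NEON`, and the dummy symbol are all hypothetical and are not taken from htscodecs. The body is guarded by the compiler's own `__aarch64__` macro rather than a configure result, so the arm64 pass gets the real code and any other pass gets only a dummy symbol, which keeps the translation unit from being empty.

```c
/* sum_bytes_neon.c: hypothetical SIMD source, compiled once per -arch slice. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Marks that the real implementation exists in this slice (hypothetical name). */
#define HAVE_SUM_BYTES_NEON 1

/* Sum all bytes of buf, 16 lanes at a time, with a scalar tail. */
uint32_t sum_bytes_neon(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    uint32_t total = 0;
    size_t i = 0;
    for (; i + 16 <= len; i += 16)
        total += vaddlvq_u8(vld1q_u8(buf + i)); /* widening add across the vector */
    for (; i < len; i++)
        total += buf[i];
    return total;
}

#else
/* Not an arm64 slice: emit one dummy symbol so the translation unit is
 * not empty (ISO C does not allow an empty translation unit). */
const char sum_bytes_neon_unused = 0;
#endif
```

The cost is the stray dummy symbol in the non-matching slice, which is the minor annoyance mentioned above.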
FYI I have now tried the “filter out
For pysam's purposes we just build this so that all the .o files are built, so this doesn't matter too much for us.
Nonetheless the two halves of the resulting libchtslib.cpython-310-darwin.so do indeed contain their respective SIMD implementation translation units, so this is good enough for pysam in the interim. Full build log at the m1-ci branch at jmarshall/pysam and cirrus.ci/…/test.log. @jkbonfield: This also demonstrates a (somewhat inconvenient) test methodology: Set up a Cirrus CI Apple Silicon build job.
@daviesrob's PR #78 already includes a Cirrus CI modification to do multi-arch building and test it. I'm hoping it's sufficient. I'm not sure what you mean by "filter out", but I assume it's in relation to Rob's comments on how his PR works, where the code is ifdefed out rather than the Makefile selecting or deselecting specific files. Given we have a single configure line, a single Makefile, but two architectures, it feels like we must compile every object file and just cope with the fact that some aren't appropriate by them becoming minimalist when the detected CPU doesn't match. I don't really see how else to solve this given the constraints of how multi-arch works. Open to suggestions though.
“Filter out” is a reference to #76 (comment): the pysam PR literally removes That approach solves pysam's problem in the current htslib release, but I think Rob's PR's approach will be better in general.
[This investigation has occurred in a pysam context, which uses htslib's bundled-htscodecs build system rather than htscodecs's native build system. So this issue might equally properly be raised in htslib, but I expect any fix will span both htscodecs and htslib anyway.]
It seems that on modern macOS the usual way to build universal object files and executables is to compile and link with -arch x86_64 -arch arm64. This runs the compiler twice, once for each target, and then runs lipo to combine the two temporary .o files into a single multi-arch (‘universal’) .o file.

This does not interact well with htslib's configure.ac SIMD probes (and presumably something similar happens with htscodecs's configure.ac):

Looking in config.log we see that the underlying x86_64 compilation has accepted -mssse3 etc but the underlying arm64 compilation has produced errors, so the overall compilation fails too, resulting in no. And vice versa for the Neon probe.

This results in rANS_static32x16pr_neon.o et al being omitted from the build, but the *_neon() invocations in rANS_static4x16pr.c are included independent of the configure answer. This in turn leads to linking failure or, for shared objects, a failure to find rans_compress_O0_32x16_neon at runtime, as encountered in pysam-developers/pysam#1149.

It would be simple enough to fix the *_neon() invocation guard to respect the no answer and avoid this crash. However this pessimizes execution time for this multi-arch build by omitting all the SIMD implementations. Really, for this universal ARM/X86-64 build the answer to all four checks should be yes, with the 3 x86 SIMD implementations being used for the underlying x86_64 build and the Neon implementation being used for the underlying ARM build.
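One way to picture the desired outcome is a call site whose guard uses the same per-slice compiler macros as the implementations, so a reference to a SIMD routine only exists in the slice that actually compiled it. The sketch below is illustrative only: the `checksum_*` names are hypothetical, it uses baseline SSE2 on x86_64 (which needs no extra compiler flags) instead of the SSSE3/AVX variants the real code probes for, and it does not reproduce htscodecs's runtime CPU detection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback, compiled in every slice. */
static uint32_t checksum_scalar(const uint8_t *buf, size_t len) {
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}

#if defined(__aarch64__)
#include <arm_neon.h>
/* arm64 slice only: Neon implementation. */
static uint32_t checksum_neon(const uint8_t *buf, size_t len) {
    uint32_t total = 0;
    size_t i = 0;
    for (; i + 16 <= len; i += 16)
        total += vaddlvq_u8(vld1q_u8(buf + i));
    return total + checksum_scalar(buf + i, len - i);
}
#elif defined(__x86_64__)
#include <emmintrin.h>
/* x86_64 slice only: SSE2 implementation. */
static uint32_t checksum_sse2(const uint8_t *buf, size_t len) {
    const __m128i zero = _mm_setzero_si128();
    uint32_t total = 0;
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i s = _mm_sad_epu8(v, zero); /* two partial sums, one per 64-bit half */
        total += (uint32_t)_mm_cvtsi128_si32(s) + (uint32_t)_mm_extract_epi16(s, 4);
    }
    return total + checksum_scalar(buf + i, len - i);
}
#endif

/* Call site: the guard matches the guards above, so each slice only
 * references the implementation that exists in that slice. */
static uint32_t checksum(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
#if defined(__aarch64__)
    return checksum_neon(buf, len);
#elif defined(__x86_64__)
    return checksum_sse2(buf, len);
#else
    return checksum_scalar(buf, len);
#endif
}
```

Because both the definitions and the call are keyed on the same macros, neither slice of a universal build can end up calling a symbol that was left out, and neither slice loses its SIMD path.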