Build hdf5 from source when building wheels #930
Conversation
Thanks @xmatthias, do you think it could make sense to mark this PR as WIP?
I've done so now - but with the hint that I'll need some input to improve this (so maybe you could also add the "help wanted" label?).
Yes, the C-Blosc 1.x series does not support ARM64 (and probably never will). C-Blosc2 does have native support for it, though. However, C-Blosc2 is only backward compatible with C-Blosc1, not forward compatible, so switching to 2 means an important change in the format and should be done in a new major version of PyTables. In case you are interested in doing this change, tell me and I can try to provide a PR for that during the Xmas vacation.
Sorry @FrancescAlted, does this also include Apple Silicon?
macOS comes with an emulation layer for x86 on top of Apple Silicon, so C-Blosc1 should work well there. I was referring to native aarch64 on Apple Silicon.
I think this is only true if Python is running via Rosetta (the emulation layer).
Fair point. No, you probably cannot mix native binaries with emulated libraries (not a good idea anyway).
@FrancescAlted @xmatthias it seems that both c-blosc and PyTables build properly on all the platforms supported by Debian, including different ARM flavors [1,2]. [1] https://buildd.debian.org/status/package.php?p=c-blosc
The Debian builds seem to suggest this, yes - but I'm still unable to build PyTables without a preinstalled c-blosc from the sources contained in this repository by simply running setup.py. While the resulting c-blosc version seems to be 1.21.1 in both cases, I'm not sure it's 100% the same source code (maybe Debian patches something to make the builds succeed?).

It should also be noted that while the GitHub Actions runner is Ubuntu (Debian based), the actual build happens in Docker containers with manylinux images (e.g. quay.io/pypa/manylinux2014_aarch64), which are actually CentOS based and, in the case of aarch64, also emulated via QEMU (this should not cause any problem other than reduced performance).

To be fair, the solution could be as easy as using a different build flag when building blosc - but I'm not deep enough into how blosc is built (or should be built) to be confident in taking the right approach there.
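For anyone who wants to poke at this locally, the emulated build environment can be reproduced roughly like this (a sketch only; the Python path inside the container is an example):

```bash
# Register QEMU binfmt handlers so the aarch64 image can run on an x86_64 host
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Enter the same manylinux image the CI job uses (emulated, so expect it to be slow)
docker run --rm -it -v "$PWD":/io quay.io/pypa/manylinux2014_aarch64 bash
# Inside the container, e.g.:
#   /opt/python/cp39-cp39/bin/pip wheel /io --no-deps -w /io/wheelhouse
```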
The only Debian patch applied should be this one, which only impacts the installation path AFAIK. The c-blosc version currently in the PyTables repo is 1.21.0; I'm currently working on upgrading it to v1.21.1. Maybe we could test this PR again after the c-blosc update to see if something changes.
Hey, it looks like updating C-Blosc to v1.21.1 did the trick? Interesting!
Very good, thanks @FrancescAlted
I should have been more precise here. The C-Blosc 1.x series can be compiled for ARM, but it won't benefit from the native SIMD instructions for ARM (aka NEON). C-Blosc2 does have support for those native NEON SIMD instructions, and hence it has much better performance on ARM (that includes Apple Silicon CPUs). More info: https://www.blosc.org/posts/arm-is-becoming-a-first-class-citizen-for-blosc/ and https://www.blosc.org/posts/arm-memory-walls-followup/.
Based on my experience, you may want to check whether the binaries contain debug symbols. I do not see any artifacts here in which I could check this.
@Czaki since wheels are not built for pull requests, the artifacts can be downloaded from this GitHub Actions run.
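For reference, a quick way to check the downloaded wheels for debug symbols (file names are just examples):

```bash
# Unpack a wheel from the CI artifacts and inspect the compiled extensions
unzip -o tables-*-x86_64.whl -d wheel_contents
# "not stripped" in the output means debug symbols are still present
file wheel_contents/tables/*.so
# List the debug sections explicitly
readelf -S wheel_contents/tables/*.so | grep -i debug
```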
force-pushed from 17a3205 to b5466c3
force-pushed from b5466c3 to 8872dd1
It looks like both are compiled with debug symbols:
There is a discussion about this elsewhere. If I have more time, I will try to dive deeper if no one else solves it.
@Czaki thanks for this. Based on https://github.com/h5py/h5py/blob/1d569e69fb8d2b1abea196792bb1f8c948180534/azure-pipelines.yml#L32-L33 it might be possible to simply provide different CFLAGS to prevent this. I'll investigate ...
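As a rough sketch of the CFLAGS idea (where exactly these variables are set depends on how the wheel build is driven, so treat the exact invocation as an assumption):

```bash
# -g0 tells GCC not to emit debug information at all (-g1 would keep a minimal amount)
export CFLAGS="-g0"
export CXXFLAGS="-g0"
python -m pip wheel . --no-deps -w wheelhouse
```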
force-pushed from 57151c2 to 379ac58
The aarch64 build also fails with the updated blosc sources (CI run), so I'm investigating the problematic lines.

I can build blosc just fine (both before and after the upgrade) on a native aarch64 box, so in general building it should be possible on this architecture - just not in CI with emulation (there are so far no aarch64 GitHub runners). While I'm still investigating this in a separate branch, I have little hope that it's "easily fixable" - unless we can determine a compiler flag which helps with this "somehow".

That said, compiling with a lower debug level does seem to help with the file size - thanks @Czaki for the great pointer!
The good news is that I've been able to build aarch64 wheels from source (including the builtin blosc) here - I've tested the wheel built on CI just now on an aarch64 system and it works fine (tests pass in ~164 s). I had to completely disable SSE2 for aarch64 systems.

It also makes CI take ~4:30 h, which is a bit concerning as it's "pretty" close to the 6 h job execution limit - which would make CI fail simply due to timeout (in a "not really" fixable way). I do have a few ideas on how to approach this:
As an alternative to the second option, we could take the risk "as is" for the moment and eventually spread the jobs out in the future (should the need arise / CI start to fail because of the duration).

@avalentino let me know how you'd like to approach this - and also whether you'd like me to add the aarch64 changes to this PR, or whether you'd prefer to keep these separate (that would mean this PR is complete). For now it's just minor changes, but I think the environment variable for SSE2 is not optional, to ensure good support for people compiling from source.
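To illustrate what such a switch would look like for people compiling from source (the variable name below is hypothetical; the real name is whatever this PR introduces in setup.py):

```bash
# Hypothetical variable name: disable SSE2 code paths when building on aarch64
DISABLE_SSE2=1 python -m pip wheel . --no-deps -w wheelhouse
```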
@xmatthias very good news indeed! |
In that case I think this PR is complete.
Thanks @xmatthias, I will merge this PR after the fix to SSE2 detection.
This kind of contradicts the above statement of:
It's no problem for me to combine both in this PR - I just need to know what you expect.
Hm. Maybe split the aarch64 build into two jobs to reduce it to ~2 h?
It looks like the binaries for x86_64 are still 3 times bigger (after decompressing), but it is nice that the size reduction is so big.
What are you comparing this to? Wheels from the last release (https://pypi.org/project/tables/3.6.1/#files) are ~14 MB for all Linux x86_64 releases.
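A size comparison along these lines would make the numbers concrete (wheel file names are examples):

```bash
# Old wheel from PyPI vs. newly built wheel from the CI artifacts
unzip -o tables-3.6.1-*-manylinux1_x86_64.whl -d old_wheel
unzip -o tables-*-manylinux2014_x86_64.whl   -d new_wheel
du -sh old_wheel/tables new_wheel/tables
ls -lh new_wheel/tables/*.so
```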
@xmatthias recent changes in |
force-pushed from 379ac58 to ecfd2b2
This PR adds building hdf5 from source (thereby upgrading hdf5 to 1.12.1, as requested in #912).
It'll also use the builtin blosc library for x86 wheels (this is optional and could be removed / reverted).
Wheels from my last run can be found here in the artifact download.
Unfortunately, it also includes #929, as I'm unable to build wheels from source on ARM macOS (and they didn't work anyway).
I could make CI more complicated to keep this part separate, but I'm not sure that would make sense.
It'll also increase the runtime of the wheel CI quite a bit, as the build on aarch64 happens as a cross-compile, which can only utilize one core. At 2.5 hours it's however still well within the 6 h GitHub Actions runtime limit.
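For context, one common way to get such emulated aarch64 builds in CI (not necessarily how this PR's workflow is set up) is cibuildwheel plus QEMU:

```bash
# Requires QEMU binfmt support on the host (see the docker/qemu sketch earlier)
export CIBW_ARCHS_LINUX="aarch64"
export CIBW_MANYLINUX_AARCH64_IMAGE="manylinux2014"
python -m pip install cibuildwheel
python -m cibuildwheel --platform linux --output-dir wheelhouse
```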
Building blosc from the vendored source on ARM64 is not possible at the moment.
The failure on this can be found in this run.
The final errors for this are the following:
Therefore, aarch64 wheels still include blosc-devel from the yum source (as they do now).
I suspect this has to do with the vendored blosc version, which might need an update to properly support compiling on ARM64 - but I didn't investigate further.
Important before merging:
On Linux x86_64, all extension files become rather huge (16 MB).
This does not happen on ARM64, nor on macOS - but I have no idea what causes it, as the build itself runs as expected (nothing stands out to me in the logs).
I'm out of ideas for the moment, therefore I'm posting this here so maybe someone can jump in and give me a hint, or even provide a fix for this.
Maybe it's because of the updated hdf5 version - but I doubt that can cause such a huge spike.
Because of this, I think it might make sense to hold off merging this until we have found the root cause of the huge .so files. Maybe it's normal / expected - but it seems quite strange as it's only happening in one environment, where the macOS build is not significantly different (other than being another system, obviously).
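If someone wants to check whether debug info is what inflates the x86_64 extensions, something like the following should make it obvious (the build paths are examples):

```bash
# Compare the extension size before and after stripping debug sections
du -h build/lib.linux-x86_64-*/tables/*.so
strip --strip-debug build/lib.linux-x86_64-*/tables/*.so
du -h build/lib.linux-x86_64-*/tables/*.so
```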