archspec-enabled packages #1261
Comments
I've heard there is an upstream pr for conda to expose the exact set of CPU features. I don't know the status of this pr. |
Maybe this one? conda/conda#9461 |
Good find! |
I think idea 2 is closer to what we want. I don't care about Haswell or w/e. I do care about avx or avx512 |
How about using the new "feature levels" of GCC 11 and Clang 12 to define the meta-package's build strings? (I don't mean we have to wait for the compilers to be updated, just imitate the same compatibility levels.) |
Is that only for |
Yes it appears to be only x86_64. We'd need to translate those levels to specific things in archspec IIUIC. We should discuss this at the next core meeting. |
@chenghlee @wolfv It appears that archspec implements comparison operators for CPUs based on feature sets. This means you can do things like figure out if a build will run on the CPU you have and specify compatibility as things like |
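To illustrate what comparing CPUs by feature set means, here is a minimal Python sketch. It does not use archspec itself: the `FEATURES` table and the `can_run` and `comparable` helpers are hypothetical names, and the feature lists are heavily trimmed. Subset comparison gives a partial order, so some pairs of CPUs are not comparable at all.

```python
# Hypothetical, heavily trimmed feature sets (real CPUs expose many more flags).
_BASE = frozenset({"sse2", "sse3", "ssse3", "sse4_1", "sse4_2", "popcnt"})
_AVX = _BASE | {"avx"}

FEATURES = {
    "nehalem": _BASE,
    "sandybridge": _AVX,
    "haswell": _AVX | {"avx2", "fma", "bmi1", "bmi2"},
    "piledriver": _AVX | {"fma", "fma4", "xop", "bmi1"},
}


def can_run(build_target: str, host: str) -> bool:
    """A build for `build_target` runs on `host` if the host has every feature it needs."""
    return FEATURES[build_target] <= FEATURES[host]


def comparable(a: str, b: str) -> bool:
    """True if one of the two feature sets contains the other."""
    return can_run(a, b) or can_run(b, a)


if __name__ == "__main__":
    print(can_run("nehalem", "haswell"))       # older target on a newer host
    print(can_run("haswell", "nehalem"))       # newer target on an older host
    print(comparable("haswell", "piledriver"))  # each has flags the other lacks
```

In this trimmed table, `haswell` and `piledriver` are incomparable because one has `avx2` and `bmi2` while the other has `fma4` and `xop`. That is the case a plain "newer than" ordering cannot express.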
Ohhhh nice. Thanks @isuruf! |
Shouldn't the "feature levels" that @chrisburr mentioned satisfy the requirement of a total ordering? That way, it would also keep the build matrix explosion to a minimum, because it would be a good start to just build for v1, v2, v3.
|
The other good thing is that these levels agree between GCC & Clang. |
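A rough sketch of how the x86-64 feature levels give a total ordering, assuming the level definitions from the x86-64 psABI. The flag spellings are an assumption (detection tools each use their own names), and `LEVEL_FEATURES` and `highest_level` are hypothetical helpers, not part of any conda or archspec API.

```python
# Features each level adds on top of the previous one (x86-64 psABI levels).
# Flag spellings are illustrative; real tools may name them differently.
LEVEL_FEATURES = {
    2: frozenset({"cmpxchg16b", "lahf_sahf", "popcnt", "sse3", "sse4_1", "sse4_2", "ssse3"}),
    3: frozenset({"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "lzcnt", "movbe", "osxsave"}),
    4: frozenset({"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
}


def highest_level(host_flags) -> int:
    """Return the highest level whose features, and those of all lower levels, are present.

    Level 1 (the x86-64 baseline) is assumed for every host.
    """
    flags = frozenset(host_flags)
    level = 1
    for candidate in sorted(LEVEL_FEATURES):
        if LEVEL_FEATURES[candidate] <= flags:
            level = candidate
        else:
            break  # levels are cumulative, so stop at the first gap
    return level


if __name__ == "__main__":
    v3_host = LEVEL_FEATURES[2] | LEVEL_FEATURES[3]
    print(highest_level(v3_host))
```

Because each level includes the one below it, a host at level N can run any build for a level up to N, and a build matrix only needs one variant per level.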
Ping @isuruf re: using GCC/Clang feature levels for x86. It depends on how granular we're aiming for the configuration to be, but - aside from keeping the build matrix explosion under control - having just v2/v3 from the above list would already help in the case of conda-forge/faiss-split-feedstock#23 |
That issue was closed without further action - what now? |
Recently my colleagues (ping @serge-sans-paille @JohanMabille) have implemented a SIMD instruction set detector in xsimd: https://github.com/xtensor-stack/xsimd/blob/master/include/xsimd/config/xsimd_arch.hpp It also comes with some sort of ordering in the "best_version". It has some interesting properties:
I am not sure if it's "too late" but maybe we could use this library? Either to directly create virtual packages for the different instruction sets (avx2, sse, avx512, neon), or in a different fashion to pre-filter packages. I am very interested to ship more optimised binaries through conda-forge ... we need to save the environment :) |
Just a small note to consider as this is implemented in the future: Some users will likely be setting environments (e.g. In my experience, the gracefully-fall-back strategy works alright if one is careful enough, though clearly not a perfect solution and it seems to be causing headaches in certain places. |
I think that - like for |
Yes, the override feature will be good enough :) |
https://www.phoronix.com/news/Fedora-39-RPM-4.19 Maybe distributing the v2/v3/v4 binaries would be a great start. With adding x86-64-v3 conda-forge would instantly save some greenhouse gas usage for the Earth 🌍 |
So we have made a big move forward recently by adding the microarch feedstock, and some smaller PRs in many places. We're basically getting ready to actually start building these packages. However, we need to come up with some common sense rules to avoid CI explosion because the number of packages where the benefits are substantial is expected to be small, but there are likely highly motivated people that want to add it to feedstocks because "it must be faster". One thing for example that should rule out building for multiple architectures is if the package has some built-in runtime dispatching to microarchitectures (e.g. numpy). We at least need some documentation (and perhaps some automation?) for this. |
The very recent archspec 0.2.3 now has Windows support, in large part due to @isuruf's work on this. 🥳 Not sure what else is necessary to wire this up though; I just tried on a fully up-to-date environment:
|
This should be fixed by conda/conda#13641 in the next conda release. |
I started experimenting with microarch-optimized builds in conda-forge/mujoco-feedstock#45. I ran into some problems, which I reported in separate issues to avoid having too much content in this one: |
Some naive questions etc.
|
Also see this comment in the package description from the original implementation:
It appears level 3 is not found on osx too in the linked builds above, but I think this answers the questions. |
FWIW, we recently added level=4 packages to the
Indeed, we simply disabled tests for level 4.
For level 4, we just added The new packages haven't been live for very long, but they seem to behave as expected. |
Not sure how relevant - but I learned recently that macOS's Rosetta2 cannot emulate AVX (v3) [at least until macOS 15=Sequoia]. For anyone using osx-64 environments on their osx-arm64 machines through Rosetta2, this might need to be taken into account. Don't just build v3 for osx-64, also make v2 builds available for the Rosetta2 users. |
As of mamba 1.15.0, mamba and micromamba can install archspec-enabled packages; see https://github.com/mamba-org/mamba/releases/tag/2024.09.20 |
Comes from [1], [2], [3].
Building package variants for different instruction sets would be helpful for the community. For example, we could support AVX on CPUs that have it, but gracefully fall back to non-AVX variants on other CPUs (e.g. Atom). The current recommendation is to not build with AVX unless upstream handles the missing instructions at runtime.
conda/conda#9930 exposed some parts of `archspec` as a virtual package `__archspec`, the output of which can be checked with `conda info`:
However, there's no way to leverage this information as a maintainer. What should we do?
- `run_constrained` lines the same way we deal with sysroots and glibc? I don't think `__archspec` itself provides enough information now. How does the maintainer know which instructions are supported there?
- `cpu_feature` metapackage that can restrict this in a better way, with as many variants as architectures I presume? This might put additional burden on the maintainer, who might need to check which architectures support which instructions.
Is there a better way?
Idea 1
A Jinja function that translates instruction sets to an `__archspec` selection query.
The drawback: if a new architecture is released that also supports AVX, packages would have to be rebuilt to add the new constraint.
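A minimal sketch of the translation step behind Idea 1, written in Python rather than Jinja. The `MICROARCH_FEATURES` table is a hypothetical, trimmed stand-in for the archspec database, and `microarchs_supporting` is an invented helper. How the resulting names would be turned into an actual `__archspec` constraint is left out, since the thread does not show that syntax.

```python
# Hypothetical, trimmed table: microarchitecture name -> a few SIMD features.
MICROARCH_FEATURES = {
    "nehalem": {"sse4_2"},
    "sandybridge": {"sse4_2", "avx"},
    "haswell": {"sse4_2", "avx", "avx2"},
    "skylake_avx512": {"sse4_2", "avx", "avx2", "avx512f"},
    "zen2": {"sse4_2", "avx", "avx2"},
}


def microarchs_supporting(instruction, table=MICROARCH_FEATURES):
    """Return the sorted names of all microarchitectures that list `instruction`."""
    return sorted(name for name, feats in table.items() if instruction in feats)


if __name__ == "__main__":
    print(microarchs_supporting("avx2"))

    # A newly released microarchitecture changes the answer, so any package
    # that baked in the old list would need a rebuild to pick it up.
    newer = dict(MICROARCH_FEATURES, zen3={"sse4_2", "avx", "avx2"})
    print(microarchs_supporting("avx2", newer))
```

The second call shows the drawback noted above: the list is only correct at the time the package is built.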
Idea 2
A `cpu_feature` metapackage with variants built for instructions: these packages would need to be updated often so their `run_constrained` metadata is up-to-date with compatible architectures, but wouldn't require rebuilding downstream. How could maintainers specify multiple dependencies at the same time? Would we need to build the Cartesian product of all architecture combinations?
I don't think any of these ideas is good enough to be the strategy we want to pursue, but hopefully it is enough to brainstorm about it!
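The Cartesian-product question in Idea 2 can be made concrete with a small Python sketch. It assumes three independent on/off features and then applies two implication rules (a CPU with AVX2 also has AVX, and one with AVX-512F also has AVX2), which hold on CPUs in practice. The names `FEATURES`, `IMPLIES`, `all_variants` and `valid_variants` are hypothetical.

```python
from itertools import product

FEATURES = ("avx", "avx2", "avx512f")

# Assumed implications: the key feature is only found together with the value.
IMPLIES = {"avx2": "avx", "avx512f": "avx2"}


def all_variants(features=FEATURES):
    """Every on/off combination of the features: 2 ** len(features) variants."""
    return [
        frozenset(f for f, enabled in zip(features, combo) if enabled)
        for combo in product((False, True), repeat=len(features))
    ]


def valid_variants(features=FEATURES, implies=IMPLIES):
    """Keep only the combinations that respect the implication rules."""
    return [
        variant
        for variant in all_variants(features)
        if all(implies[f] in variant for f in variant if f in implies)
    ]


if __name__ == "__main__":
    print(len(all_variants()))    # size of the full Cartesian product
    print(len(valid_variants()))  # size once features are treated as ordered
```

Under these assumptions the eight raw combinations collapse to four nested ones, which is essentially the argument for ordered levels over independent feature flags.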