mpk: some GitHub CI runners have MPK enabled but fail to run MPK code #7445
GitHub CI runners are showing some strange behavior: on certain runners (unknown which ones), the CPUID bits claim that MPK is supported, but running any MPK code (e.g., `RDPKRU`) causes a `SIGILL` crash. This change disables MPK until bytecodealliance#7445 is resolved.
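For readers following the detection logic: MPK support is advertised in CPUID leaf 7, sub-leaf 0. ECX bit 3 (PKU) says the processor supports protection keys, and ECX bit 4 (OSPKE) says the OS has enabled them. The thread does not show Wasmtime's `has_cpuid_bit_set`, so what follows is only a hypothetical Rust sketch of where those bits live. `MpkBits`, `decode_leaf7_ecx`, and `query_mpk_bits` are illustrative names, not Wasmtime APIs.

```rust
// Bit positions within CPUID.(EAX=07H, ECX=0):ECX, as documented in the Intel SDM.
const PKU_BIT: u32 = 3; // processor supports protection keys for user-mode pages
const OSPKE_BIT: u32 = 4; // OS has set CR4.PKE, enabling RDPKRU/WRPKRU

#[derive(Debug, PartialEq, Eq)]
struct MpkBits {
    pku: bool,
    ospke: bool,
}

/// Decodes the two MPK-related bits from a raw leaf-7 ECX value.
fn decode_leaf7_ecx(ecx: u32) -> MpkBits {
    MpkBits {
        pku: (ecx & (1 << PKU_BIT)) != 0,
        ospke: (ecx & (1 << OSPKE_BIT)) != 0,
    }
}

/// Queries the running CPU; returns None if leaf 7 is not available.
#[cfg(target_arch = "x86_64")]
fn query_mpk_bits() -> Option<MpkBits> {
    use std::arch::x86_64::{__cpuid, __cpuid_count};
    // Leaf 0 EAX holds the highest supported basic leaf.
    let max_leaf = unsafe { __cpuid(0) }.eax;
    if max_leaf < 7 {
        return None;
    }
    let leaf7 = unsafe { __cpuid_count(7, 0) };
    Some(decode_leaf7_ecx(leaf7.ecx))
}

#[cfg(not(target_arch = "x86_64"))]
fn query_mpk_bits() -> Option<MpkBits> {
    None // CPUID is x86-specific
}
```

Per the discussion in this thread, a CPUID feature bit alone was not a reliable predictor on every CI runner; the sketch only shows how the bits are read, not a fix.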
We discovered that certain CPUs claim that MPK is available but then fail when running MPK instructions; MPK support was temporarily disabled in bytecodealliance#7446; this change re-enables it. Closes bytecodealliance#7445.
Finally got a reproduction that includes the …
Nice! Seems like this should be easy enough to manage by updating the supported check to require Intel CPUs and filter out AMD ones then?
That's an option; I was also thinking about looking up how the Linux kernel sets the …
I can reproduce this issue (the `SIGILL`) locally and reliably on my 2nd generation EPYC and can test things out if it helps.
In bytecodealliance#7446 I disabled MPK support temporarily due to failures in CI runs. Looking into this further in bytecodealliance#7445, I discovered that it is due to how `has_cpuid_bit_set` works on different x86 machines: Intel's `CPUID` instruction reports support for MPK in a certain leaf bit, AMD does it some other (unknown?) way. The CI problem boiled down to occasional runs on AMD machines that would fail with `SIGILL` because the AMD machine reported that it had MPK support when it really did not. This change fixes the issue by first checking if the CPU vendor string is `GenuineIntel` before inspecting the MPK `CPUID` leaf bit. Closes bytecodealliance#7445.
* mpk: reenable MPK support with vendor string check
* review: use `u32::from_le_bytes` to self-document the string check
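The vendor check described above can be sketched as follows. CPUID leaf 0 returns the 12-byte vendor string split across EBX, EDX, and ECX (in that order), four ASCII bytes per register in little-endian order, which is why comparing against `u32::from_le_bytes` reads like the literal string. This is a hypothetical sketch based only on the PR description and review note; `is_genuine_intel` and `cpu_is_genuine_intel` are illustrative names, and the merged patch may differ.

```rust
/// Returns true if the CPUID leaf 0 registers spell "GenuineIntel".
/// The vendor string is stored as EBX ++ EDX ++ ECX, four little-endian ASCII
/// bytes per register, so each register is compared against its 4-byte chunk.
fn is_genuine_intel(ebx: u32, edx: u32, ecx: u32) -> bool {
    ebx == u32::from_le_bytes(*b"Genu")
        && edx == u32::from_le_bytes(*b"ineI")
        && ecx == u32::from_le_bytes(*b"ntel")
}

/// Reads the vendor string of the running CPU (x86_64 only).
#[cfg(target_arch = "x86_64")]
fn cpu_is_genuine_intel() -> bool {
    use std::arch::x86_64::__cpuid;
    let leaf0 = unsafe { __cpuid(0) };
    is_genuine_intel(leaf0.ebx, leaf0.edx, leaf0.ecx)
}

#[cfg(not(target_arch = "x86_64"))]
fn cpu_is_genuine_intel() -> bool {
    false // no CPUID vendor string on other architectures
}
```

Comparing against `*b"Genu"` instead of a magic constant such as `0x756e6547` is the self-documenting property the review comment asks for; in the PR's ordering, the MPK leaf bit would only be consulted once this returns `true`.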
When testing, there are certain CPU-dependent features that influence Cranelift's codegen (e.g., availability of AVX512 instructions). This additional CI step logs the current CPU information to aid in troubleshooting, such as the MPK-related troubleshooting over in bytecodealliance#7445. Also, if we let this run in CI for a while, we may be able to run queries on the logs to determine how often jobs run on servers with certain features enabled.
* ci: log CPU details when testing
* Add Windows variant of 'lscpu'
* Add MacOS variant of 'lscpu'
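The CI step itself runs `lscpu` (plus Windows and macOS variants whose exact commands are not shown here). As a rough Rust illustration of a summary that would make such logs easy to query later, the sketch below extracts the vendor and a few feature flags from Linux `/proc/cpuinfo`-style text. `summarize`, `cpuinfo_field`, and the `INTERESTING` list are hypothetical; `pku`, `ospke`, `avx2`, and `avx512f` are the names Linux uses for those features in the `flags` line.

```rust
/// Feature flags worth logging for the MPK and codegen questions in this thread.
const INTERESTING: &[&str] = &["pku", "ospke", "avx2", "avx512f"];

/// Returns the trimmed value of the first `key : value` line matching `key`.
fn cpuinfo_field<'a>(cpuinfo: &'a str, key: &str) -> Option<&'a str> {
    cpuinfo.lines().find_map(|line| {
        let (k, v) = line.split_once(':')?;
        (k.trim() == key).then(|| v.trim())
    })
}

/// Builds a single grep-friendly line from `/proc/cpuinfo`-style text.
fn summarize(cpuinfo: &str) -> String {
    let vendor = cpuinfo_field(cpuinfo, "vendor_id").unwrap_or("unknown");
    let flags: Vec<&str> = cpuinfo_field(cpuinfo, "flags")
        .unwrap_or("")
        .split_whitespace()
        .collect();
    let mut out = format!("vendor={vendor}");
    for flag in INTERESTING {
        out.push_str(&format!(" {flag}={}", flags.contains(flag)));
    }
    out
}
```

On a Linux runner, passing the contents of `/proc/cpuinfo` (e.g., via `std::fs::read_to_string`) yields one line of the form `vendor=<id> pku=<bool> ospke=<bool> avx2=<bool> avx512f=<bool>`, which is straightforward to count across many job logs.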
GitHub CI runners are showing some strange behavior: on certain runners (unknown which ones), the CPUID bits claim that MPK is supported, but running any MPK code (e.g., `RDPKRU`) causes a `SIGILL` crash. #7446 disables MPK until this is resolved. Some instances of this failure:
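To make the `SIGILL` concrete: `RDPKRU` raises an invalid-opcode exception (#UD, which Linux delivers as `SIGILL`) when CR4.PKE is clear, and CPUID leaf 7 ECX bit 4 (OSPKE) reflects that setting. The sketch below guards the instruction on OSPKE before executing it. It is one possible guard written for illustration, not the fix adopted in this thread (which checks for the `GenuineIntel` vendor string). `os_enabled_pku` and `read_pkru` are hypothetical names, and the instruction is emitted as raw bytes (`0F 01 EE`) so the assembler needs no PKU support.

```rust
/// True if CPUID.(EAX=07H, ECX=0):ECX bit 4 (OSPKE) is set, i.e. the OS has
/// set CR4.PKE and RDPKRU/WRPKRU will not raise #UD.
#[cfg(target_arch = "x86_64")]
fn os_enabled_pku() -> bool {
    use std::arch::x86_64::{__cpuid, __cpuid_count};
    // Leaf 7 is only valid if leaf 0 reports it as supported.
    let max_leaf = unsafe { __cpuid(0) }.eax;
    if max_leaf < 7 {
        return false;
    }
    let ecx = unsafe { __cpuid_count(7, 0) }.ecx;
    (ecx & (1 << 4)) != 0
}

#[cfg(not(target_arch = "x86_64"))]
fn os_enabled_pku() -> bool {
    false
}

/// Reads PKRU, or returns None when executing RDPKRU would fault.
#[cfg(target_arch = "x86_64")]
fn read_pkru() -> Option<u32> {
    if !os_enabled_pku() {
        return None; // RDPKRU here would raise #UD, seen as SIGILL on Linux
    }
    let pkru: u32;
    // SAFETY: OSPKE is set, so RDPKRU is enabled. ECX must be 0 (otherwise
    // #GP); the instruction writes EAX, zeroes EDX, and touches no memory or flags.
    unsafe {
        std::arch::asm!(
            ".byte 0x0f, 0x01, 0xee", // RDPKRU
            in("ecx") 0u32,
            out("eax") pkru,
            out("edx") _,
            options(nomem, nostack, preserves_flags),
        );
    }
    Some(pkru)
}

#[cfg(not(target_arch = "x86_64"))]
fn read_pkru() -> Option<u32> {
    None
}
```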