Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SIMD on Rust stable #520

Merged
merged 24 commits into from
Mar 30, 2023
Merged

Conversation

koute
Copy link
Contributor

@koute koute commented Mar 21, 2023

This PR makes the SIMD backend work on stable Rust.

  • The nightly-only packed_simd dependency was removed.
  • A small mostly API-compatible replacement for packed_simd was added.
  • The AVX2 backend now works on stable Rust.
  • The AVX512 backend still requires nightly as those intrinsics are not yet stabilized, so on stable Rust the AVX2 backend will be used even if the target CPU supports AVX512.

@tarcieri
Copy link
Contributor

Awesome!

@koute
Copy link
Contributor Author

koute commented Mar 24, 2023

After this PR gets in I also have another PR in queue (not yet finished, but works already) where I want to switch SIMD to be entirely autodetected at runtime so that it's not necessary to compile with target_feature anymore (most likely at little to no performance loss; in fact I sped it up a little from what I can see where e.g. the vartime fixed base multiplication bench dropped from ~4.2ms to ~4.08ms on the AVX2 backend on my machine).

@tarcieri
Copy link
Contributor

Runtime autodetection sounds great!

@jcape
Copy link
Contributor

jcape commented Mar 24, 2023

Can I ask that you use either use cpufeatures (non-vendored) or allow this to be selected manually. There's no CPUID in an SGX enclave.

@koute
Copy link
Contributor Author

koute commented Mar 24, 2023

Can I ask that you use either use cpufeatures (non-vendored) or allow this to be selected manually. There's no CPUID in an SGX enclave.

Yes, I will still keep the ability to override/disable this; I'd just like autodetection to be the default.

@tarcieri
Copy link
Contributor

FWIW, I have a PR open to add AVX-512 support to cpufeatures: RustCrypto/utils#862

@koute koute force-pushed the main_stable_simd branch from e8eefb4 to a448528 Compare March 27, 2023 10:49
@koute
Copy link
Contributor Author

koute commented Mar 27, 2023

(Rebased to resolve the conflict in Cargo.toml.)

@OtaK
Copy link

OtaK commented Mar 27, 2023

If I can interject in the conversation, I have a small question.

Knowing std::simd (and packed_simd_2) supports producing WASM SIMD instructions, I was wondering if it was possible to relax this check to allow choosing the SIMD backend when the platform_family is wasm and target_features includes simd128.

If that's a trivial change, that would allow WASM ed25519 to benefit from WASM basically for free. If it's not, then nevermind.

Edit: After looking at the PR, it seems x86_64 intrinsics are directly called so this isn't a trivial change. I guess this is for another PR?

@koute
Copy link
Contributor Author

koute commented Mar 27, 2023

If I can interject in the conversation, I have a small question.

Knowing std::simd (and packed_simd_2) supports producing WASM SIMD instructions, I was wondering if it was possible to relax this check to allow choosing the SIMD backend when the platform_family is wasm and target_features includes simd128.

If that's a trivial change, that would allow WASM ed25519 to benefit from WASM basically for free. If it's not, then nevermind.

All of the current SIMD backends use explicit architecture-specific compiler intrinsics, so that wouldn't do anything besides break compilation when targetting WASM. The current dependency on packed_simd is not there because the code takes advantage of the "it's SIMD but it's portable" aspect, but because it defines a few wrapper types for the SIMD types which are more convenient to use.

You'd basically have to add a WASM-specific backend.

@jrose-signal
Copy link
Contributor

It's also important that whatever operations are used remain constant-time, which needs to be verified for every platform.

Copy link
Contributor

@rozbb rozbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, and with hardly any changes! I left some commends, mostly in the packed_simd.rs file. I'm not super familiar with these intrinsics so some questions might be obvious.

src/backend/vector/packed_simd.rs Show resolved Hide resolved
src/backend/vector/packed_simd.rs Outdated Show resolved Hide resolved
src/backend/vector/packed_simd.rs Outdated Show resolved Hide resolved
src/backend/vector/packed_simd.rs Show resolved Hide resolved
src/backend/vector/packed_simd.rs Show resolved Hide resolved
src/backend/vector/packed_simd.rs Show resolved Hide resolved
src/backend/vector/packed_simd.rs Show resolved Hide resolved
@rozbb
Copy link
Contributor

rozbb commented Mar 29, 2023

@tarcieri looks good to go, modulo CI and readme updates to reflect that nightly isn't needed for AVX2 anymore.

Also I was looking for a source on why the AVX2 add, sub, mul, shl, and shr functions we use are constant time, but I got nothing.

@koute
Copy link
Contributor Author

koute commented Mar 29, 2023

looks good to go, modulo CI and readme updates to reflect that nightly isn't needed for AVX2 anymore.

Do you want me to also update those?

Also I was looking for a source on why the AVX2 add, sub, mul, shl, and shr functions we use are constant time, but I got nothing.

I can't comment on constant time-ness of the rest of the code existing code as I haven't analyzed it in that regard, but AFAIK all of these should be constant time. Although this does kind of depend on the microarchitectural details of the CPU on which we're running I think all of modern Intel and AMD CPUs should have all of these implemented as constant time,

(Also, please note that this PR doesn't really change much here; these were still used before, just indirectly through the packed_simd dependency.)

@rozbb
Copy link
Contributor

rozbb commented Mar 29, 2023

Re docs: that'd be great actually! If you don't have time, one of us will do it, probably this week.

Re constant time: Yup, agreed, and it seems that every serious curve25519 impl uses AVX2. It would just be nice to be able to point to something that justifies our use of it.

@koute
Copy link
Contributor Author

koute commented Mar 29, 2023

I've updated the README and I've also added a job to the CI to test this. (Hopefully it'll work?)

Initially I only added a job to build the code on Rust stable, but then I realized that these tests are not being run at all, and we should be able to run them, at least for AVX2 (which at this point is 10 years old, so it's a pretty safe bet to assume that the test runners should support it).

@tarcieri
Copy link
Contributor

Also I was looking for a source on why the AVX2 add, sub, mul, shl, and shr functions we use are constant time, but I got nothing.

They're pretty routinely used for cryptography. I would be fairly surprised to learn they have any data dependent timing variability, which I wouldn't expect from any arithmetic instructions on modern CPUs whether they're SIMD or not.

(They can, however, have timing variability based on thermal throttling, but that's a whole different can of worms)

@rozbb
Copy link
Contributor

rozbb commented Mar 29, 2023

@tarcieri good to merge? This might also require an MSRV bump right?

Copy link
Contributor

@tarcieri tarcieri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I don't think it needs an MSRV bump? AVX (<512) intrinsics have been stable for quite awhile.

Maybe there could be an MSRV test with +avx2 enabled? But really the answer there is runtime feature gating as @koute mentioned.

@rozbb
Copy link
Contributor

rozbb commented Mar 29, 2023

Oh ok. I just assumed this wouldn't work with anything prior to the version that stabilized AVX2.

@rozbb
Copy link
Contributor

rozbb commented Mar 30, 2023

Oh I was under the impression that stable AVX2 was new. Apparently it was stabilized in 2018. I'll try to consolidate the test to run both builds in MSRV.

Also, I just ran AVX2 tests on my x86_64 machine and everything passess.

@rozbb rozbb merged commit 4583c47 into dalek-cryptography:main Mar 30, 2023
@rozbb
Copy link
Contributor

rozbb commented Mar 30, 2023

Thanks again @koute! This is a great addition

@koute
Copy link
Contributor Author

koute commented Mar 30, 2023

Thanks!

@koute
Copy link
Contributor Author

koute commented Apr 11, 2023

To anyone who was following this PR, my followup PR with runtime autodetection is now up: #523

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants