chore(s2n-quic-crypto): remove custom aesgcm implementation (#2186)

aws · May 1, 2024 · b085808
1 parent 7188ce4

Showing 49 changed files with 95 additions and 4,193 deletions.
diff --git a/quic/s2n-quic-bench/Cargo.toml b/quic/s2n-quic-bench/Cargo.toml
@@ -15,7 +15,6 @@ crossbeam-channel = { version = "0.5" }
 internet-checksum = "0.2"
 s2n-codec = { path = "../../common/s2n-codec", features = ["testing"] }
 s2n-quic-core = { path = "../s2n-quic-core", features = ["testing"] }
-s2n-quic-crypto = { path = "../s2n-quic-crypto", features = ["testing"] }
 
 [[bench]]
 name = "bench"

diff --git a/quic/s2n-quic-bench/src/crypto.rs b/quic/s2n-quic-bench/src/crypto.rs
diff --git a/quic/s2n-quic-bench/src/crypto/aes.rs b/quic/s2n-quic-bench/src/crypto/aes.rs
diff --git a/quic/s2n-quic-bench/src/crypto/aesgcm.rs b/quic/s2n-quic-bench/src/crypto/aesgcm.rs
diff --git a/quic/s2n-quic-bench/src/crypto/ghash.rs b/quic/s2n-quic-bench/src/crypto/ghash.rs
diff --git a/quic/s2n-quic-bench/src/lib.rs b/quic/s2n-quic-bench/src/lib.rs
@@ -4,7 +4,6 @@
 use criterion::Criterion;
 
 mod buffer;
-mod crypto;
 mod frame;
 mod inet;
 mod packet;
@@ -14,7 +13,6 @@ mod xdp;
 
 pub fn benchmarks(c: &mut Criterion) {
     buffer::benchmarks(c);
-    crypto::benchmarks(c);
     frame::benchmarks(c);
     inet::benchmarks(c);
     packet::benchmarks(c);

diff --git a/quic/s2n-quic-crypto/Cargo.toml b/quic/s2n-quic-crypto/Cargo.toml
@@ -29,11 +29,6 @@ aws-lc-rs = { version = "1.6" }
 ring = { version = "0.16", default-features = false }
 
 [dev-dependencies]
-aes = "0.8"
-aes-gcm = "0.10"
-bolero = "0.10"
-ghash = "0.5"
 hex-literal = "0.4"
 insta = { version = "1", features = ["json"] }
-pretty-hex = "0.4"
 s2n-quic-core = { path = "../s2n-quic-core", features = ["testing"] }
diff --git a/quic/s2n-quic-crypto/README.md b/quic/s2n-quic-crypto/README.md
@@ -1,47 +1,5 @@
 # s2n-quic-crypto
 
-This crate contains QUIC-optimized versions of cryptographic AEAD routines for high efficiency and performance.
+This crate contains abstractions over libcrypto operations needed for implementing the QUIC protocol.
 
 **WARNING**: **This is not meant to be for general use** outside of `s2n-quic`. There are no guarantees of a stable interface.
-
-## Navigating the code
-
-The code in this crate is defined in several layers of abstraction, which allow the upper layers to define algorithms in a very high level with very little `unsafe` code. Starting from the lowest level going up, the crate is composed of several modules:
-
-### arch
-
-Architecture-specific intrinsics enable Rust to execute special CPU instructions optimized for cryptography. This module selects the target architecture and exports the available intrinsics. However, this doesn't mean that the final CPU will actually support the instructions and executing the code will result in an `Illegal instruction` error. This means we must probe for instruction support at runtime to make it easy for applications to get the most optimized version of the code. In Rust/x86 this is accomplished with the [`is_x86_feature_detected!`](https://doc.rust-lang.org/std/macro.is_x86_feature_detected.html) macro and the [`target_feature`](https://rust-lang.github.io/rfcs/2045-target-feature.html) attribute.
-
-### block
-
-Blocks define the unit of operation for block ciphers. In the case of AES, GHash, and AES-GCM this is a 128-bit value. Blocks can be operated on in "batches", which are arrays of blocks. This concept enables CPUs to look ahead of the program counter and perform computation in parallel. The batch size for AES-GCM in [AWS-LC](https://github.com/awslabs/aws-lc/blob/aed75eb04d322d101941e1377f274484f5e4f5b8/crypto/fipsmodule/modes/asm/aesni-gcm-x86_64.pl#L494) is `6`. After benchmarking several batch sizes in this code base (`4`, `6`, and `8`), this library has also selected `6` as a default.
-
-### aes
-
-This module contains AES implementations for each of the supported platforms. Both AES-128 and AES-256 are supported. Each implementation is generic over the `Encrypt` and `Decrypt` traits, making it easy to write generic code over the various key sizes. The AES traits also allow for interleaving instructions between rounds, which enables CPUs to perform multiple types of computation in parallel. This feature is used extensively in AES-GCM, as AES and GHash operations are performed in lockstep.
-
-The AES implementations for x86 are a direct port of the [AWS-LC](https://github.com/awslabs/aws-lc/blob/aed75eb04d322d101941e1377f274484f5e4f5b8/crypto/fipsmodule/aes/asm/aesni-x86_64.pl) code, as that implementation has been heavily optimized over the years. Since the `aes` instruction set performs most of the heavy lifting, there isn't really any further optimization that can be done.
-
-### ghash
-
-This module contains GHash implementations for each of the supported platforms. Each implementation is generic over the `GHash` trait, allowing usage to be decoupled from the implementation. This enables us to experiment with various optimizations in the GHash implementation.
-
-On x86, there are 3 implementations of the GHash algorithm: `std`, `pre_h`, and `pre_hr`. The `std` implementation is the same you would find in [AWS-LC](https://github.com/awslabs/aws-lc/blob/aed75eb04d322d101941e1377f274484f5e4f5b8/crypto/fipsmodule/modes/asm/ghash-x86_64.pl). The `pre_h` and `pre_hr`, however, work quite a bit differently. Instead of calling `gf_mul` for each block update, all of the powers of `H`, up to the maximum input size, are precomputed at key time and only at the end of the digest is a reduction performed. This allows applications to make tradeoffs between memory and CPU efficiency. `pre_hr` takes it a bit further and precomputes the `r` value as well, which doubles the required memory. However, after benchmarking the two options, this seems to make very little difference, if any.
-
-The precomputed modes allow for statically and dynamically defined sizes. The dynamic mode enables applications to "upgrade" the efficiency of a key after deciding it's going to be worth the memory footprint.
-
-### aesgcm
-
-This module aims to provide a generic, platform-independent implementation of the AES-GCM mode. This means it uses all of the previously-defined traits to construct its implementation. It's also generic over the batch size and can be defined on type instantiation.
-
-In theory, this means that adding platform support only requires implementing the AES and GHash traits. In practice, it may not hold true as there only exists a `x86` implementation.
-
-### testing
-
-This module contains all of the support functionality for testing implementations. Since it isn't entirely known if an implementation will be supported by the CPU until runtime, each module has a `implementations` function that returns all of the supported implementations by the runtime. This allows the caller to iterate over all of the implementations of a particular algorithm, and perform operations and make assertions.
-
-Each module has a `test_vector` test, which uses well-known inputs and asserts that the outputs match expectations.
-
-Each module also has a `differential_test` test, which uses [`bolero`](https://camshaft.github.io/bolero/) to generate keys, payloads, nonces, etc. and compares outputs to several well-known implementations (currently [ring](https://github.com/briansmith/ring) and [RustCrypto](https://github.com/RustCrypto)). It also asserts that decrypting the payload results in the original plaintext. This allows us to quickly identify/prevent any differences in functionality.
-
-There are also [criterion](https://crates.io/crates/criterion) benchmarks for each of the implementations. This provides a report of each of the outcomes and ensures performance is maintained across commits.
diff --git a/quic/s2n-quic-crypto/src/aead.rs b/quic/s2n-quic-crypto/src/aead.rs
@@ -1,6 +1,7 @@
 // Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 // SPDX-License-Identifier: Apache-2.0
 
+use crate::ring_aead::{Aad, LessSafeKey, Nonce, MAX_TAG_LEN, NONCE_LEN};
 pub use s2n_quic_core::crypto::{packet_protection::Error, scatter};
 pub type Result<T = (), E = Error> = core::result::Result<T, E>;
 
@@ -18,3 +19,89 @@ pub trait Aead {
         tag: &Self::Tag,
     ) -> Result;
 }
+
+impl Aead for LessSafeKey {
+    type Nonce = [u8; NONCE_LEN];
+    type Tag = [u8; MAX_TAG_LEN];
+
+    #[inline]
+    #[cfg(target_os = "windows")]
+    fn encrypt(
+        &self,
+        nonce: &[u8; NONCE_LEN],
+        aad: &[u8],
+        payload: &mut scatter::Buffer,
+    ) -> Result {
+        use s2n_codec::Encoder;
+
+        let nonce = Nonce::assume_unique_for_key(*nonce);
+        let aad = Aad::from(aad);
+
+        let buffer = payload.flatten();
+
+        let tag = {
+            let (input, _) = buffer.split_mut();
+
+            self.seal_in_place_separate_tag(nonce, aad, input)
+                .map_err(|_| Error::INTERNAL_ERROR)?
+        };
+
+        buffer.write_slice(tag.as_ref());
+
+        Ok(())
+    }
+
+    // use the scatter API if we're using AWS-LC
+    #[inline]
+    #[cfg(not(target_os = "windows"))]
+    fn encrypt(
+        &self,
+        nonce: &[u8; NONCE_LEN],
+        aad: &[u8],
+        payload: &mut scatter::Buffer,
+    ) -> Result {
+        let nonce = Nonce::assume_unique_for_key(*nonce);
+        let aad = Aad::from(aad);
+
+        let (buffer, extra) = payload.inner_mut();
+        let extra_in = extra.as_deref().unwrap_or(&[][..]);
+        let (in_out, extra_out_and_tag) = buffer.split_mut();
+        let extra_out_and_tag = &mut extra_out_and_tag[..extra_in.len() + MAX_TAG_LEN];
+
+        self.seal_in_place_scatter(nonce, aad, in_out, extra_in, extra_out_and_tag)
+            .map_err(|_| Error::INTERNAL_ERROR)?;
+
+        Ok(())
+    }
+
+    #[inline]
+    fn decrypt(
+        &self,
+        nonce: &[u8; NONCE_LEN],
+        aad: &[u8],
+        input: &mut [u8],
+        tag: &[u8; MAX_TAG_LEN],
+    ) -> Result {
+        let nonce = Nonce::assume_unique_for_key(*nonce);
+        let aad = Aad::from(aad);
+        let input = unsafe {
+            // ring requires that the input and tag be passed as a single slice
+            // so we extend the input slice here.
+            // This is only safe if they are contiguous
+            debug_assert_eq!(
+                if input.is_empty() {
+                    (*input).as_ptr()
+                } else {
+                    (&input[input.len() - 1] as *const u8).add(1)
+                },
+                (*tag).as_ptr()
+            );
+            let ptr = input.as_mut_ptr();
+            let len = input.len() + MAX_TAG_LEN;
+            core::slice::from_raw_parts_mut(ptr, len)
+        };
+        self.open_in_place(nonce, aad, input)
+            .map_err(|_| Error::DECRYPT_ERROR)?;
+        Ok(())
+    }
+}
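
The `decrypt` implementation above only works if the caller passes `input` and `tag` slices that sit back to back in memory, which is what the `debug_assert_eq!` checks. As a minimal sketch of how a caller could satisfy that requirement, assuming a 16-byte tag and that both slices are carved from a single packet buffer, `split_at_mut` yields two adjacent slices. The names `TAG_LEN`, `split_payload_and_tag`, and `is_contiguous` are hypothetical and are not part of `s2n-quic-crypto`.

```rust
/// Tag length assumed for this sketch (16 bytes, as used by AES-GCM).
/// Hypothetical constant; the diff above uses `MAX_TAG_LEN` from `ring_aead`.
const TAG_LEN: usize = 16;

/// Splits a packet buffer into `(payload, tag)` so that the tag begins
/// exactly where the payload ends. Returns `None` if the buffer is too
/// short to hold a tag.
fn split_payload_and_tag(packet: &mut [u8]) -> Option<(&mut [u8], &mut [u8])> {
    let payload_len = packet.len().checked_sub(TAG_LEN)?;
    // Both halves borrow from the same allocation, so they are adjacent.
    Some(packet.split_at_mut(payload_len))
}

/// Safe counterpart of the pointer comparison in the `debug_assert_eq!`:
/// true when `tag` starts at the one-past-the-end address of `payload`.
fn is_contiguous(payload: &[u8], tag: &[u8]) -> bool {
    payload.as_ptr_range().end == tag.as_ptr()
}
```

Slices produced this way always pass the contiguity check, including the empty-payload case, whereas a payload and tag taken from two separate buffers generally would not.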