chore(s2n-quic-crypto): remove custom aesgcm implementation (#2186)
toidiu authored May 1, 2024
1 parent 7188ce4 commit b085808
Showing 49 changed files with 95 additions and 4,193 deletions.
1 change: 0 additions & 1 deletion quic/s2n-quic-bench/Cargo.toml
@@ -15,7 +15,6 @@ crossbeam-channel = { version = "0.5" }
internet-checksum = "0.2"
s2n-codec = { path = "../../common/s2n-codec", features = ["testing"] }
s2n-quic-core = { path = "../s2n-quic-core", features = ["testing"] }
s2n-quic-crypto = { path = "../s2n-quic-crypto", features = ["testing"] }

[[bench]]
name = "bench"
14 changes: 0 additions & 14 deletions quic/s2n-quic-bench/src/crypto.rs

This file was deleted.

41 changes: 0 additions & 41 deletions quic/s2n-quic-bench/src/crypto/aes.rs

This file was deleted.

103 changes: 0 additions & 103 deletions quic/s2n-quic-bench/src/crypto/aesgcm.rs

This file was deleted.

20 changes: 0 additions & 20 deletions quic/s2n-quic-bench/src/crypto/ghash.rs

This file was deleted.

2 changes: 0 additions & 2 deletions quic/s2n-quic-bench/src/lib.rs
@@ -4,7 +4,6 @@
use criterion::Criterion;

mod buffer;
mod crypto;
mod frame;
mod inet;
mod packet;
@@ -14,7 +13,6 @@ mod xdp;

pub fn benchmarks(c: &mut Criterion) {
buffer::benchmarks(c);
crypto::benchmarks(c);
frame::benchmarks(c);
inet::benchmarks(c);
packet::benchmarks(c);
5 changes: 0 additions & 5 deletions quic/s2n-quic-crypto/Cargo.toml
@@ -29,11 +29,6 @@ aws-lc-rs = { version = "1.6" }
ring = { version = "0.16", default-features = false }

[dev-dependencies]
aes = "0.8"
aes-gcm = "0.10"
bolero = "0.10"
ghash = "0.5"
hex-literal = "0.4"
insta = { version = "1", features = ["json"] }
pretty-hex = "0.4"
s2n-quic-core = { path = "../s2n-quic-core", features = ["testing"] }
44 changes: 1 addition & 43 deletions quic/s2n-quic-crypto/README.md
@@ -1,47 +1,5 @@
# s2n-quic-crypto

This crate contains QUIC-optimized versions of cryptographic AEAD routines for high efficiency and performance.
This crate contains abstractions over libcrypto operations needed for implementing the QUIC protocol.

**WARNING**: **This is not meant to be for general use** outside of `s2n-quic`. There are no guarantees of a stable interface.

## Navigating the code

The code in this crate is defined in several layers of abstraction, which allow the upper layers to define algorithms in a very high level with very little `unsafe` code. Starting from the lowest level going up, the crate is composed of several modules:

### arch

Architecture-specific intrinsics enable Rust to execute special CPU instructions optimized for cryptography. This module selects the target architecture and exports the available intrinsics. However, an intrinsic being available at compile time doesn't mean that the CPU running the code actually supports the instruction; executing an unsupported instruction results in an `Illegal instruction` error. We must therefore probe for instruction support at runtime, which makes it easy for applications to get the most optimized version of the code. In Rust/x86 this is accomplished with the [`is_x86_feature_detected!`](https://doc.rust-lang.org/std/macro.is_x86_feature_detected.html) macro and the [`target_feature`](https://rust-lang.github.io/rfcs/2045-target-feature.html) attribute.
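
The runtime probe described above can be sketched roughly as follows. This is a minimal illustration and not the crate's actual code: the `Backend` enum and `select_backend` function are hypothetical names, and the choice of the `aes` and `pclmulqdq` features is an assumption about what an AES-GCM implementation on x86 would need.

```rust
/// Hypothetical set of implementations a caller could dispatch to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// Hardware-accelerated path (AES-NI plus carry-less multiply).
    AesNi,
    /// Fallback that uses no special instructions.
    Portable,
}

/// Probes the CPU at runtime and picks a backend (assumed helper).
pub fn select_backend() -> Backend {
    // The detection macro only exists on x86 targets, so gate it at compile time.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Both features are checked because AES-GCM needs AES rounds and
        // carry-less multiplication for GHash.
        if std::is_x86_feature_detected!("aes") && std::is_x86_feature_detected!("pclmulqdq") {
            return Backend::AesNi;
        }
    }

    // Either this is not an x86 target or the CPU lacks the instructions.
    Backend::Portable
}
```

A caller would run `select_backend()` once, for example at key creation, and only enter code compiled with `#[target_feature(enable = "aes")]` when the result is `Backend::AesNi`.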

### block

Blocks define the unit of operation for block ciphers. In the case of AES, GHash, and AES-GCM this is a 128-bit value. Blocks can be operated on in "batches", which are arrays of blocks. This concept enables CPUs to look ahead of the program counter and perform computation in parallel. The batch size for AES-GCM in [AWS-LC](https://github.com/awslabs/aws-lc/blob/aed75eb04d322d101941e1377f274484f5e4f5b8/crypto/fipsmodule/modes/asm/aesni-gcm-x86_64.pl#L494) is `6`. After benchmarking several batch sizes in this code base (`4`, `6`, and `8`), this library has also selected `6` as a default.

### aes

This module contains AES implementations for each of the supported platforms. Both AES-128 and AES-256 are supported. Each implementation is generic over the `Encrypt` and `Decrypt` traits, making it easy to write generic code over the various key sizes. The AES traits also allow for interleaving instructions between rounds, which enables CPUs to perform multiple types of computation in parallel. This feature is used extensively in AES-GCM, as AES and GHash operations are performed in lockstep.

The AES implementations for x86 are a direct port of the [AWS-LC](https://github.com/awslabs/aws-lc/blob/aed75eb04d322d101941e1377f274484f5e4f5b8/crypto/fipsmodule/aes/asm/aesni-x86_64.pl) code, as that implementation has been heavily optimized over the years. Since the `aes` instruction set performs most of the heavy lifting, there isn't really any further optimization that can be done.

### ghash

This module contains GHash implementations for each of the supported platforms. Each implementation is generic over the `GHash` trait, allowing usage to be decoupled from the implementation. This enables us to experiment with various optimizations in the GHash implementation.

On x86, there are 3 implementations of the GHash algorithm: `std`, `pre_h`, and `pre_hr`. The `std` implementation is the same you would find in [AWS-LC](https://github.com/awslabs/aws-lc/blob/aed75eb04d322d101941e1377f274484f5e4f5b8/crypto/fipsmodule/modes/asm/ghash-x86_64.pl). The `pre_h` and `pre_hr`, however, work quite a bit differently. Instead of calling `gf_mul` for each block update, all of the powers of `H`, up to the maximum input size, are precomputed at key time and only at the end of the digest is a reduction performed. This allows applications to make tradeoffs between memory and CPU efficiency. `pre_hr` takes it a bit further and precomputes the `r` value as well, which doubles the required memory. However, after benchmarking the two options, this seems to make very little difference, if any.

The precomputed modes allow for statically and dynamically defined sizes. The dynamic mode enables applications to "upgrade" the efficiency of a key after deciding it's going to be worth the memory footprint.

### aesgcm

This module aims to provide a generic, platform-independent implementation of the AES-GCM mode. This means it uses all of the previously-defined traits to construct its implementation. It's also generic over the batch size and can be defined on type instantiation.

In theory, this means that adding platform support only requires implementing the AES and GHash traits. In practice, this may not hold true, as only an `x86` implementation exists.

### testing

This module contains all of the support functionality for testing implementations. Since it isn't entirely known if an implementation will be supported by the CPU until runtime, each module has a `implementations` function that returns all of the supported implementations by the runtime. This allows the caller to iterate over all of the implementations of a particular algorithm, and perform operations and make assertions.

Each module has a `test_vector` test, which uses well-known inputs and asserts that the outputs match expectations.

Each module also has a `differential_test` test, which uses [`bolero`](https://camshaft.github.io/bolero/) to generate keys, payloads, nonces, etc. and compares outputs to several well-known implementations (currently [ring](https://github.com/briansmith/ring) and [RustCrypto](https://github.com/RustCrypto)). It also asserts that decrypting the payload results in the original plaintext. This allows us to quickly identify/prevent any differences in functionality.

There are also [criterion](https://crates.io/crates/criterion) benchmarks for each of the implementations. This provides a report of each of the outcomes and ensures performance is maintained across commits.
87 changes: 87 additions & 0 deletions quic/s2n-quic-crypto/src/aead.rs
@@ -1,6 +1,7 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0

use crate::ring_aead::{Aad, LessSafeKey, Nonce, MAX_TAG_LEN, NONCE_LEN};
pub use s2n_quic_core::crypto::{packet_protection::Error, scatter};
pub type Result<T = (), E = Error> = core::result::Result<T, E>;

@@ -18,3 +19,89 @@ pub trait Aead {
tag: &Self::Tag,
) -> Result;
}

impl Aead for LessSafeKey {
    type Nonce = [u8; NONCE_LEN];
    type Tag = [u8; MAX_TAG_LEN];

    #[inline]
    #[cfg(target_os = "windows")]
    fn encrypt(
        &self,
        nonce: &[u8; NONCE_LEN],
        aad: &[u8],
        payload: &mut scatter::Buffer,
    ) -> Result {
        use s2n_codec::Encoder;

        let nonce = Nonce::assume_unique_for_key(*nonce);
        let aad = Aad::from(aad);

        let buffer = payload.flatten();

        let tag = {
            let (input, _) = buffer.split_mut();

            self.seal_in_place_separate_tag(nonce, aad, input)
                .map_err(|_| Error::INTERNAL_ERROR)?
        };

        buffer.write_slice(tag.as_ref());

        Ok(())
    }

    // use the scatter API if we're using AWS-LC
    #[inline]
    #[cfg(not(target_os = "windows"))]
    fn encrypt(
        &self,
        nonce: &[u8; NONCE_LEN],
        aad: &[u8],
        payload: &mut scatter::Buffer,
    ) -> Result {
        let nonce = Nonce::assume_unique_for_key(*nonce);
        let aad = Aad::from(aad);

        let (buffer, extra) = payload.inner_mut();
        let extra_in = extra.as_deref().unwrap_or(&[][..]);
        let (in_out, extra_out_and_tag) = buffer.split_mut();
        let extra_out_and_tag = &mut extra_out_and_tag[..extra_in.len() + MAX_TAG_LEN];

        self.seal_in_place_scatter(nonce, aad, in_out, extra_in, extra_out_and_tag)
            .map_err(|_| Error::INTERNAL_ERROR)?;

        Ok(())
    }

    #[inline]
    fn decrypt(
        &self,
        nonce: &[u8; NONCE_LEN],
        aad: &[u8],
        input: &mut [u8],
        tag: &[u8; MAX_TAG_LEN],
    ) -> Result {
        let nonce = Nonce::assume_unique_for_key(*nonce);
        let aad = Aad::from(aad);
        let input = unsafe {
            // ring requires that the input and tag be passed as a single slice
            // so we extend the input slice here.
            // This is only safe if they are contiguous
            debug_assert_eq!(
                if input.is_empty() {
                    (*input).as_ptr()
                } else {
                    (&input[input.len() - 1] as *const u8).add(1)
                },
                (*tag).as_ptr()
            );
            let ptr = input.as_mut_ptr();
            let len = input.len() + MAX_TAG_LEN;
            core::slice::from_raw_parts_mut(ptr, len)
        };
        self.open_in_place(nonce, aad, input)
            .map_err(|_| Error::DECRYPT_ERROR)?;
        Ok(())
    }
}
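
The `decrypt` implementation above relies on the tag sitting directly after the input in memory, and only checks this with a `debug_assert_eq!`. The sketch below isolates that contiguity condition using only the standard library. It is an illustration rather than part of the commit: `TAG_LEN`, `is_contiguous`, and `split_payload` are hypothetical names, and `TAG_LEN = 16` is a local stand-in for `MAX_TAG_LEN` (the AES-GCM tag size).

```rust
/// Local stand-in for `MAX_TAG_LEN` (assumed to be the 16-byte AES-GCM tag).
const TAG_LEN: usize = 16;

/// Returns true when `tag` begins at the byte immediately following `input`.
/// This is the same pointer comparison as the `debug_assert_eq!` in `decrypt`;
/// `as_ptr_range().end` also covers the empty-input case.
fn is_contiguous(input: &[u8], tag: &[u8]) -> bool {
    input.as_ptr_range().end == tag.as_ptr()
}

/// Splits a received payload into `(input, tag)` halves that are contiguous
/// by construction. Returns `None` if the payload is shorter than a tag.
fn split_payload(payload: &mut [u8]) -> Option<(&mut [u8], &mut [u8])> {
    let split = payload.len().checked_sub(TAG_LEN)?;
    Some(payload.split_at_mut(split))
}
```

Halves produced by `split_at_mut` on a single buffer always satisfy `is_contiguous`, which is the precondition a caller of `decrypt` would need to uphold before the slice is extended with `from_raw_parts_mut`.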