Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple "aesgcm" and "aes128gcm" schemes, disable record chunking. #59

Merged
merged 3 commits into from
Mar 25, 2021

Conversation

rfk
Copy link
Contributor

@rfk rfk commented Mar 24, 2021

This is a significant refactor of the guts of the crypto code, but I think overall it
makes things easier to understand and to audit.

First, I've removed the EceWebPush trait that was previously used to share parts
of the encrypt/decrypt logic between the two schemes. The schemes are not that similar
in practice and on balance, I think the attempt to share code between them was
actually making both schemes harder to understand.

Second, I've cut all the record-chunking code out of "aesgcm". It now supports only
a single record on both encryption and decryption, in line with what the spec says
that a webpush client should support. We were already throwing errors when encountering
multiple records in "aesgcm"; this cleanup takes advantage of that fact to actually
remove the code without breaking the public API.

Finally, I've removed the record-chunking during encryption for "aes128gcm", instead
opting to support larger payloads by increasing the record size. I've also added
several layers of abstraction in the hope of making the code easier to understand -
for example there is a separate Header struct for reading/writing the header,
and a separate PlaintextRecord struct for reading/writing an individual record.

Of course "easier to understand" is subjective, but I think it's an improvement
(and I certainly understand things better as a result of having worked through it!).
Feedback and/or pushback on this is most welcome.

I'd like to try adding record chunking in back here, but as a separate PR building
atop these abstractions.

Connects to #55.

@rfk rfk force-pushed the disentangle-the-trait branch from f3371b2 to 796c464 Compare March 24, 2021 10:10
@rfk rfk requested a review from jrconlin March 24, 2021 10:11
@rfk
Copy link
Contributor Author

rfk commented Mar 24, 2021

@jrconlin I'd love to get your review on this, although I understand it's a lot. I don't have much useful advice for reviewing except that it's probably easier to view the new files in isolation rather than staring at the diff, because much code has moved around or changed indentation levels.

Copy link
Collaborator

@jrconlin jrconlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments, but looks good!

if params.rs < ECE_AES128GCM_MIN_RS {
return Err(Error::InvalidRecordSize);
}
if plaintext.len() + padding + ECE_TAG_LENGTH > params.rs as usize {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda wonder if it might be nice to include something like dbg!(format!("Message content too long by {:?} bytes.", params.rs - (plaintext.len() + padding + ECE_TAG_LENGTH))); to provide some insight about what went wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we should never hit this branch, because the calling code set rs appropriate; but in practice, future-you or future-me will probably need such a debugging message at some point 😅 - thanks, I'll add one.

src/aes128gcm.rs Show resolved Hide resolved
src/aes128gcm.rs Outdated Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
@rfk rfk force-pushed the disentangle-the-trait branch from 796c464 to e58c0b1 Compare March 24, 2021 22:35
@rfk rfk changed the base branch from pad-to-fixed-size to main March 24, 2021 22:37
@rfk rfk force-pushed the disentangle-the-trait branch from e58c0b1 to c0995bc Compare March 24, 2021 22:38
This is a significant refactor of the guts of the crypto code, but I think overall it
makes things easier to understand and to audit.

First, I've removed the `EceWebPush` trait that was previously used to share parts
of the encrypt/decrypt logic between the two schemes. The schemes are not that similar
in practice and on balance, I think the attempt to share code between them was
actually making both schemes harder to understand.

Second, I've cut all the record-chunking code out of "aesgcm". It now supports only
a single record on both encryption and decryption, in line with what the spec says
that a webpush client should support. We were already throwing errors when encountering
multiple records in "aesgcm"; this cleanup takes advantage of that fact to actually
remove the code without breaking the public API.

Finally, I've removed the record-chunking during encryption for "aes128gcm", instead
opting to support larger payloads by increasing the record size. I've also added
several layers of abstraction in the hope of making the code easier to understand -
for example there is a separate `Header` struct for reading/writing the header,
and a separate `PlaintextRecord` struct for reading/writing an individual record.

Of course "easier to understand" is subjective, but I think it's an improvement
(and I certainly understand things better as a result of having worked through it!).
Feedback and/or pushback on this is most welcome.

I'd like to try adding record chunking in back here, but as a separate PR building
atop these abstractions.

Connects to #55.
@rfk rfk force-pushed the disentangle-the-trait branch from c0995bc to 0e286a9 Compare March 24, 2021 22:55
@jrconlin
Copy link
Collaborator

oh, one other thing...
One thing we've adopted in services-engineering is to do a "squash and merge" at commit rather than a series of force pushes. This lets reviewers focus on just the changes for a given commit rather than dig around looking for what might be new.

@rfk
Copy link
Contributor Author

rfk commented Mar 24, 2021

One thing we've adopted in services-engineering is to do a "squash and merge" at commit rather than a series of force pushes.
This lets reviewers focus on just the changes for a given commit rather than dig around looking for what might be new.

I will keep this in mind, thanks!

@rfk
Copy link
Contributor Author

rfk commented Mar 25, 2021

FYI, I've pushed what I have so for for record chunking in #60. AFAICT it works, but I want to add some more tests.

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you consider just deleting aesgcm ?

Comment on lines +256 to +268
/// +-----------+ content
/// | data | any length up to rs-17 octets
/// +-----------+
/// |
/// v
/// +-----------+-----+ add a delimiter octet (0x01 or 0x02)
/// | data | pad | then 0x00-valued octets to rs-16
/// +-----------+-----+ (or less on the last record)
/// |
/// v
/// +--------------------+ encrypt with AEAD_AES_128_GCM;
/// | ciphertext | final size is rs;
/// +--------------------+ the last record can be smaller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this look on rustdoc without ``` ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good reminder, thanks! (a high chance it'll look like crap without that)

src/aes128gcm.rs Outdated
sequence_number: usize,
ciphertext: &[u8],
plaintext_buffer: &'a mut [u8],
) -> Result<PlaintextRecord<'a>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> Result<PlaintextRecord<'a>> {
) -> Result<Self> {

I don't know if you use clippy, but this is what it always tells me to do for these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, we usually have clippy running in CI to complain about these things, but looks like it's not enabled for this repo.

src/aes128gcm.rs Outdated
let iv = generate_iv_for_record(&nonce, sequence_number);
let padded_plaintext = cryptographer.aes_gcm_128_decrypt(&key, &iv, &ciphertext)?;
// Scan backwards for the first non-zero byte from the end of the data, which delimits the padding.
let last_nonzero_byte = match padded_plaintext.iter().rposition(|&b| b != 0u8) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clippy also prefers if let Some(pos) = here and I've learned to prefer it. But it recently started going with this instead:

let last_nonzero_idx = padded_plaintext.iter().rposition(|&b| b != 0u8).ok_or(Error::DecryptPadding)?

(note that this is an index and not a byte)

src/aes128gcm.rs Outdated
Comment on lines 311 to 312
plaintext_buffer[0..last_nonzero_byte]
.copy_from_slice(&padded_plaintext[0..last_nonzero_byte]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to allow the caller to provide a buffer when you need to copy from a new vector anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really no, but it worked out better for re-use of the PlaintextRecord struct between encryption and decryption.

src/aes128gcm.rs Outdated
Comment on lines 311 to 312
plaintext_buffer[0..last_nonzero_byte]
.copy_from_slice(&padded_plaintext[0..last_nonzero_byte]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal to always touch every byte, so that you don't leak too much timing info? Because this isn't very good timing defense and this copy call could go after you check whether the record is final or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not considered that as a goal here, and the ordering of this copy is entirely accidental. Should I by trying to do that?

(I'm interpreting your "this isn't very good timing defense" as meaning "you should not bother with trying to touch every byte, it wont provide meaningful defense" but want to check if that's correct. Naively, I would expect the fact that we've already decrypted and checked the AEAD tag to mean that we can safely do an early-return on invalid data here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, touching every byte means nothing when the next consumer down the chain won't maintain that discipline.

You can just reorder here to after the padding delimiter check.

Comment on lines +398 to +409
let key = cryptographer.hkdf_sha256(
salt,
&ikm,
ECE_AES128GCM_KEY_INFO.as_bytes(),
ECE_AES_KEY_LENGTH,
)?;
let nonce = cryptographer.hkdf_sha256(
salt,
&ikm,
ECE_AES128GCM_NONCE_INFO.as_bytes(),
ECE_NONCE_LENGTH,
)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be optimal; if there is an hkdf-extract function separate from hkdf-expand, you can save a few iterations of the compression function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but this is at least not any worse than it used to be, so we should follow up on that in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I filed #61 to follow up)

@rfk
Copy link
Contributor Author

rfk commented Mar 25, 2021

did you consider just deleting aesgcm ?

Unfortunately we still see a lot of aesgcm traffic in the wild, ref this graph that JR took from a recent server snapshot.

@jrconlin
Copy link
Collaborator

Unfortunately, we see a LOT of aesgcm traffic still. I've reached out to various library authors to switch the default to aes128gcm, and most have. Much of the traffic, however, is probably coming from "legacy-built" systems that are fire and forget, and are now forgotten.

I'm not sure how to resolve this without a lot of complexity or user dissatisfaction.

@rfk rfk merged commit aaa12b1 into main Mar 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants