Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Aes128Gcm support via rust-ece crate delegation #23

Closed
wants to merge 24 commits into from
Closed

Adding Aes128Gcm support via rust-ece crate delegation #23

wants to merge 24 commits into from

Conversation

tiesselune
Copy link

@tiesselune tiesselune commented Jan 31, 2021

About

This pull request addresses #3 by using the ece crate, which is maintained by Mozilla, to implement the Aes128Gcm encoding scheme. This way, it can be easily updated and benefit from security reviews and patches on the ece crate and the crate can focus on the VAPID part of things.

For consistency purposes, it entirely delegates ECE encoding to the ece crate, including the AesGcm scheme, which implies some refactoring.

The actual cryptographic work being implemented and tested in the ece crate, the cryptographic functions and tests in the http_ece module can be entirely delegated.

Changes summary:

  • Re-wrote the encrypt function in the HttpEce struct implementation to support both encodings and use the ece crate, using the same interface to avoid any breaking change
  • Removed cryptographic structs, functions and related tests which are no longer needed
  • Added new tests to test crypto headers for both AesGcm and Aes128Gcm
  • Changed the headers property of the WebPushPayload struct from a Vec<(&'static str,String)> to a HashMap<String,String> for ease of integration and consistency with the ece crate implementation (this might be a breaking change, since WebPushPayload is published at the crate level; it needs more investigation)
  • Removed http-ece from the tags of the crate, since it is now delegated to another one.

Remaining work

This pull request is a draft and it needs a few things before it can be considered:

  • It relies on the merging of this pull request on the ece crate, which I'm working on with the maintainer ( for now, cargo.toml uses my fork)
  • The potential breaking change on the WebPushPayload struct needs to be addressed
  • I need to do some real-world testing with major push services to ensure full compliance.
  • Optionnal The merging of this pull request on the ece crate would allow integrating with its headers function without extra allocation or hashing

@pimeys
Copy link
Owner

pimeys commented Feb 1, 2021

What? Somebody else wrote an ece crate? Oh nice! The amount of tears and pain to write it with ring and to understand the RFCs are now gone. This is sweet, waiting for the finished version :D

@pimeys
Copy link
Owner

pimeys commented Feb 1, 2021

Please poke me if you're done and I'm not answering soon enough. A new sprint starting and I'm in the middle of the long meetings. Also, please update the README and see that the rustdocs reflect the current state of the crate.

@pimeys
Copy link
Owner

pimeys commented Feb 1, 2021

For breaking changes, we go to v0.8 anyhow because of Tokio 1 and (if somebody does it) a possible support for async-std runtime. So if you need to break any apis, now's the time.

@tiesselune
Copy link
Author

Great news, my ece crate pull request has been approved and merged, and the new version published to crates.io which means that this pull request is soon to get out of WIP state! 🥳

src/http_ece.rs Outdated Show resolved Hide resolved
src/message.rs Outdated Show resolved Hide resolved
@tiesselune
Copy link
Author

tiesselune commented Feb 3, 2021

Ok @pimeys , responding here cause the two concerns you presented above are actually linked. I first responded with some suggestions on how to fix the problem then dove into it and found a far simpler approach that is just a bit suboptimal in regard to code duplication between the ece crate and this one, but would be better performance-wise. So I deleted my previous answers since they're no longer relevant and don't deserve to be read. 😉

The headers function from the AesGcmEncryptedBlock struct that we get from the encrypt_aesgcm function is the culprit here; it's passing self by value, effectively consuming the block and forcing a copy to avoid a double move; it's also producing the suboptimal hashmap. The good thing is: we don't need it. So we can get rid of the cloning and the hashmap in one go by re-introducing a variant of get_headers function that you originally wrote for this even if the ece crates has a function that essentially does the same thing (but with Strings and consuming Self, forcing a copy somewhere) That's definitely a design flaw that would be fixed in another pull request for the ece crate 😉, but we can circumvent it for now quite easily.

Anyway, I'm going to submit a new commit for both those issues quite soon ;)

EDIT: For reference, I've just suggested these changes to the ece crate maintainers, and might tackle it if they want me to

@pimeys
Copy link
Owner

pimeys commented Feb 3, 2021

Great. Probably this kind of optimization doesn't matter for most users, but before we went bankrupt with the company I was writing this crate for, we used it to send thousands and thousands notifications per second. All extra allocations start to show up, and, well, it's very cool if you can make it as fast as possible. :D

I can accept this of course as-is too, if there's no way around some of the added computation.

@tiesselune
Copy link
Author

tiesselune commented Feb 4, 2021

Hello @pimeys Just to let you know that I wrote a version that avoids the allocation/hashing, but in order to fix this properly, I've opened a new pull request on the ece crate for that new issue.

There would not be any performance difference, just a question of using the functions provided by the ece crate instead of re-coding them, but basically the code is the same.

Anyway, there would be no breaking change and moving from that implementation to the one in the ece crate would be very easy and could be done at any time.

(I still need to run some real-world tests before merging)

@tiesselune
Copy link
Author

Hi again @pimeys ! So I have good news and bad news.

Good news: I tested the Aes128Gcm back-end with real world notifications and it works!

Bad news: for it to work, I had to downgrade hyper, hyper-tls and tokio. If I run it with the current versions, I got the following error: there is no reactor running, must be called from the context of a Tokio 1.x runtime in the hyper::Client::send method that is called in WebPushClient::send. With the old versions, I got random stack overflows, still somewhere in the WebPushClient::send request.

Would you be familiar with the Tokio problem? Does the problem occur on your end with the current master branch ?

@pimeys
Copy link
Owner

pimeys commented Feb 6, 2021

I'm trying to debug this on your branch, but I don't really have proper subscription info or anything, so I go with the following:

{
  "endpoint": "asdf",
  "keys": {
    "p256dh": "asdf",
    "auth": "asdf"
  }
}

I get:

    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/examples/simple_send -f info.json`
Error: Unspecified

which means that the Tokio 1.1 and Hyper 0.14 work just as they should. The error you get is that you try to run something that expects Tokio 1 in a Tokio 0.2 or 0.3 reactor. That's unfortunate how Tokio forces you to use their machinery and exact versions. Have you tried cargo clean and really be sure you run the push system at least with Tokio version 1.0 or later?

@tiesselune
Copy link
Author

Hey, thanks for the quick reply. I was investigating and I came to the same conclusion. My example uses actix and it seems that the current actix_web version has a dependency that itself uses tokio 0.2. I'll need to try it with the beta version of actix.

@pimeys
Copy link
Owner

pimeys commented Feb 6, 2021

This is the reason I'd like this crate, a2 and fcm all to be independent on runtimes. Maybe refactoring it using https://docs.rs/async-h1/2.3.1/async_h1/ would already make it work much better with every possible system...

@pimeys
Copy link
Owner

pimeys commented Feb 6, 2021

Btw. I started hacking away trying to make this crate runtime-independent. This is not really in scope of this ticket, but I'm in Matrix: @oh_lawd:nauk.io if you want to chat more about the general direction of rust-web-push :)

@tiesselune tiesselune marked this pull request as ready for review February 8, 2021 19:27
@tiesselune
Copy link
Author

I've tested against

  • Google's push service
  • Mozilla's push service
  • Microsoft's push service

And I am proud to announce that it works! I think this is ready for merging or at least a review ;)

@@ -14,27 +14,30 @@ pub struct WebPushClient {
impl WebPushClient {
pub fn new() -> WebPushClient {
let mut builder = Client::builder();
builder.keep_alive(true);
builder.pool_max_idle_per_host(std::usize::MAX);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about this? This is a lot of connections...

Copy link
Author

@tiesselune tiesselune Feb 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was because keep_alive was deprecated, and the deprecation message was suggesting using pool_max_idle_per_host (which it does under the hood). usize::MAX is the default value and means "no limit".
If you look at the code for the keep_alive function, setting it to true does exactly that. But if you want to introduce a hard limit, now is the time 😉!

In any case since this is hyper-specific, it is removed in your other pull request.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, this is interesting. I'm just surprised you'd have a limit of 18446744073709551615 connections. You'd first just hit your system ulimit, which is about 1024 or something similar, depending on your setup. I don't think you need to set this anymore, but please check from Hyper what is the correct way of having keep-alive set for http1.

Btw, hyper will be gone if we merge my PR about being runtime-independent, and I'd like to cut a new major version when we release the new ece support and runtime-independence, so we have some time to test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was saying. Probably not worth the time since it's going to be removed. By the way I have the version that works on your fork (a lot of little conflicts on formatting) so since I already resolved them I can push it on another branch or even udpate this PR once the other one is merged somewhere.

if let Some(signature) = &self.vapid_signature {
headers.push((
"Authorization",
format!("vapid t={}, k={}", signature.auth_t, signature.auth_k),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't these in the vapid generation already?

Copy link
Author

@tiesselune tiesselune Feb 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL; DR: AESGCM requires 3 headers: Authorization, Crypto-Key and Encryption, while AES128GCM only needs Authorization, but its content is different (and a lot easier to understand!) so they must be generated separately

Original message : I'm not sure what you mean... Vapid needs this header (public key + JWT) and it's different on AESGCM (which uses dh/ecdsa keys in the Crypto-Key field, and only the token in Authorization with a WebPush prefix). The final RFC puts the dh part back in the Authorization field, under the name k with a vapid prefix, and there is no Crypto-Key field.

Anyway what I mean is the Authorization field is implementation-specific so it's necessarily generated separately, per-implementation, if that's what you meant. This is the part of the code where it was added to the crypto-headers in your previous version (but only the AESGCM version) The VAPID Signature struct only has a function to construct the JWT, which is just one part of the new Authorization header

So maybe we can remove this function from The VapidSignature impl since it's AESGCM specific and not use into here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I'm not that much into ece encryption anymore, so I trust your research on it... :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do this little clean-up, with the current Into implementation looking like it's not AESGCM-specific when it actually is ;)

src/http_ece.rs Show resolved Hide resolved
@tiesselune tiesselune changed the title WIP : Adding Aes128Gcm support via rust-ece crate delegation Adding Aes128Gcm support via rust-ece crate delegation Feb 9, 2021
@pimeys
Copy link
Owner

pimeys commented Feb 11, 2021

@tiesselune is it now ok to merge? I'm probably not releasing a new version until we get the runtime PR merged too. Then we go one major version up.

@tiesselune
Copy link
Author

I'd like to make one more commit, but since we're going to merge your other PR and those two have a bunch of smallish formatting conflicts, it's maybe better to update it first once your other PR is merged (I did it once to test the infamous stack-overflow hyper/actix problem so the branch is basically ready, but it's on another local branch on my computer).

And moved it in the `generate_headers_aesgcm` function instead
@tiesselune tiesselune marked this pull request as draft February 17, 2021 11:04
@tiesselune
Copy link
Author

Re-converting to draft, just noticed I had some weird test failures I still have to investigate.

This is now tested in `http_ece.rs`
@tiesselune
Copy link
Author

Update: the test failures come from the fact that padding has changed between the previously implemented version and the ece crate version.

Pondering if we should re-introduce padding on both schemes

Note :content-length is not deterministic for AES128GCM
src/error.rs Outdated
@@ -172,7 +172,7 @@ impl Error for WebPushError {

impl fmt::Display for WebPushError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", self.description())
write!(f, "{}", self)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an infinite recursion causing stack overflow. Perhaps the formatting string should be "{:?}" instead?

Copy link
Author

@tiesselune tiesselune Feb 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pimeys we found the culprit of this elusive stack overflow! Thanks @koalefant!

I know what happened there. It was a quick deprecation fix since description() is deprecated on the Error trait, and it suggests to use the Display implementation instead or to_string. I did not really think twice when using the Display trait... inside the Display trait 🙄 . Thanks for pointing that out! 😉 I have just submitted a commit that handles the deprecation warning properly, by actually moving the description() method (that is deprecated, as well as cause()) in the Display implementation as it is suggested in the up-to-date Error official doc. I'll take this for a spin later this week and check that everything works properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants