Use Base64Url in WebEncoders #56959

Merged: 3 commits into dotnet:main on Jul 26, 2024

Conversation

stephentoub
Member

No description provided.

@dotnet-issue-labeler bot added the needs-area-label label on Jul 24, 2024
@stephentoub marked this pull request as ready for review on July 24, 2024 04:35
@stephentoub
Member Author

cc: @buyaa-n

@buyaa-n left a comment

LGTM, thanks!

@gfoidl added the area-networking label and removed the needs-area-label label on Jul 24, 2024
@gfoidl
Member

gfoidl commented Jul 24, 2024

Should this be taken a step further as outlined in #56426?
(or at least in a follow up PR)

@stephentoub
Member Author

Should this be taken a step further as outlined in #56426? (or at least in a follow up PR)

If that's going to be done, it should be done separately from improving the implementation, which will still exist even if the type were obsoleted. And at this point I suggest it's too late to obsolete it for .NET 9 even if we wanted to.

@gfoidl
Member

gfoidl commented Jul 24, 2024

too late to obsolete it for .NET 9

Ah, we are already entering the finishing straight for the upcoming release.
So of course, that should be considered after this PR and separately from it.

@stephentoub
Member Author

stephentoub commented Jul 24, 2024

I've debugged through why some of the tests are failing. The crux of it is that, despite its name, WebEncoders.Base64UrlDecode actually accepts both Base64 and Base64Url. This seems like an accident, as I don't see any mention of it in comments or documentation. Effectively it works by looking through the input for -_ and replacing them with +/ and then delegating to Convert.FromBase64CharArray, but it doesn't validate that the input didn't already contain +/. That means if the data was Base64-encoded to begin with, it will just be passed through to the underlying FromBase64CharArray. And in these tests, there's code producing Base64 rather than Base64Url, e.g.

return (serverComponent.Sequence, Convert.ToBase64String(protectedBytes));
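
For illustration, the decode behavior described above amounts to roughly this simplified sketch (not the actual WebEncoders source, just the shape of it):

// Simplified sketch of the legacy decode path: '-' and '_' are mapped to
// '+' and '/', padding is appended, and the buffer is handed to
// Convert.FromBase64CharArray. Nothing rejects input that already
// contained '+' or '/', so plain Base64 slips through unchanged.
static byte[] LegacyBase64UrlDecode(string input)
{
    int padding = (4 - input.Length % 4) % 4;
    char[] buffer = new char[input.Length + padding];

    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        buffer[i] = c == '-' ? '+' : c == '_' ? '/' : c; // '+' and '/' pass through untouched
    }

    for (int i = input.Length; i < buffer.Length; i++)
    {
        buffer[i] = '=';
    }

    return Convert.FromBase64CharArray(buffer, 0, buffer.Length);
}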

I can see three paths forward then:

  1. Close this PR and leave the WebEncoders implementation as it is.
  2. Update WebEncoders.Base64UrlDecode to search for + and /, and if either exists, use Base64 decoding rather than Base64Url decoding.
  3. Accept a breaking change in what these APIs accept, making them strictly about Base64Url, and fix the tests and possibly other components like those mentioned above to output Base64Url instead of Base64.

I've done (2). @javiercn?
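
For reference, approach (2) amounts to roughly this shape on the .NET 9 target (a simplified sketch assuming the new System.Buffers.Text.Base64Url type; the helper name is illustrative):

// Sketch of approach (2): if the input contains Base64-only characters,
// fall back to classic Base64 decoding; otherwise decode strictly as Base64Url.
static byte[] DecodeWithBase64Fallback(string input)
{
    if (input.AsSpan().IndexOfAny('+', '/') >= 0)
    {
        // Looks like plain Base64; decode it as such, as the old code effectively did.
        return Convert.FromBase64String(input);
    }

    return System.Buffers.Text.Base64Url.DecodeFromChars(input);
}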

@stephentoub added this to the 9.0.0 milestone Jul 24, 2024
@javiercn
Member

@stephentoub we should be fine changing the implementation on the server bits to always decode as Base64.

As far as I remember, we always produce Base64 payloads. The reason is that they get embedded in comments, and the special characters for Base64Url include -, which in rare cases produced -- sequences in the comment that broke the generated HTML.

I believe that when we ran into that we switched to Base64, but I can't find the concrete change. I think we should be fine switching to pure Base64 for these serialization/deserialization scenarios.

It's our own format that we produce internally, and it doesn't have to survive across framework versions.

@stephentoub
Member Author

we should be fine changing the implementation on the server bits to use Base64 to decode always.

I'm confused... you want to change the public WebEncoders.Base64UrlDecode methods to only decode Base64?

@javiercn
Member

@stephentoub let me get back to you later. I was suggesting changing the method Blazor uses.

@javiercn
Member

javiercn commented Jul 25, 2024

@stephentoub looking at this in more depth.

Let's see if I understood correctly.

  • There is now a Base64Url type on the framework (yay!)
  • We are replacing the implementation in WebEncoders (yay!).
  • Our previous implementation worked by transforming Base64Url into Base64 "in place" then decoding the result.
  • Accidentally, it supported Base64 inputs since that's what it transformed to.
  • We are worried about people passing Base64 input to this method (either test or production code).
    • For production code, I think that would be a mistake; depending on the case (if it's input that we control), we should just fix those instances to use Base64Url instead of Base64.
    • If it's test code, we should update the test (if it was a mistake on the test).
    • If there's a location where we are using this with input coming from customers' data, then that's something to maybe account for (if it makes sense and is possible, at the call site).
    • For any scenario where this gets used as a library, I think we should include an AppCompat switch to re-enable the behavior, but in routines like this, it's worth it if by default we can get the max perf out of it.

Also, didn't realize that this functionality was added for netstandard2.0 too.

This makes me more inclined to take this change (with a compat switch) and signal the deprecation of the entire package (#56426) in 10 (or even in 9, but we would have to chat with some folks about it).

The reason I think I would be ok with the deprecation is that the changes that are needed here are easy (and likely few).

@stephentoub
Member Author

There is now a Base64Url type on the framework (yay!)
We are replacing the implementation in WebEncoders (yay!).
Our previous implementation worked by transforming Base64Url into Base64 "in place" then decoding the result.
Accidentally, it supported Base64 inputs since that's what it transformed to.

Yup. (At least I assume it was an accident; it looks like an unintentional side-effect.)

For production code, I think that would be a mistake; depending on the case (if it's input that we control), we should just fix those instances to use Base64Url instead of Base64.

Ok, I'm aware of the one place I linked to:

return (serverComponent.Sequence, Convert.ToBase64String(protectedBytes));

where that Base64 eventually finds its way here. I'm not sure what else would break if that were changed to instead do Base64Url encoding, but we can try.

For any scenario where this gets used as a library, I think we should include an AppCompat switch to re-enable the behavior, but in routines like this, it's worth it if by default we can get the max perf out of it.

Ok, so the changes you'd want to see to what's currently in this PR:

  • Add a compat switch for opting back in to Base64UrlDecode supporting Base64.
  • If that switch isn't set, don't use the Base64 fallback path, and for pre-.NET 9, add validation that the input isn't Base64.
  • Change any product and test code we can find that's generating data that ends up here from Base64 to Base64Url.

Is that right?
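
For reference, a compat switch along those lines could look roughly like the following (the switch and type names here are hypothetical, purely to illustrate the shape):

using System;

internal static class Base64UrlDecodeSketch
{
    // Hypothetical switch name, shown only to illustrate the opt-in shape;
    // the actual PR / breaking-change announcement would define the real one.
    private static readonly bool s_allowBase64Fallback =
        AppContext.TryGetSwitch(
            "Microsoft.AspNetCore.WebUtilities.AllowBase64InBase64UrlDecode",
            out bool enabled) && enabled;

    public static byte[] Decode(string input)
    {
        if (s_allowBase64Fallback && input.AsSpan().IndexOfAny('+', '/') >= 0)
        {
            // Legacy-compatible path: treat the input as plain Base64.
            return Convert.FromBase64String(input);
        }

        // Strict path: .NET 9's Base64Url; pre-.NET 9 targets would instead
        // validate that the input isn't Base64 before transforming it.
        return System.Buffers.Text.Base64Url.DecodeFromChars(input);
    }
}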

@javiercn
Member

javiercn commented Jul 25, 2024

where that Base64 eventually finds its way here. I'm not sure what else would break if that were changed to instead do Base64Url encoding, but we can try.

This one is fine as Base64; I think that's the place where I'm confused. We should keep the code as is, and if we are using Base64UrlDecode there, just update it to the equivalent Base64 method.

Ok, so the changes you'd want to see to what's currently in this PR:

  • Add a compat switch for opting back in to Base64UrlDecode supporting Base64.

Yep, I'll make sure we also do a breaking change announcement.

  • If that switch isn't set, don't use the Base64 fallback path, and for pre-.NET 9, add validation that the input isn't Base64.

Yep, I think it's fine to keep the check for net standard for max backwards compat.

  • Change any product and test code we can find that's generating data that ends up here from Base64 to Base64Url.

We need to do a quick check to make sure we didn't intend Base64 somewhere and rely on this behavior by mistake. In the components case, the data should be Base64, and I'm pretty sure it is decoded as Base64 and not Base64Url, but I'm going to double-check.

@stephentoub
Member Author

This one is fine as Base64; I think that's the place where I'm confused. We should keep the code as is, and if we are using Base64UrlDecode there, just update it to the equivalent Base64 method.

Base64UrlDecode is used in DataProtectionCommonExtensions.Unprotect:

public static string Unprotect(this IDataProtector protector, string protectedData)
{
    ArgumentNullThrowHelper.ThrowIfNull(protector);
    ArgumentNullThrowHelper.ThrowIfNull(protectedData);

    try
    {
        byte[] protectedDataAsBytes = WebEncoders.Base64UrlDecode(protectedData);
        byte[] plaintextAsBytes = protector.Unprotect(protectedDataAsBytes);
        return EncodingUtil.SecureUtf8Encoding.GetString(plaintextAsBytes);
    }
    catch (Exception ex) when (ex.RequiresHomogenization())
    {
        // Homogenize exceptions to CryptographicException
        throw Error.CryptCommon_GenericError(ex);
    }
}

which is in turn used in tests like these:
https://github.com/dotnet/aspnetcore/blob/829583b4d4112d9f8586e8abe4069ec52568767c/src/Components/Endpoints/test/EndpointHtmlRendererTest.cs
that get fed input very indirectly from that ServerComponentSerializer. What is it you'd recommend doing here?

@javiercn
Member

@stephentoub Protect also uses Base64UrlEncode, so we should be fine.

return WebEncoders.Base64UrlEncode(protectedDataAsBytes);

The only case I can think of where this could potentially cause problems is if someone used another helper to manually convert the bytes, did a Convert.ToBase64String on top, and then passed that output to Unprotect later on, which seems far-fetched and is definitely something we can handle inside Unprotect if we need to. (This is so unlikely that I think I'm ok with us just not accounting for this possibility; worst case, there's a compat switch that can be flipped.)

@stephentoub
Member Author

stephentoub commented Jul 25, 2024

Protect also uses Base64UrlEncode, so we should be fine.

This code:

var serializedServerComponentBytes = JsonSerializer.SerializeToUtf8Bytes(serverComponent, ServerComponentSerializationSettings.JsonSerializationOptions);
var protectedBytes = _dataProtector.Protect(serializedServerComponentBytes, ServerComponentSerializationSettings.DataExpiration);
return (serverComponent.Sequence, Convert.ToBase64String(protectedBytes));

is not using a Protect that uses Base64UrlEncode. It's calling a different Protect and then passing the output of that to ToBase64String. That Base64 string is finding its way to this Unprotect call, which is doing Base64UrlDecode.

@javiercn
Member

Protect also uses Base64UrlEncode, so we should be fine.

This code:

var serializedServerComponentBytes = JsonSerializer.SerializeToUtf8Bytes(serverComponent, ServerComponentSerializationSettings.JsonSerializationOptions);
var protectedBytes = _dataProtector.Protect(serializedServerComponentBytes, ServerComponentSerializationSettings.DataExpiration);
return (serverComponent.Sequence, Convert.ToBase64String(protectedBytes));

is not using a Protect that uses Base64UrlEncode. It's calling a different Protect and then passing the output of that to ToBase64String. That Base64 string is finding its way to this Unprotect call, which is doing Base64UrlDecode.

Ah, I see.

This was always meant to do Base64 decoding, so I think it's safe for us to change it to do so. I imagine what we need to do is call Convert.FromBase64String and then call the Unprotect overload that receives bytes.
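
For illustration, that would look roughly like the following (the method and variable names here are just placeholders):

// Sketch of the suggested change: decode the marker as plain Base64 and call
// the byte-based IDataProtector.Unprotect overload, instead of going through
// the string-based extension that uses WebEncoders.Base64UrlDecode.
static byte[] UnprotectServerComponent(IDataProtector protector, string base64Marker)
{
    byte[] protectedBytes = Convert.FromBase64String(base64Marker);
    return protector.Unprotect(protectedBytes);
}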

This was likely accidental on our part and just happened to work precisely because of what you mentioned. I suspect this happened because we tried to optimize the encode path and the decode path just kept working.

It gives me a bit of pause as I think someone else might have done something similar. That said, I still think we should go with the app compat switch and get feedback in RC. Worst case, we can flip the switch to be on by default. What do you think?

@stephentoub
Member Author

Thanks. How would you feel about this perf optimization merging and then separately following up with the larger behavioral change? I think it's likely we'll find it needs to be reverted, and I'd like to not have the use of the new Base64Url methods caught up in that.

@javiercn
Member

@stephentoub sounds good.

@stephentoub merged commit c5eb354 into dotnet:main on Jul 26, 2024
26 checks passed
@stephentoub deleted the usebase64url branch on July 26, 2024 16:27
@dotnet-policy-service bot modified the milestones: 9.0.0, 9.0-rc1 on Jul 26, 2024
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions