URL and filename safe base64 encoding #31099

Closed
am11 opened this issue Oct 8, 2019 · 14 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime

Comments

@am11
Member

am11 commented Oct 8, 2019

Summary

URL- and filename-safe alphabets in base64 strings are a common requirement in applications that deal with resources over the network, not only web applications. The existing base64 encoding APIs provide no option to encode with the safe alphabet, forcing consumers to manually replace characters after encoding and before decoding. Extending the existing base64 API set in System.Convert to support the safe alphabet would therefore be useful for completeness.

Rationale and Usage

To achieve Base64Url encoding, a common technique is to call System.Convert.ToBase64String on the input, then truncate (or percent-encode) the trailing = characters, and finally replace + with - and / with _. For example, the JSON Web Signature (JWS) specification, RFC 7515, gives a C# example with similar code. This approach is also suggested in the top answer on SO. However, it is inefficient under high load compared to the spanified implementation used internally in OpenSslX509ChainProcessor: https://github.com/dotnet/corefx/blob/e70e76159b3f34e4e35d241daf39d4f57f4bd82c/src/System.Security.Cryptography.X509Certificates/src/Internal/Cryptography/Pal.Unix/OpenSslX509ChainProcessor.cs#L544

or in WebEncoders from AspNetCore: https://github.com/aspnet/AspNetCore/blob/fd060ce8c36ffe195b9e9a69a1bbd8fb53cc6d7c/src/Shared/WebEncoders/WebEncoders.cs#L347

A unified, efficient implementation that conforms to RFC 4648, Section 5 would spare consumers from looking for one elsewhere. For reference, this API was included in Java 8 (in 2014).
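
The replace-based technique described above can be sketched roughly as follows. This is a minimal illustration, not the code from RFC 7515 or the SO answer; the class name Base64UrlReplaceSketch and its method names are hypothetical. Each call allocates several intermediate strings, which is the overhead the proposal wants to avoid.

```csharp
using System;

// Hypothetical helper illustrating the common "encode, trim, replace" approach.
public static class Base64UrlReplaceSketch
{
    // Standard base64, then drop '=' padding and swap in the URL-safe alphabet.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Restore the standard alphabet and the padding that Encode removed.
    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
            case 1: throw new FormatException("Invalid base64url length.");
        }
        return Convert.FromBase64String(s);
    }
}
```

For instance, Encode applied to the ASCII bytes of TestString yields VGVzdFN0cmluZw, and Decode reverses it.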

Proposed API

namespace System
{
    public static class Convert
    {
        // existing
        public static byte[] FromBase64CharArray(char[] inArray, int offset, int length);
        public static byte[] FromBase64String(string s);
        public static int ToBase64CharArray(Byte[], Int32, Int32, Char[], Int32);
        public static int ToBase64CharArray(Byte[], Int32, Int32, Char[], Int32, Base64FormattingOptions);
        public static string ToBase64String(Byte[], Int32, Int32, Base64FormattingOptions);
        public static string ToBase64String(Byte[], Int32, Int32);
        public static string ToBase64String(Byte[], Base64FormattingOptions);
        public static string ToBase64String(Byte[]);
        public static string ToBase64String(ReadOnlySpan<Byte>, Base64FormattingOptions);
        public static bool TryFromBase64Chars(ReadOnlySpan<char> chars, Span<byte> bytes,
            out int bytesWritten);
        public static bool TryFromBase64String(string s, Span<byte> bytes,
            out int bytesWritten);
        public static bool TryToBase64Chars(ReadOnlySpan<byte> bytes, Span<char> chars,
            out int charsWritten, Base64FormattingOptions options = Base64FormattingOptions.None);

        // proposed
+       public static byte[] FromBase64UrlCharArray(char[] inArray, int offset, int length);
+       public static byte[] FromBase64UrlString(string s);
+       public static int ToBase64UrlCharArray(Byte[], Int32, Int32, Char[], Int32);
+       public static int ToBase64UrlCharArray(Byte[], Int32, Int32, Char[], Int32, Base64FormattingOptions);
+       public static string ToBase64UrlString(Byte[], Int32, Int32, Base64FormattingOptions);
+       public static string ToBase64UrlString(Byte[], Int32, Int32);
+       public static string ToBase64UrlString(Byte[], Base64FormattingOptions);
+       public static string ToBase64UrlString(Byte[]);
+       public static string ToBase64UrlString(ReadOnlySpan<Byte>, Base64FormattingOptions);
+       public static bool TryFromBase64UrlChars(ReadOnlySpan<char> chars, Span<byte> bytes,
+           out int bytesWritten);
+       public static bool TryFromBase64UrlString(string s, Span<byte> bytes,
+           out int bytesWritten);
+       public static bool TryToBase64UrlChars(ReadOnlySpan<byte> bytes, Span<char> chars,
+           out int charsWritten, Base64FormattingOptions options = Base64FormattingOptions.None);

    }
}

Details

  • The *Base64Url* names are used because RFC 4648 explicitly calls the encoding out. A Base64Url-encoded string is not a valid base64 string, because it can contain non-base64 characters (-, _ and %).
    • If that were not the case, UrlAndPathSafeAlphabets = 2 could have been considered in enum Base64FormattingOptions for the existing {To,From}Base64{CharArray,String} APIs.
  • ToBase64Url.. methods conform to RFC 4648. If the input length is known (which is the case here) and there are pad characters (=), they are truncated (as opposed to percent-encoded).
  • FromBase64Url.. methods accept all cases of RFC 4648 plus two non-RFC ones, %2B and %2F, mentioned in the case below:
    • OpenSslX509ChainProcessor.Base64UrlEncode does not conform to the RFC, as it percent-encodes the + (%2B) and / (%2F) characters, whereas the RFC only calls out the pad character = as optionally percent-encoded, and allows it to be dropped if the length is known.
    • ASP.NET Core's WebEncoders.Base64UrlDecode also does not conform to RFC 4648. For example, the input TestString can be encoded as any of the following, but Base64UrlDecode only recognizes the first two and throws FormatException for the third:
      • VGVzdFN0cmluZw
      • VGVzdFN0cmluZw==
      • VGVzdFN0cmluZw%3D%3D
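
As a rough illustration of the decoding behavior described above, the following sketch accepts all three forms of the TestString example. It is an assumption about how such leniency could look, not the proposed implementation: LenientBase64UrlDecodeSketch is a hypothetical name, and the sketch does not handle the %2B and %2F cases.

```csharp
using System;

// Hypothetical decoder that tolerates unpadded, padded, and percent-encoded padding.
public static class LenientBase64UrlDecodeSketch
{
    public static byte[] Decode(string text)
    {
        // Treat percent-encoded padding as '=', then drop all trailing padding.
        string s = text.Replace("%3D", "=").Replace("%3d", "=").TrimEnd('=');

        // Map the URL-safe alphabet back to the standard one.
        s = s.Replace('-', '+').Replace('_', '/');

        // Re-pad to a multiple of four so Convert.FromBase64String accepts it.
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
            case 1: throw new FormatException("Invalid base64url length.");
        }
        return Convert.FromBase64String(s);
    }
}
```

With this sketch, VGVzdFN0cmluZw, VGVzdFN0cmluZw== and VGVzdFN0cmluZw%3D%3D all decode to the bytes of TestString.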

Open Questions

  • ToBase64Url.. methods conform to RFC 4648. If the input length is known (which is the case here) and there are pad characters (=), they are truncated (as opposed to percent-encoded).

Are there cases where consumers explicitly do not want the pad character omitted and expect the API to percent-encode it? For example, going by the usage of OpenSslX509ChainProcessor.Base64UrlEncode, it is not clear whether something would break if this method simply omitted = and replaced + and / with - and _ respectively.

@bartonjs
Member

bartonjs commented Oct 8, 2019

FWIW, the X.509 OCSP isn't "encode base64url", it's "urlencode base64", per https://tools.ietf.org/html/rfc6960#appendix-A.
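
The distinction can be made concrete with a small sketch that uses only existing framework calls (Convert.ToBase64String and Uri.EscapeDataString). The class and method names are hypothetical, and this is not the code used by OpenSslX509ChainProcessor.

```csharp
using System;

// Hypothetical helpers contrasting the two encodings.
public static class EncodingComparisonSketch
{
    // "urlencode base64": keep the standard alphabet, then percent-encode
    // the characters that are reserved in URLs ('+', '/', '=').
    public static string UrlEncodedBase64(byte[] data) =>
        Uri.EscapeDataString(Convert.ToBase64String(data));

    // "base64url" (RFC 4648, Section 5): use '-' and '_' instead of '+' and '/',
    // and drop the padding.
    public static string Base64Url(byte[] data) =>
        Convert.ToBase64String(data).TrimEnd('=').Replace('+', '-').Replace('/', '_');
}
```

For the two bytes 0xFB 0xFF, whose standard base64 form is +/8=, the first method gives %2B%2F8%3D and the second gives -_8.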

@scalablecory
Contributor

@dotnet/ncl

@scalablecory
Contributor

ASP.NET already has this implemented, so the question to me is how much value does this bring to non-ASP.NET apps?

@gfoidl
Member

gfoidl commented Oct 8, 2019

If this should go into .NET, these APIs should be added to System.Buffers.Text.Base64 or to a new System.Buffers.Text.Base64Url.


Side note to

Rationale and Usage ... WebEncoders

Disclaimer: I'm the author of https://github.com/gfoidl/Base64, which allows base64Url encoding / decoding (in the meaning of ASP.NET Core's usage) in a direct way (and not via replacements).
This project started as residue of my work on WebEncoders.

@am11
Member Author

am11 commented Oct 24, 2019

ASP.NET already has this implemented, so the question to me is how much value does this bring to non-ASP.NET apps?

I don't have numbers but here are some supporting points:

  • not all .NET applications that deal with resources over network depend on ASP.NET
  • to have an RFC 4648 compliant implementation in one place, rather than divided between two packages
  • at least the standard libraries of Java, Python, Ruby and Go provide url-and-filename-safe Base64 encoding

If this should go into .NET these apis should be added to System.Buffers.Text.Base64 or to a new System.Buffers.Text.Base64Url.

+1. I did not know about this new Base64 class added in .NET Core 3.0 (and .NET Standard 2.1).

@gfoidl
Member

gfoidl commented Oct 24, 2019

Aside:

new Base64 class added in .NET Core 3.0

It got added in .NET Core 2.1.

@h82258652

maybe base62 is a better solution.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020
@joperezr joperezr added this to the Future milestone Jul 7, 2020
@galakt
Contributor

galakt commented Aug 29, 2022

@maryamariyan @joperezr Any updates? Do you need help or something?

@joperezr
Member

I'm no longer the owner of this area. ping @dotnet/area-system-runtime, any updates here?

@tannergooding
Member

tannergooding commented Aug 29, 2022

I expect this one should actually be owned and driven by the networking team since it impacts Url compatible encodings. CC. @karelz

Convert itself is in System.Runtime, but the domain expertise around this is definitely in networking.

@karelz
Member

karelz commented Aug 29, 2022

@MihaZupan any thoughts?

@MihaZupan
Member

@galakt what is your use case for this encoding? Are you using ASP.NET and the existing WebEncoders.Base64Url helpers are insufficient (if so, how)? Where does your input come from and what are you doing with the output (it's all strings / you're working with buffers of UTF8 bytes etc.)?

From a networking perspective, I can say that this kind of encoding looks useful (even if no protocol that the runtime itself currently implements uses it).
But a case has to be made that usage of this encoding is common in enough performance-critical scenarios to warrant including it alongside Base64, especially given that an implementation with great performance and all the efficient overloads already exists: https://github.com/gfoidl/Base64.

The main rationale given for adding it is performance, but the most performant helpers (Base64.EncodeToUtf8, Base64.DecodeFromUtf8) are not proposed. Are all the use cases dealing with further UTF16 processing in the same process?
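
To illustrate what a UTF-8, buffer-based path could look like, here is a minimal sketch built on the existing System.Buffers.Text.Base64.EncodeToUtf8 API. The alphabet swap and padding trim on top of it are an assumption for illustration, and Utf8Base64UrlSketch.TryEncodeToUtf8 is a hypothetical name, not an API from the proposal or from the gfoidl/Base64 package.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

// Hypothetical helper: base64url as UTF-8 bytes, with no intermediate strings.
public static class Utf8Base64UrlSketch
{
    // The destination must be large enough for the padded base64 form,
    // i.e. Base64.GetMaxEncodedToUtf8Length(data.Length) bytes.
    public static bool TryEncodeToUtf8(ReadOnlySpan<byte> data, Span<byte> destination, out int bytesWritten)
    {
        OperationStatus status = Base64.EncodeToUtf8(data, destination, out _, out bytesWritten);
        if (status != OperationStatus.Done)
        {
            bytesWritten = 0;
            return false;
        }

        // Swap the two alphabet characters in place.
        for (int i = 0; i < bytesWritten; i++)
        {
            if (destination[i] == (byte)'+') destination[i] = (byte)'-';
            else if (destination[i] == (byte)'/') destination[i] = (byte)'_';
        }

        // Drop trailing '=' padding by shrinking the reported length.
        while (bytesWritten > 0 && destination[bytesWritten - 1] == (byte)'=')
        {
            bytesWritten--;
        }
        return true;
    }
}
```

A caller would size the destination with Base64.GetMaxEncodedToUtf8Length and then use only the first bytesWritten bytes of it.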

As Jeremy points out, the proposal as currently written is confusing "base64url" with "url-encoded base64". These are two different encodings and these helpers shouldn't attempt to accommodate both.

ASP.NET Core's WebEncoders.Base64UrlDecode is also not conforming to RFC 4648. For example, the input TestString can be encoded as any of the following, but Base64UrlDecode only recognizes first two and throws FormatException for the third:
VGVzdFN0cmluZw
VGVzdFN0cmluZw==
VGVzdFN0cmluZw%3D%3D

Does any language that added such helpers produce anything but the first (and recommended) form?

@galakt
Contributor

galakt commented Sep 1, 2022

@MihaZupan We are working heavily with web tokens. We do not use base64Url directly, but via methods based on https://github.com/AzureAD/azure-activedirectory-identitymodel-extensions-for-dotnet/blob/dev/src/Microsoft.IdentityModel.Tokens/Base64UrlEncoder.cs
There are several Encode/Decode calls per token.

I am not voting for this particular proposal; I am just curious whether you want to introduce base64Url encoding/decoding for parity with classic base64. It is a tricky question because WebEncoders already implements it.

@MihaZupan
Member

@galakt have you considered using the NuGet package mentioned above if performance is important in your scenario?
Note that the calling code would have to be changed to get most of the benefits (working with buffers instead of just strings).

Unless we see more general interest in these APIs, I don't think we should be implementing them in core libraries, given that high-perf scenarios are already supported by the ecosystem (thanks @gfoidl!).

@am11 am11 closed this as completed Sep 3, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 3, 2022