diff --git a/README.md b/README.md
index 0aff5f5ea..1f9c2dc95 100644
--- a/README.md
+++ b/README.md
@@ -75,7 +75,7 @@ enforcement.
 * [JSON Processor](#json)
 * [Custom JSON Processor](#json-custom)
 * [Jackson ObjectMapper](#json-jackson)
-* [Base64 Codec](#base64)
+* [Base64 Support](#base64)
 * [Custom Base64 Codec](#base64-custom)
@@ -1327,7 +1327,109 @@ utility classes.
 `io.jsonwebtoken.io.Decoders`:
 
 * `BASE64` is an RFC 4648 [Base64](https://tools.ietf.org/html/rfc4648#section-4) decoder
-* `BASE64URL` is an RFC 4648 [Base64URL](https://tools.ietf.org/html/rfc4648#section-5) decoder
+* `BASE64URL` is an RFC 4648 [Base64URL](https://tools.ietf.org/html/rfc4648#section-5) decoder
+
+
+### Understanding Base64 in Security Contexts
+
+All cryptographic operations, like encryption and message digest calculations, result in binary data - raw byte arrays.
+
+Because raw byte arrays cannot be represented natively in JSON, the JWT
+specifications employ the Base64URL encoding scheme to represent these raw byte values in JSON documents or compound
+structures like a JWT.
+
+The Base64 and Base64URL algorithms take a raw byte array and convert the bytes into a string suitable
+for use in text documents and protocols like HTTP. These algorithms can also convert these strings back
+into the original raw byte arrays for decryption or signature verification as necessary.
+
+That's nice and convenient - and I hear you saying, "I already know that" - but there are two very important properties of
+Base64 (and Base64URL) text strings that are critical to remember when they are used in security scenarios like JWTs:
+
+1. Base64 is not encryption.
+2. Changing Base64 characters does not automatically invalidate data.
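To make the round trip concrete, here is a minimal sketch that uses the JDK's `java.util.Base64` class rather than JJWT's `Encoders`/`Decoders` (an assumption made only to keep the example dependency-free). The class name `Base64RoundTrip` and the sample bytes are hypothetical. It shows the two alphabets (Base64URL uses `-` and `_` where Base64 uses `+` and `/`) and that decoding needs no key or secret at all:

```java
import java.util.Arrays;
import java.util.Base64;

// Minimal sketch using the JDK's java.util.Base64, not JJWT's own Encoders/Decoders.
final class Base64RoundTrip {

    // Raw bytes -> unpadded Base64URL text, the form used inside JWTs.
    static String toBase64Url(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    // Raw bytes -> standard (RFC 4648 section 4) Base64 text, with '=' padding.
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Base64URL text -> the original raw bytes. No key or secret is involved.
    static byte[] fromBase64Url(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for sensitive bytes such as a secret key.
        byte[] raw = {(byte) 0xfb, (byte) 0xff, 0x00, 0x10, 0x7f};

        String url = toBase64Url(raw);
        System.out.println(url);                                    // -_8AEH8
        System.out.println(toBase64(raw));                          // +/8AEH8=
        System.out.println(Arrays.equals(raw, fromBase64Url(url))); // true
    }
}
```

Anyone holding the text `-_8AEH8` can recover the original five bytes, which is why a Base64-encoded key must be guarded exactly like the key itself.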
+
+
+#### Base64 is not encryption
+
+**Base64-encoded text is _not_ encrypted.**
+
+Although the Base64 algorithms can convert a byte array into text,
+anyone in the world can take Base64-encoded text, decode it with any standard Base64 decoder, and obtain the
+underlying raw byte array data. No key or secret is required to decode Base64 text - anyone can do it.
+
+Therefore, when encoding sensitive byte data with Base64 - like a shared or private key - **the resulting
+string is NOT safe to expose publicly**.
+
+A Base64-encoded key is still sensitive information and must
+be kept as secret and as safe as the original object you obtained the bytes from (e.g. a Java `PrivateKey` or `SecretKey`
+instance).
+
+After Base64-encoding data into a string, you can then encrypt that string to keep it safe from prying
+eyes if desired, but that is a separate step. Encoding is not encryption; they are separate concepts.
+
+
+#### Changing Base64 Characters
+
+In an effort to see if signatures or encryption are truly validated correctly, some people try to edit a JWT
+string - particularly the Base64-encoded signature part - to see if the edited string fails security validations.
+
+This makes sense conceptually: if you change the signature string, you would expect signature validation to fail.
+
+_But this doesn't always work. Changing Base64 characters is an invalid test_.
+
+Why?
+
+Because of the way the Base64 algorithm works, there are multiple Base64 strings that can represent the same raw byte
+array.
+
+Going into the details of the Base64 algorithm is out of scope for this documentation, but there are many good
+Stack Overflow [answers](https://stackoverflow.com/questions/33663113/multiple-strings-base64-decoded-to-same-byte-array?noredirect=1&lq=1)
+that explain this in detail.
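The ambiguity comes from the final character of an unpadded string, which can carry a few bits that encode no data. The sketch below is a hypothetical helper (`EquivalentEncodings` is not part of JJWT) built on the JDK's `java.util.Base64`: it computes how many trailing bits are unused for a given byte length and lists every string that differs from the canonical encoding only in those bits. For the 11-byte input `test strinj` it produces four strings that all decode to the same bytes. This relies on the JDK decoder ignoring the unused bits; RFC 4648 permits decoders to reject strings whose unused bits are non-zero, so stricter decoders may behave differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper for illustration only; uses the standard Base64 alphabet.
final class EquivalentEncodings {

    // RFC 4648 Base64 alphabet; a character's index is its 6-bit value.
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // How many low-order bits of the final character carry no data when
    // byteLength bytes are encoded without padding.
    static int unusedTrailingBits(int byteLength) {
        int dataBitsInLastChar = (byteLength * 8) % 6;
        return dataBitsInLastChar == 0 ? 0 : 6 - dataBitsInLastChar;
    }

    // All unpadded strings that differ from the canonical encoding only in those unused bits.
    static List<String> variants(byte[] raw) {
        String canonical = Base64.getEncoder().withoutPadding().encodeToString(raw);
        List<String> result = new ArrayList<>();
        int unused = unusedTrailingBits(raw.length);
        if (unused == 0) { // last character fully used (also covers an empty array)
            result.add(canonical);
            return result;
        }
        String prefix = canonical.substring(0, canonical.length() - 1);
        // The encoder always writes zeros into the unused bits of the last character.
        int last = ALPHABET.indexOf(canonical.charAt(canonical.length() - 1));
        for (int low = 0; low < (1 << unused); low++) {
            result.add(prefix + ALPHABET.charAt(last | low));
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] raw = "test strinj".getBytes(StandardCharsets.US_ASCII);
        for (String s : variants(raw)) {
            boolean same = Arrays.equals(raw, Base64.getDecoder().decode(s));
            System.out.println(s + " -> same bytes: " + same);
        }
    }
}
```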
+Here's one [good answer](https://stackoverflow.com/questions/29941270/why-do-base64-decode-produce-same-byte-array-for-different-strings):
+
+> Remember that Base64 encodes each 8 bit entity into 6 bit chars. The resulting string then needs exactly
+> 11 * 8 / 6 bytes, or 14 2/3 chars. But you can't write partial characters. Only the first 4 bits (or 2/3 of the
+> last char) are significant. The last two bits are not decoded. Thus all of:
+>
+>     dGVzdCBzdHJpbmo
+>     dGVzdCBzdHJpbmp
+>     dGVzdCBzdHJpbmq
+>     dGVzdCBzdHJpbmr
+>
+> All decode to the same 11 bytes (116, 101, 115, 116, 32, 115, 116, 114, 105, 110, 106).
+
+As the four examples above show, they all decode to exactly the same 11 bytes. So changing only one or two
+characters at the end of a Base64 string may not change the decoded bytes at all.
+
+If you're curious about more of the technical details,
+[Ivan Vyshnevskyi](https://github.com/sainaen), a contributing member of the JJWT community, also kindly explains this
+in a nice answer in [JJWT Issue #211](https://github.com/jwtk/jjwt/issues/211#issuecomment-283076269).
+
+##### Adding Invalid Characters
+
+JJWT's default Base64/Base64URL decoders automatically ignore illegal Base64 characters located at the beginning and
+end of an encoded string. Therefore, prepending or appending invalid characters like `{` or `]` will not cause
+JJWT's signature checks to fail either. Why?
+
+Because such edits - whether changing a trailing character or two, or appending invalid characters - do not actually
+change the _real_ signature, which, in cryptographic contexts, is always a byte array. Instead, tests like these
+change only a text encoding of the byte array, and as covered above, the encoding and the bytes are different things.
+
+So JJWT 'cares' more about the real byte array and less about its text encoding, because that is what actually matters
+in cryptographic operations.
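The byte-level view can be sketched with plain JDK classes (`javax.crypto.Mac` and `java.util.Base64`, not JJWT's API). The hypothetical `SignatureBytes` class below computes an HMAC-SHA256 value over illustrative input, encodes it as unpadded Base64URL, and verifies by decoding back to bytes and comparing those bytes. A 32-byte HMAC-SHA256 result encodes to 43 characters whose last character carries 4 data bits and 2 unused bits, so flipping an unused bit leaves verification passing, while flipping a data bit makes it fail:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: plain JDK classes, not JJWT's API. Names and inputs are hypothetical.
final class SignatureBytes {

    // RFC 4648 Base64URL alphabet; a character's index is its 6-bit value.
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // Raw HMAC-SHA256 bytes (always 32 bytes).
    static byte[] hmacSha256(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Text form of the signature: unpadded Base64URL (43 characters for 32 bytes).
    static String sign(byte[] key, byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(hmacSha256(key, data));
    }

    // Verification decodes the text and compares the BYTES, never the text itself.
    static boolean verify(byte[] key, byte[] data, String signatureText) {
        byte[] actual = Base64.getUrlDecoder().decode(signatureText);
        return MessageDigest.isEqual(hmacSha256(key, data), actual);
    }

    // Flip the lowest bit of the final character. For a 32-byte value this is
    // one of the 2 unused bits, so the decoded bytes stay the same.
    static String flipUnusedBit(String text) {
        return flipLastCharBit(text, 0b000001);
    }

    // Flip the lowest of the final character's 4 data bits; this changes the last byte.
    static String flipDataBit(String text) {
        return flipLastCharBit(text, 0b000100);
    }

    private static String flipLastCharBit(String text, int mask) {
        int last = ALPHABET.indexOf(text.charAt(text.length() - 1));
        return text.substring(0, text.length() - 1) + ALPHABET.charAt(last ^ mask);
    }

    public static void main(String[] args) {
        byte[] key = "hypothetical-demo-key-not-secret".getBytes(StandardCharsets.UTF_8);
        byte[] data = "header.payload".getBytes(StandardCharsets.UTF_8);
        String sig = sign(key, data);

        System.out.println(verify(key, data, sig));                // true
        System.out.println(verify(key, data, flipUnusedBit(sig))); // true: text differs, bytes do not
        System.out.println(verify(key, data, flipDataBit(sig)));   // false: a signature byte changed
    }
}
```

Tampering is not going unnoticed here: any edit that changes the decoded signature bytes is rejected, and only edits that leave those bytes identical pass.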
+In this sense, JJWT follows the [Robustness Principle](https://en.wikipedia.org/wiki/Robustness_principle)
+by being _slightly_ lenient about what it accepts under the rules of Base64, but if anything in the real underlying
+byte array is changed, then yes, JJWT's cryptographic assertions will definitely fail.
+
+To understand JJWT's approach, it helps to remember why signatures exist. From our documentation above on
+[signing JWTs](#jws):
+
+> * guarantees it was created by someone we know (it is authentic), as well as
+> * guarantees that no-one has manipulated or changed it after it was created (its integrity is maintained).
+
+Prepending or appending invalid text to try to 'trick' the algorithm changes neither the integrity of the
+underlying claims or signature byte arrays, nor the authenticity of the claims byte array, because those byte
+arrays are still obtained intact.
+
+Please see [JJWT Issue #518](https://github.com/jwtk/jjwt/issues/518) and its referenced issues and links for more
+information.
 
 ### Custom Base64