This is not meant to be a completely secure implementation (i.e. side channel attacks and the likes are not necessarily accounted for), and is written for two main reasons:
- Complete the AES assignment in the Applied Cryptography (DD2520) course at KTH Royal Institute of Technology
- Get better at programming in Rust
AES is what's known as a Block Cipher, where a plaintext message is divided into blocks of text of a predetermined size. After this, each block is encrypted separately with the key and a so-called initialisation vector, whose value may or may not depend on other blocks (different modes of operation handle this differently).
With a 128-bit key as in this assignment, AES uses ten so-called rounds (essentially iterations, although the last round is slightly different). Each round uses a specific round key, which is generated by running the "main key" through a key expansion algorithm. This algorithm works like so:
- Split the 128-bit key into four 32-bit words: w0,0, w0,1, w0,2, and w0,3; these form the first round key, k0.
- Define g(x) to be a function that operates on its four bytes of input as follows:
- Shift the four bytes left by one, i.e.
[1,2,3,4]
becomes[2,3,4,1]
. - Run the
sub_bytes
algorithm (described later) on each of the four bytes. - XOR the leftmost byte with a round constant denoted by ci, where ci is the element xi-1 of the Galois Field GF(28).
- Shift the four bytes left by one, i.e.
- For keys i = 1..10, we now run the following algorithm:
- wi,0 = wi-1,0 ⊕ gi(wi-1,3)
- wi,1 = wi-1,1 ⊕ wi,0
- wi,2 = wi-1,2 ⊕ wi,1
- wi,3 = wi-1,3 ⊕ wi,2
We now have 11 round keys, one before and after each round, and can proceed to the encryption itself.
For each block of 128 bits, we construct a 4x4 (column-major) matrix where each "cell" contains one byte. This means that the first 8 bits go in the top left cell, the second set of 8 bits goes in the cell below that, etc. The matrix ends up looking like this, where the numbers in each cell represent which bits go where (one "word" is 32 bits):
word 0 | word 1 | word 2 | word 3 |
---|---|---|---|
0-7 | 32-39 | 64-71 | 96-103 |
8-15 | 40-47 | 72-79 | 104-111 |
16-23 | 48-55 | 80-87 | 112-119 |
23-31 | 56-63 | 88-95 | 120-127 |
This step is very straightforward. We use a hard-coded lookup table with 256 (28), entries, go through each cell of the matrix, and substitute the byte in each cell with the corresponding value in the lookup table (e.g. lookup_table[byte]
). This lookup table can be calculated at the start and essentially be a vector of 256 bytes, with the index as key and sbox[idx]
as the value.
The substitution box is calculated like so:
- For each byte, find the multiplicative inverse of it in GF(28). This can be done using brute force, as it's not too costly (255 values to check, max). The zero byte maps to itself, as it lacks an inverse.
- Multiply the resulting byte (i.e. the inverse) by a matrix and add a vector, both which can be found here (and an implementation in the
CalculateF
andCalculateG
functions here).
It can also be found on page 16 of this report.
Here, we perform a so-called cyclic shift on the four rows of the matrix. The first row is left unchanged, the second is shifted once to the left, the third two bytes to the left and the fourth three bytes to the left (or one to the right).
In this step, we multiply our matrix with another, predefined matrix (with the multiplication taking place in GF(28)). Essentially:
ff = finitefield
for each column c:
newcol_c[0] = ff.multiply(matrix[0,0], c[0]) ⊕ ff.multiply(matrix[0,1], c[1]) ⊕ [...]
newcol_c[1] = ff.multiply(matrix[1,0], c[0]) ⊕ ff.multiply(matrix[1,1], c[1]) ⊕ [...]
newcol_c[2] = ff.multiply(matrix[2,0], c[0]) ⊕ ff.multiply(matrix[2,1], c[1]) ⊕ [...]
newcol_c[3] = ff.multiply(matrix[3,0], c[0]) ⊕ ff.multiply(matrix[3,1], c[1]) ⊕ [...]
The multiplication step can be a bit tricky, but it's possible to use this algorithm to perform it (which avoids having to manually deal with division by polynomials).
For the final step in each round, we XOR this current matrix (represented now as 128 bits) with the current round key, i.e. after the first application of sub_bytes
, shift_rows
and mix_columns
, we XOR the output with round key 1 (note that round key 0 is used for XORing with the initial input before passing it into the round permutation).
We apply this permutation in nine rounds, but skip the mix_columns
step for round 10 since it doesn't add any extra secrecy (the diffusion it provides won't propagate to another round anyway). We then, after round 10, XOR the output with round key 10. This is our output.
- A Graduate Course in Applied Cryptography by Dan Boneh and Victor Shoup had some excellent information on the various steps of AES, as well as some optimisations...
- This GitHub repository was fantastic in helping me grasp the Rust language better in terms of byte manipulation and data representation.
- This Wikipedia page was really helpful in providing an efficient method for multiplication in GF(28) without having to "brute force" inverses.
- This report was great for examples and visualising the algorithm.
- A Stick Figure Guide to the Advanced Encryption Standard (AES) and the corresponding GitHub repository did a wonderful job of breaking down the algorithm into understandable steps.
- This report by the authors of AES (Joan Daemen and Vincent Rijmen) explains the algorithm in detail and helped clarify some of the steps. It is quite technical, however, so it wasn't always my first choice.