-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store keyring in Raft #23665
Comments
For #23665 I'm about to make add a lot more code to the state store for the keyring, so I'd like to pull these out to their own file. Also updates the test to use `shoenig/test` and changes the name of one method to be a little more accurate.
For #23665 I'm about to make add a lot more code to the state store for the keyring, so I'd like to pull these out to their own file. Also updates the test to use `shoenig/test` and changes the name of one method to be a little more accurate.
I don't think this is required. The benefits are just too hard to realize. Security sensitive users must use a KMS. Just offering this option adds a non-trivial amount of cognitive overhead to operators and security teams to find the right solution for their use case, and that effort is better put toward using a KMS. |
While working on #23655 I found there were a few places in the encrypter/keyring where we could make modest improvements to performance and reliability of the existing code. This changeset allows keyring replication to skip trying to replicate from itself, switches some of the read-only keyring accesses to use the read lock instead of a r/w lock, fixes the logging configuration to drop spurious "extra value" warnings in the logs, drops an unused type, and makes a minor refactoring to eliminate shadowing of the `keyset` type. Pulling this out to its own PR lets us backport these changes to the LTS and reduces the size of the PR that implements #23665. Ref #23665
While working on #23655 I found there were a few places in the encrypter/keyring where we could make modest improvements to performance and reliability of the existing code. This changeset allows keyring replication to skip trying to replicate from itself, switches some of the read-only keyring accesses to use the read lock instead of a r/w lock, fixes the logging configuration to drop spurious "extra value" warnings in the logs, drops an unused type, and makes a minor refactoring to eliminate shadowing of the `keyset` type. Pulling this out to its own PR lets us backport these changes to the LTS and reduces the size of the PR that implements #23665. Ref #23665
In Nomad 1.4, we implemented a root keyring to support encrypting Variables and signing Workload Identities. The keyring was originally stored with the AEAD-wrapped DEKs and the KEK together in a JSON keystore file on disk. We recently added support for using an external KMS for the KEK to improve the security model for the keyring. But we've encountered multiple instances of the keystore files not getting backed up separately from the Raft snapshot, resulting in failure to restore clusters from backup. Move Nomad's root keyring into Raft (encrypted with a KMS/Vault where available) in order to eliminate operational problems with the separate on-disk keystore. Fixes: #23665 Ref: https://hashicorp.atlassian.net/browse/NET-10523
#23977 has merged. I'll have a separate PR for docs and CLI tools |
Background
With the introduction of Variables in Nomad 1.4, Nomad has had an internal keyring consisting of:
The KEK is stored on disk distinctly from Raft logs and snapshots so that secret material may not be extracted from leaked snapshots.
As of #23580 Nomad will support using an external KMS for the KEK. However the wrapped (encrypted) DEK and WI keys will still be in the on disk keyring outside of snapshots.
The on disk keyring, even when using a KMS, poses an unfortunate challenge for Nomad operators: they must backup the keyring like snapshots, but distinctly. If a single script backs up both snapshots and keyring to the same location, the only remaining benefit to keeping the two distinct is in the case of sharing snapshots with a third party (eg HashiCorp) for debugging.
Nomad engineering has encountered multiple outages caused by users unaware that they need to backup and restore the keyring distinct but alongside the snapshot.
Proposal
The downsides of storing the keyring separate from snapshots outweighs the potential security benefits. The security benefits are likely rarely realized due to the difficulty of splitting backups across multiple processes and locations. This user experience should be fixed by doing the following:
In order to support storing key material separate from snapshots, the following features should be added:
operator snapshot save
CLI should gain a parameter that allows generating snapshots without the KEK included (for users without a KMS). Users with a KMS configured will never receive their KEK in a snapshot.operator snapshot redact
command should remove the KEK from a snapshot if one is present. This command could optionally store the extracted key to a file for backing up distinctly from the snapshot.operator snapshot inspect
should warn users if a snapshot contains a KEK.Attempted Solutions
The Nomad team has attempted to document proper keyring handling, but has encountered multiple instances of operators being unaware of how to properly handle keyrings during backups and restores.
The text was updated successfully, but these errors were encountered: