From d6817d8f8adad6baa6caa1803c50102ee071d7b6 Mon Sep 17 00:00:00 2001
From: Markus Rudy
Date: Wed, 10 Jul 2024 12:34:18 +0200
Subject: [PATCH] docs: add recovery

Co-authored-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
---
 docs/docs/architecture/secrets.md | 24 ++++++++++++++++++++++++
 docs/docs/deployment.md           | 20 ++++++++++++++++++++
 docs/docs/features-limitations.md |  6 ++++++
 docs/sidebars.js                  |  5 +++++
 4 files changed, 55 insertions(+)
 create mode 100644 docs/docs/architecture/secrets.md

diff --git a/docs/docs/architecture/secrets.md b/docs/docs/architecture/secrets.md
new file mode 100644
index 0000000000..303f792376
--- /dev/null
+++ b/docs/docs/architecture/secrets.md
@@ -0,0 +1,24 @@
+# Secrets & recovery
+
+When the Coordinator is configured with the initial manifest, it generates a random secret seed.
+From this seed, it uses an HKDF to derive the CA root key and a signing key for the manifest history.
+This derivation is deterministic, so the seed can be used to bring any Coordinator to this Coordinator's state.
+
+The secret seed is returned to the user on the first call to `contrast set`, encrypted with the user's public seed share owner key.
+If no seed share owner key is provided, a key is generated and stored in the working directory.
+
+## Persistence
+
+The Coordinator runs as a `StatefulSet` with a dynamically provisioned persistent volume.
+This volume stores the manifest history and the associated runtime policies.
+The manifest isn't considered sensitive information, because it needs to be passed to the untrusted infrastructure in order to start workloads.
+However, the Coordinator must ensure its integrity and that the persisted data corresponds to the manifests set by authorized users.
+Thus, the manifest is stored in plain text, but is signed with a private key derived from the Coordinator's secret seed.
+
+## Recovery
+
+When a Coordinator starts up, it doesn't have access to the signing secret and therefore can't verify the integrity of the persisted manifests.
+It needs to be provided with the secret seed, from which it can derive the signing key that verifies the signatures.
+This procedure is called recovery and is initiated by the workload owner.
+The CLI decrypts the secret seed using the private seed share owner key, verifies the Coordinator and sends the seed through the `Recover` method.
+The Coordinator recovers its key material and verifies the manifest history signature.
diff --git a/docs/docs/deployment.md b/docs/docs/deployment.md
index d770e8bdb2..4cd8187fd5 100644
--- a/docs/docs/deployment.md
+++ b/docs/docs/deployment.md
@@ -304,3 +304,23 @@ Using `openssl`, the certificate of the service can be validated with the `mesh-
 ```sh
 openssl s_client -CAfile verify/mesh-ca.pem -verify_return_error -connect ${frontendIP}:443 < /dev/null
 ```
+
+## Recover the Coordinator
+
+If the Contrast Coordinator restarts, it enters recovery mode and waits for an operator to provide key material.
+For demonstration purposes, you can simulate this scenario by deleting the Coordinator pod.
+
+```sh
+kubectl delete pod -l app.kubernetes.io/name=coordinator
+```
+
+Kubernetes schedules a new pod, but that pod doesn't have access to the key material the previous pod held in memory and can't issue certificates for workloads yet.
+You can confirm this by running `verify` again, or you can restart a workload pod, which should stay in the initialization phase.
+However, the secret seed in your working directory is sufficient to recover the Coordinator.
+
+```sh
+contrast recover -c "${coordinator}:1313"
+```
+
+Now that the Coordinator is recovered, all workloads should pass initialization and enter the running state.
+You can now verify the Coordinator again, which should return the same manifest you set before.
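The recovery flow documented above works because key derivation from the seed is deterministic: a restarted Coordinator that receives the same seed ends up with the same signing key and can check the persisted manifest history. The following is a minimal Python sketch of that idea, not Contrast's implementation. The HKDF follows RFC 5869 with SHA-256. The `info` labels, the all-zero default salt, and the function names are hypothetical. An HMAC tag stands in for the private-key signature that the docs describe, since the Python standard library has no asymmetric signatures.

```python
import hashlib
import hmac
import os


def hkdf_sha256(seed: bytes, info: bytes, salt: bytes = b"", length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK from the seed, then expand it."""
    # Extract step: an empty salt defaults to HashLen zero bytes, as in the RFC.
    prk = hmac.new(salt or b"\x00" * 32, seed, hashlib.sha256).digest()
    # Expand step: chain HMAC blocks until enough output is available.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_keys(seed: bytes) -> dict:
    """Derive independent keys from one seed. The labels are hypothetical."""
    return {
        "ca_root": hkdf_sha256(seed, b"ca-root-key"),
        "history_signing": hkdf_sha256(seed, b"manifest-history-signing-key"),
    }


def sign_manifest(key: bytes, manifest: bytes) -> bytes:
    """Stand-in for the signature: an HMAC-SHA256 tag over the manifest."""
    return hmac.new(key, manifest, hashlib.sha256).digest()


def verify_manifest(key: bytes, manifest: bytes, tag: bytes) -> bool:
    """Constant-time comparison of the expected and the stored tag."""
    return hmac.compare_digest(sign_manifest(key, manifest), tag)


if __name__ == "__main__":
    seed = os.urandom(32)  # generated once, when the initial manifest is set
    manifest = b'{"policies": {}}'
    tag = sign_manifest(derive_keys(seed)["history_signing"], manifest)

    # After a restart only the manifest and tag are persisted; the seed
    # arrives again through recovery and yields the same signing key.
    recovered = derive_keys(seed)
    print(verify_manifest(recovered["history_signing"], manifest, tag))
    print(verify_manifest(recovered["history_signing"], manifest + b"x", tag))
```

Using a distinct `info` label per key keeps the CA root key and the history signing key independent even though both come from the same seed.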
diff --git a/docs/docs/features-limitations.md b/docs/docs/features-limitations.md
index 6673189ffa..44d874083e 100644
--- a/docs/docs/features-limitations.md
+++ b/docs/docs/features-limitations.md
@@ -26,3 +26,9 @@ The policy limitations, in particular the missing guarantee that our service mes
 ## Tooling integration
 
 - **CLI availability**: The CLI tool is currently only available for Linux. This limitation arises because certain upstream dependencies haven't yet been ported to other platforms.
+
+## Automatic recovery and high availability
+
+The Contrast Coordinator is a singleton and can't be scaled to more than one instance.
+When this instance's pod is restarted, for example for node maintenance, it needs to be recovered manually.
+In a future release, we plan to support distributed Coordinator instances that can recover automatically.
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 2ea84977f3..20e003d742 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -125,6 +125,11 @@ const sidebars = {
           label: 'Attestation',
           id: 'architecture/attestation',
         },
+        {
+          type: 'doc',
+          label: 'Secrets & recovery',
+          id: 'architecture/secrets',
+        },
         {
           type: 'doc',
           label: 'Certificate authority',