Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add recovery #696

Merged
merged 5 commits into from
Jul 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 24 additions & 0 deletions docs/docs/architecture/secrets.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# Secrets & recovery

When the Coordinator is configured with the initial manifest, it generates a random secret seed.
From this seed, it uses an HKDF to derive the CA root key and a signing key for the manifest history.
This derivation is deterministic, so the seed can be used to bring any Coordinator to this Coordinator's state.

The secret seed is returned to the user on the first call to `contrast set`, encrypted with the user's public seed share owner key.
If no seed share owner key is provided, a key is generated and stored in the working directory.

## Persistence

The Coordinator runs as a `StatefulSet` with a dynamically provisioned persistent volume.
This volume stores the manifest history and the associated runtime policies.
The manifest isn't considered sensitive information, because it needs to be passed to the untrusted infrastructure in order to start workloads.
However, the Coordinator must ensure its integrity and that the persisted data corresponds to the manifests set by authorized users.
Thus, the manifest is stored in plain text, but is signed with a private key derived from the Coordinator's secret seed.

## Recovery

When a Coordinator starts up, it doesn't have access to the signing secret and can thus not verify the integrity of the persisted manifests.
It needs to be provided with the secret seed, from which it can derive the signing key that verifies the signatures.
This procedure is called recovery and is initiated by the workload owner.
The CLI decrypts the secret seed using the private seed share owner key, verifies the Coordinator and sends the seed through the `Recover` method.
The Coordinator recovers its key material and verifies the manifest history signature.
20 changes: 20 additions & 0 deletions docs/docs/deployment.md
Original file line number Diff line number Diff line change
Expand Up @@ -304,3 +304,23 @@ Using `openssl`, the certificate of the service can be validated with the `mesh-
```sh
openssl s_client -CAfile verify/mesh-ca.pem -verify_return_error -connect ${frontendIP}:443 < /dev/null
```

## Recover the Coordinator

If the Contrast Coordinator restarts, it enters recovery mode and waits for an operator to provide key material.
For demonstration purposes, you can simulate this scenario by deleting the Coordinator pod.

```sh
kubectl delete pod -l app.kubernetes.io/name=coordinator
```

Kubernetes schedules a new pod, but that pod doesn't have access to the key material the previous pod held in memory and can't issue certificates for workloads yet.
You can confirm this by running `verify` again, or you can restart a workload pod, which should stay in the initialization phase.
However, the secret seed in your working directory is sufficient to recover the coordinator.

```sh
contrast recover -c "${coordinator}:1313"
```

Now that the Coordinator is recovered, all workloads should pass initialization and enter the running state.
You can now verify the Coordinator again, which should return the same manifest you set before.
6 changes: 6 additions & 0 deletions docs/docs/features-limitations.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,3 +26,9 @@ The policy limitations, in particular the missing guarantee that our service mes
## Tooling integration

- **CLI availability**: The CLI tool is currently only available for Linux. This limitation arises because certain upstream dependencies haven't yet been ported to other platforms.

## Automatic recovery and high availability

The Contrast Coordinator is a singleton and can't be scaled to more than one instance.
When this instance's pod is restarted, for example for node maintenance, it needs to be recovered manually.
In a future release, we plan to support distributed Coordinator instances that can recover automatically.
5 changes: 5 additions & 0 deletions docs/sidebars.js
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,11 @@ const sidebars = {
label: 'Attestation',
id: 'architecture/attestation',
},
{
type: 'doc',
label: 'Secrets & recovery',
id: 'architecture/secrets',
},
{
type: 'doc',
label: 'Certificate authority',
Expand Down