ACME Considerations Guide (hashicorp#21225)
* Add notes on PKI performance and key types

Signed-off-by: Alexander Scheel <[email protected]>

* Add ACME Public Internet section

Signed-off-by: Alexander Scheel <[email protected]>

* Add note on importance of tidy

Signed-off-by: Alexander Scheel <[email protected]>

* Add note on cluster scalability

Signed-off-by: Alexander Scheel <[email protected]>

* Add note about server log location

Signed-off-by: Alexander Scheel <[email protected]>

* Fix ToC, finish public ACME discussion

Signed-off-by: Alexander Scheel <[email protected]>

* Add note on role restrictions and ACLs

Signed-off-by: Alexander Scheel <[email protected]>

* Add note on security considerations of ACME

Signed-off-by: Alexander Scheel <[email protected]>

* Add consideration note about cluster URLs

Signed-off-by: Alexander Scheel <[email protected]>

* Add note on 90 day certificates

Signed-off-by: Alexander Scheel <[email protected]>

* Add note about client counts and ACME

Signed-off-by: Alexander Scheel <[email protected]>

---------

Signed-off-by: Alexander Scheel <[email protected]>
cipherboy authored Jun 15, 2023
1 parent c5549cd commit e6f3003
Showing 1 changed file with 220 additions and 4 deletions.
224 changes: 220 additions & 4 deletions website/content/docs/secrets/pki/considerations.mdx
@@ -17,10 +17,19 @@ generating the CA to use with this secrets engine.
- [Managed Keys](#managed-keys)
- [One CA Certificate, One Secrets Engine](#one-ca-certificate-one-secrets-engine)
- [Always Configure a Default Issuer](#always-configure-a-default-issuer)
- [Key Types Matter](#key-types-matter)
- [Key Types Matter](#key-types-matter)
- [Cluster Performance and Key Types](#cluster-performance-and-key-types)
- [Use a CA Hierarchy](#use-a-ca-hierarchy)
- [Cross-Signed Intermediates](#cross-signed-intermediates)
- [Keep certificate lifetimes short, for CRL's sake](#keep-certificate-lifetimes-short-for-crls-sake)
- [Cluster URLs are Important](#cluster-urls-are-important)
- [Automate Rotation with ACME](#automate-rotation-with-acme)
- [ACME Stores Certificates](#acme-stores-certificates)
- [ACME Role Restrictions Require EAB](#acme-role-restrictions-require-eab)
- [ACME and the Public Internet](#acme-and-the-public-internet)
- [ACME Errors are in Server Logs](#acme-errors-are-in-server-logs)
- [ACME Security Considerations](#acme-security-considerations)
- [ACME and Client Counting](#acme-and-client-counting)
- [Keep Certificate Lifetimes Short, For CRL's Sake](#keep-certificate-lifetimes-short-for-crls-sake)
- [NotAfter Behavior on Leaf Certificates](#notafter-behavior-on-leaf-certificates)
- [Cluster Performance and Quantity of Leaf Certificates](#cluster-performance-and-quantity-of-leaf-certificates)
- [You must configure issuing/CRL/OCSP information _in advance_](#you-must-configure-issuingcrlocsp-information-_in-advance_)
@@ -120,7 +129,7 @@ issuer's CRL. This means maintaining a default issuer is important for both
backwards compatibility for issuing certificates and for ensuring revoked
certificates land on a CRL.

### Key Types Matter
## Key Types Matter

Certain key types have impacts on performance. Signing certificates from an RSA
key will be slower than issuing from an ECDSA or Ed25519 key. Key generation
@@ -135,6 +144,60 @@ also be more expensive. Careful consideration of both issuer and issued key
types can have meaningful impacts on performance of not only Vault, but
systems using these certificates.

### Cluster Performance and Key Types

The [vault-benchmark](https://github.com/hashicorp/vault-benchmark) project
can be used to measure the performance of a Vault PKI instance. In general,
some considerations to be aware of:

- RSA key generation is much slower and more variable than EC key
generation. If performance and throughput are a necessity, consider using
EC keys (including NIST P-curves and Ed25519) instead of RSA.

- Signing requests (via `/pki/sign`) will be faster than issuance requests
(via `/pki/issue`), especially for RSA keys: signing removes the need for
Vault to generate key material, because Vault signs the key material
provided by the client. The signing step is common to both endpoints, so
key generation is pure overhead if the client has a sufficiently secure
source of entropy.

- The CA's key type matters as well: using an RSA CA will result in an RSA
signature and takes longer than an ECDSA or Ed25519 CA.

- Storage is an important factor: with [BYOC Revocation](/vault/api-docs/secret/pki#revoke-certificate),
using `no_store=true` still gives you the ability to revoke certificates,
and audit logs can be used to track issuance. Clusters using remote
storage (like Consul) over a slow network with `no_store=false` will
see additional latency on issuance. Adding leases for every issued
certificate compounds the problem.

- Storing too many certificates lengthens `LIST /pki/certs` calls and
the time needed to tidy the instance. As such, for large scale
deployments (>= 250k active certificates) it is recommended to use audit
logs to track certificates outside of Vault.

As a general comparison on unspecified hardware, using `vault-benchmark` for
`30s` on a local, single node, raft-backed Vault instance:

- Vault can issue 300k certificates using EC P-256 for CA & leaf keys and
without storage.

- But switching to storing these leaves drops that number to 65k, and only
20k with leases.

- Using large, expensive 4096-bit RSA keys, Vault can only issue 160 leaves,
regardless of whether or not storage or leases were used. The 95th
percentile key generation time is above 10s.

- In comparison, using P-521 keys, Vault can issue closer to 30k leaves
without leases and 18k with leases.

These numbers are for example only, to represent the impact different key types
can have on PKI cluster performance.
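
To make the totals above easier to compare, the following minimal sketch converts each 30-second total into an average per-second rate. The labels are shorthand introduced here for illustration; the figures are the example numbers quoted above, not new measurements.

```python
# Totals quoted above for a 30-second benchmark run (illustrative figures only).
RUN_SECONDS = 30

totals = {
    "P-256, no storage": 300_000,
    "P-256, stored": 65_000,
    "P-256, stored with leases": 20_000,
    "RSA-4096": 160,
    "P-521, no leases": 30_000,
    "P-521, with leases": 18_000,
}


def per_second(total: int, seconds: int = RUN_SECONDS) -> float:
    """Average issuance rate over the whole run."""
    return total / seconds


rates = {name: per_second(total) for name, total in totals.items()}

for name, rate in rates.items():
    print(f"{name:28s} {rate:10.1f} certs/s")
```

Read this way, the RSA-4096 case averages only a handful of certificates per second, against roughly ten thousand per second for unstored P-256 leaves.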

The use of ACME adds latency on top of these numbers, both because
certificates need to be stored and because challenge validation needs to
be performed.

## Use a CA Hierarchy

It is generally recommended to use a hierarchical CA setup, with a root
@@ -176,7 +239,160 @@ can be constructed in the following order:
All requests to this issuer for signing will now present the full cross-signed
chain.

## Keep certificate lifetimes short, for CRL's sake
## Cluster URLs are Important

In Vault 1.13, support for [templated AIA
URLs](/vault/api-docs/secret/pki#enable_aia_url_templating-1)
was added. With the [per-cluster URL
configuration](/vault/api-docs/secret/pki#set-cluster-configuration) pointing
to this Performance Replication cluster, AIA information will point to the
cluster that issued this certificate automatically.

In Vault 1.14, with ACME support, the same configuration is used for allowing
ACME clients to discover the URL of this cluster.

~> **Warning**: It is important to ensure that this configuration is
up to date and maintained correctly, always pointing to the node's
PR cluster address (which may be a load-balanced or a DNS round-robin
address). If this configuration is not set on every Performance Replication
cluster, certificate issuance (via REST and/or via ACME) will fail.
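
As a rough sketch of what this configuration might contain, the helpers below build request bodies for the per-cluster configuration and for templated AIA URLs. The field names (`path`, `aia_path`, `enable_templating`, and so on) and the `{{cluster_path}}`, `{{cluster_aia_path}}`, and `{{issuer_id}}` placeholders reflect our reading of the linked API docs and should be checked against them; the host names and function names are hypothetical.

```python
def cluster_config(api_addr: str, aia_addr: str, mount: str = "pki") -> dict:
    """Body for this PR cluster's per-cluster configuration.

    api_addr and aia_addr are this cluster's own (possibly load-balanced)
    addresses; each PR cluster needs its own values.
    """
    return {
        "path": f"{api_addr.rstrip('/')}/v1/{mount}",
        "aia_path": f"{aia_addr.rstrip('/')}/v1/{mount}",
    }


def templated_aia_urls() -> dict:
    """AIA URL fields using templates instead of hard-coded cluster addresses."""
    return {
        "enable_templating": True,
        "issuing_certificates": "{{cluster_aia_path}}/issuer/{{issuer_id}}/der",
        "crl_distribution_points": "{{cluster_aia_path}}/issuer/{{issuer_id}}/crl/der",
        "ocsp_servers": "{{cluster_path}}/ocsp",
    }


# Hypothetical addresses for one Performance Replication cluster.
body = cluster_config("https://vault-east.example.com/", "http://aia-east.example.com")
print(body)
print(templated_aia_urls())
```

Because the URL fields only reference templates, the same issuer configuration can replicate to every cluster while each cluster's `path` and `aia_path` stay local to it.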

## Automate Rotation with ACME

In Vault 1.14, support for the [Automatic Certificate Management Environment
(ACME)](https://datatracker.ietf.org/doc/html/rfc8555) protocol has been
added to the PKI Engine. This is a standardized way to handle validation,
issuance, rotation, and revocation of server certificates.

Many ecosystems, from web servers like Caddy, Nginx, and Apache, to
orchestration environments like Kubernetes (via cert-manager) natively
support issuance via the ACME protocol. For deployments without native
support, stand-alone tools like certbot support fetching and renewing
certificates on behalf of consumers. Vault's PKI Engine only includes server
support for ACME; no client functionality has been included.

~> Note: Vault's PKI ACME server caps the certificate's validity at 90 days
maximum, regardless of role and/or global limits. Shorter validity
durations can be set by limiting the role's TTL to under 90 days.
Aligning with Let's Encrypt, we do not support the optional `NotBefore`
and `NotAfter` order request parameters.
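
The cap described in the note can be expressed as a simple minimum. This is only an illustration of the rule as stated, not Vault's implementation; the constant and function names are made up for the example.

```python
from datetime import timedelta

# 90-day ceiling described in the note above.
ACME_MAX_VALIDITY = timedelta(days=90)


def effective_acme_validity(role_ttl: timedelta) -> timedelta:
    """Validity an ACME-issued certificate would get for a given role TTL."""
    return min(role_ttl, ACME_MAX_VALIDITY)


print(effective_acme_validity(timedelta(days=365)))  # capped by ACME
print(effective_acme_validity(timedelta(days=30)))   # role TTL is shorter, so it wins
```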

### ACME Stores Certificates

Because ACME requires stored certificates in order to function, the notes
[below about automating tidy](#automate-crl-building-and-tidying) are
especially important for the long-term health of the PKI cluster. ACME also
introduces additional resource types (accounts, orders, authorizations, and
challenges) that must be tidied via [the `tidy_acme=true`
option](/vault/api-docs/secret/pki#tidy). Orders, authorizations, and
challenges are [cleaned up based on the
`safety_buffer`](/vault/api-docs/secret/pki#safety_buffer)
parameter, but accounts can live longer past their last issued certificate
by controlling the [`acme_account_safety_buffer`
parameter](/vault/api-docs/secret/pki#acme_account_safety_buffer).

As a consequence of the above, and as discussed in the [Cluster
Scalability](#cluster-scalability) section, because these roles have
`no_store=false` set, ACME can only issue certificates on the active nodes
of PR clusters; standby nodes, if contacted, will transparently forward
all requests to the active node.
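
A minimal sketch of a tidy configuration that covers both stored certificates and ACME resources follows. The parameter names follow our understanding of the tidy API linked above (`tidy_acme`, `safety_buffer`, `acme_account_safety_buffer`, plus the usual auto-tidy options); the durations are placeholder values, not recommendations, and the function name is hypothetical.

```python
def auto_tidy_config(
    safety_buffer: str = "72h",
    acme_account_safety_buffer: str = "720h",
    interval: str = "12h",
) -> dict:
    """Request body enabling periodic tidy for certificates and ACME resources."""
    return {
        "enabled": True,
        "interval_duration": interval,
        # Stored certificates: required by ACME, so they accumulate.
        "tidy_cert_store": True,
        "tidy_revoked_certs": True,
        # ACME accounts, orders, authorizations, and challenges.
        "tidy_acme": True,
        # Applies to orders, authorizations, and challenges (and certificates).
        "safety_buffer": safety_buffer,
        # Lets accounts outlive their last issued certificate by this long.
        "acme_account_safety_buffer": acme_account_safety_buffer,
    }


tidy_body = auto_tidy_config()
print(tidy_body)
```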

### ACME Role Restrictions Require EAB

Because ACME by default has no external authorization engine and is
unauthenticated from a Vault perspective, the use of roles with ACME
in the default configuration is of limited value, as any ACME client
can request certificates under any role by proving possession of the
requested certificate identifiers.

To solve this issue, there are two possible approaches:

1. Use a restrictive [`allowed_roles`, `allowed_issuers`, and
`default_directory_policy` ACME
configuration](/vault/api-docs/secret/pki#set-acme-configuration)
to let only a single role and issuer be used. This prevents user
choice, allows some global restrictions to be placed on issuance,
and avoids requiring ACME clients to have (at initial setup) access
to a Vault token or other mechanism for acquiring a Vault EAB ACME token.
2. Use a more permissive [configuration with
`eab_policy=always-required`](/vault/api-docs/secret/pki#eab_policy)
to allow more roles and let users select among them, but bind ACME clients
to a Vault token which can be suitably ACL'd to particular sets of
approved ACME directories.

The choice of approach depends on the policies of the organization wishing
to use ACME.
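
The two approaches might translate into ACME configuration bodies along the following lines. This is a sketch under assumptions: the field names come from the configuration links above, while the specific values (`role:<name>`, `forbid`, `not-required`, and `*` for issuers) are our understanding of the accepted settings and should be verified against the API docs. The role and issuer names are hypothetical.

```python
def single_role_acme_config(role: str, issuer: str) -> dict:
    """Approach 1: pin ACME to one role and one issuer, with no EAB needed."""
    return {
        "enabled": True,
        "allowed_roles": [role],
        "allowed_issuers": [issuer],
        # Requests to the default directory are evaluated against this role.
        "default_directory_policy": f"role:{role}",
        "eab_policy": "not-required",
    }


def eab_required_acme_config(roles: list[str]) -> dict:
    """Approach 2: allow several roles, but bind every client to a Vault token."""
    return {
        "enabled": True,
        "allowed_roles": list(roles),
        "allowed_issuers": ["*"],
        # Refuse requests that do not name a role explicitly.
        "default_directory_policy": "forbid",
        "eab_policy": "always-required",
    }


locked_down = single_role_acme_config("web-servers", "default")
delegated = eab_required_acme_config(["web-servers", "internal-apis"])
print(locked_down)
print(delegated)
```

In the second case, access to each role's ACME directory is then governed by the ACL policy attached to the token used to obtain the EAB credentials.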

### ACME and the Public Internet

Using ACME is possible over the public internet; public CAs like Let's Encrypt
offer this as a service. Similarly, organizations running internal PKI
infrastructure might wish to issue server certificates to pieces of
infrastructure outside of their internal network boundaries, from a publicly
accessible Vault instance. By default, without enforcing a restrictive
`eab_policy`, this results in a complicated threat model: _any_ external
client which can prove possession of a domain can issue a certificate under
this CA, which might be considered more trusted by this organization.

As such, we strongly recommend publicly facing Vault instances (such as HCP
Vault) enforce that PKI mount operators have required a [restrictive
`eab_policy=always-required` configuration](/vault/api-docs/secret/pki#eab_policy).
System administrators of Vault instances can enforce this by [setting the
`VAULT_DISABLE_PUBLIC_ACME=true` environment
variable](/vault/api-docs/secret/pki#acme-external-account-bindings).

### ACME Errors are in Server Logs

Because the ACME client is not necessarily trusted (as account registration
may not be tied to a valid Vault token when EAB is not used), many error
messages end up in the Vault server logs out of security necessity. When
troubleshooting issues with clients requesting certificates, first check
the client's logs, if any (e.g., certbot will state the log location on
errors), and then correlate with Vault server logs to identify the failure
reason.

### ACME Security Considerations

ACME allows any client to use Vault to make some sort of external call;
while the design of ACME attempts to minimize this scope and will prohibit
issuance if incorrect servers are contacted, it cannot account for all
possible remote server implementations. Vault's ACME server makes three
types of requests:

1. DNS requests for `_acme-challenge.<domain>`, which should be the least
invasive and safest.
2. TLS ALPN requests for the `acme-tls/1` protocol, which should be
safely handled by the TLS layer before any application code is invoked.
3. HTTP requests to `http://<domain>/.well-known/acme-challenge/<token>`,
which could be problematic based on server design; if all requests,
regardless of path, are treated the same and assumed to be trusted,
this could result in Vault being used to make (invalid) requests.
Ideally, any such server implementations should be updated to ignore
such ACME validation requests or to block access originating from Vault
to this service.

In all cases, no information about the response presented by the remote
server is returned to the ACME client.
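
To make the three request types concrete, the sketch below derives the target of each validation request from a domain and a challenge token. The challenge type names (`dns-01`, `tls-alpn-01`, `http-01`) are the standard ACME names; the function name and the example domain and token are hypothetical, and this is not Vault's code.

```python
def validation_targets(domain: str, token: str) -> dict:
    """Where an ACME server reaches out to when validating `domain`."""
    return {
        # DNS lookup on a well-known label under the domain.
        "dns-01": f"_acme-challenge.{domain}",
        # TLS handshake to the host, negotiating a dedicated ALPN protocol.
        "tls-alpn-01": {"host": domain, "alpn": "acme-tls/1"},
        # Plain HTTP GET to a well-known path on the host.
        "http-01": f"http://{domain}/.well-known/acme-challenge/{token}",
    }


targets = validation_targets("app.example.com", "example-token")
for challenge, target in targets.items():
    print(challenge, "->", target)
```

Only the HTTP request reaches application-level code on the remote server, which is why it carries the most risk of the three.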

When running Vault on multiple networks, note that Vault's ACME server
places no restrictions on requesting client/destination identifier
validation paths; a client could use an HTTP challenge to force Vault to
reach out to a server on a network it could otherwise not access.

### ACME and Client Counting

In Vault 1.14, ACME contributes differently to usage metrics than other
interactions with the PKI Secrets Engine. Due to its use of unauthenticated
requests (which do not generate Vault tokens), it would not be counted in
the traditional [activity log APIs](/vault/api-docs/system/internal-counters#activity-export).
Instead, certificates issued via ACME will be counted via their unique
certificate identifiers (the combination of CN, DNS SANs, and IP SANs).
These will create a stable identifier that will be consistent across
renewals, other ACME clients, mounts, and namespaces, contributing to
the activity log presently as a non-entity token attributed to the first
mount which created that request.
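
The idea of a stable identifier can be illustrated with a short sketch. Vault's actual derivation is not described here, so the normalization and hashing below are purely hypothetical; the point is only that the same set of CN, DNS SANs, and IP SANs yields the same key across renewals, regardless of ordering.

```python
import hashlib


def certificate_identifier_key(common_name, dns_sans=(), ip_sans=()) -> str:
    """Hypothetical stable key over a certificate's identifiers.

    Illustration only: lowercases and sorts the identifiers, then hashes
    them, so ordering and letter case do not change the result.
    """
    cn = common_name.strip().lower()
    dns = sorted({name.strip().lower() for name in dns_sans})
    ips = sorted({ip.strip() for ip in ip_sans})
    canonical = f"cn:{cn}|dns:{','.join(dns)}|ip:{','.join(ips)}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first_issue = certificate_identifier_key(
    "app.example.com", ["app.example.com", "www.example.com"], ["192.0.2.10"]
)
renewal = certificate_identifier_key(
    "app.example.com", ["www.example.com", "APP.example.com"], ["192.0.2.10"]
)
other = certificate_identifier_key("db.example.com", ["db.example.com"])

print(first_issue == renewal)  # same identifiers, so counted once
print(first_issue == other)    # different identifiers, so counted separately
```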

## Keep Certificate Lifetimes Short, For CRL's Sake

This secrets engine aligns with Vault's philosophy of short-lived secrets. As
such it is not expected that CRLs will grow large; the only place a private key