Connect: Non-blocking query to Leaf Cert endpoint can return an expired certificate #9862

Closed
JWT95 opened this issue Mar 10, 2021 · 2 comments
Labels
theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies type/bug Feature does not function as expected type/docs Documentation needs to be created/updated/clarified

Comments

@JWT95

JWT95 commented Mar 10, 2021

Overview of the Issue

When using non-blocking queries on the /agent/connect/ca/leaf/:service endpoint, Consul can return an expired certificate. This contradicts the docs at https://www.consul.io/api-docs/agent/connect#service-leaf-certificate, which state:

The resulting certificate is cached and returned by this API until it is near expiry or the root certificates change.

Reproduction Steps

A two-node cluster with one client node and one server node, with Connect enabled. Both agents run Consul 1.9.3.

Set the Connect leaf_cert_ttl to be 1h.
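For reference, a minimal sketch of how this TTL might be set, assuming it goes in the server agent's configuration under `connect.ca_config`. The helper name `write_connect_config` and the target path are hypothetical and not taken from the original report.

```shell
# Hypothetical helper: write a Connect config fragment with a 1h leaf TTL
# to the file given as $1. The path is chosen by the caller.
write_connect_config() {
  cat > "$1" <<'EOF'
connect {
  enabled = true
  ca_config {
    leaf_cert_ttl = "1h"
  }
}
EOF
}
```

For example, `write_connect_config ./connect.hcl` writes the fragment to the current directory; the agent would then need to be started with that file in its configuration directory.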

Get a cert for service "leaf-cert" on the client node:

$ date && curl http://127.0.0.1:8500/v1/agent/connect/ca/leaf/leaf-cert | jq .
Wed Mar 10 12:31:30 UTC 2021
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1384  100  1384    0     0  38444      0 --:--:-- --:--:-- --:--:-- 38444
{
  "SerialNumber": "08",
  "CertPEM": "...",
  "PrivateKeyPEM": "...",
  "Service": "leaf-cert",
  "ServiceURI": "spiffe://a22a7789-e9fd-1720-9494-9994cad63968.consul/ns/default/dc/dc1/svc/leaf-cert",
  "ValidAfter": "2021-03-10T12:30:30Z",
  "ValidBefore": "2021-03-10T13:30:30Z",
  "CreateIndex": 21,
  "ModifyIndex": 21
}

Wait an hour and query the same endpoint again. The same certificate, now expired, is returned (compare the ValidBefore field with the current time):

$ date && curl http://127.0.0.1:8500/v1/agent/connect/ca/leaf/leaf-cert | jq .
Wed Mar 10 13:33:55 UTC 2021
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1384  100  1384    0     0  1351k      0 --:--:-- --:--:-- --:--:-- 1351k
{
  "SerialNumber": "08",
  "CertPEM": "...",
  "PrivateKeyPEM": "...",
  "Service": "leaf-cert",
  "ServiceURI": "spiffe://a22a7789-e9fd-1720-9494-9994cad63968.consul/ns/default/dc/dc1/svc/leaf-cert",
  "ValidAfter": "2021-03-10T12:30:30Z",
  "ValidBefore": "2021-03-10T13:30:30Z",
  "CreateIndex": 21,
  "ModifyIndex": 21
}
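The expiry check done by eye above could be scripted roughly as follows. This is a sketch only: the function names are hypothetical, and it assumes ValidBefore is always a second-precision UTC timestamp in the form shown in the output (for example `2021-03-10T13:30:30Z`).

```shell
# Strip everything except digits, so "2021-03-10T13:30:30Z" becomes
# 20210310133030. Timestamps in this fixed format then compare as integers.
ts_digits() {
  printf '%s' "$1" | tr -cd '0-9'
}

# Succeeds (exit 0) when the timestamp in $1 is earlier than the reference
# time in $2. If $2 is omitted, the current UTC time is used.
is_expired() {
  local valid_before now
  valid_before=$(ts_digits "$1")
  now=$(ts_digits "${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}")
  [ "$valid_before" -lt "$now" ]
}

# Fetch the leaf cert for service $1 with a non-blocking query and report
# whether it has already expired. Requires curl and jq.
check_leaf() {
  local service valid_before
  service="$1"
  valid_before=$(curl -s "http://127.0.0.1:8500/v1/agent/connect/ca/leaf/${service}" \
    | jq -r '.ValidBefore') || return 2
  if is_expired "$valid_before"; then
    echo "expired: ValidBefore=${valid_before}"
    return 1
  fi
  echo "valid until ${valid_before}"
}
```

Running `check_leaf leaf-cert` on the client node would flag the response above, since its ValidBefore of 13:30:30 precedes the 13:33:55 request time.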

Presumably this is a caching issue. A blocking query returns an up-to-date cert, as do subsequent non-blocking queries made after the blocking query:

$ date && curl http://127.0.0.1:8500/v1/agent/connect/ca/leaf/leaf-cert?index=21 | jq .
Wed Mar 10 13:35:14 UTC 2021
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1382  100  1382    0     0  26075      0 --:--:-- --:--:-- --:--:-- 26576
{
  "SerialNumber": "09",
  "CertPEM": "...",
  "PrivateKeyPEM": "...",
  "Service": "leaf-cert",
  "ServiceURI": "spiffe://a22a7789-e9fd-1720-9494-9994cad63968.consul/ns/default/dc/dc1/svc/leaf-cert",
  "ValidAfter": "2021-03-10T13:34:14Z",
  "ValidBefore": "2021-03-10T14:34:14Z",
  "CreateIndex": 393,
  "ModifyIndex": 393
}
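As a possible workaround on affected versions, a client could keep a blocking query open so that it always sees the rotated cert. The loop below is a rough sketch under that assumption; `leaf_url` and `watch_leaf` are hypothetical names, and it mirrors the command above by passing the previous response's ModifyIndex as `index`.

```shell
# Build the blocking-query URL for service $1 at index $2.
# Callers should quote the result, since "?" is a glob character in the shell.
leaf_url() {
  printf 'http://127.0.0.1:8500/v1/agent/connect/ca/leaf/%s?index=%s' "$1" "$2"
}

# Long-poll the leaf endpoint and print the serial and expiry each time a
# response arrives. Requires curl and jq. Runs until interrupted.
watch_leaf() {
  local service index body
  service="$1"
  index=0   # index=0 returns immediately with the currently cached cert
  while :; do
    if ! body=$(curl -s "$(leaf_url "$service" "$index")"); then
      sleep 1   # back off briefly on transport errors
      continue
    fi
    index=$(printf '%s' "$body" | jq -r '.ModifyIndex')
    # Fall back to a fresh query if the body did not contain a numeric index.
    case "$index" in
      ''|*[!0-9]*) index=0; sleep 1; continue ;;
    esac
    printf '%s' "$body" | jq -r '"serial=\(.SerialNumber) valid_before=\(.ValidBefore)"'
  done
}
```

Note that a blocking query can also return without any change once its wait time elapses, so the same serial may be printed more than once.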

Consul info for both Client and Server

Client info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = f55da930
        version = 1.9.3
consul:
        acl = disabled
        known_servers = 1
        server = false
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 49
        max_procs = 2
        os = linux
        version = go1.15.6
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 2
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 2
        members = 2
        query_queue = 0
        query_time = 1
Server info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = f55da930
        version = 1.9.3
consul:
        acl = disabled
        bootstrap = true
        known_datacenters = 1
        leader = true
        leader_addr = 10.0.2.4:8300
        server = true
raft:
        applied_index = 410
        commit_index = 410
        fsm_pending = 0
        last_contact = 0
        last_log_index = 410
        last_log_term = 2
        last_snapshot_index = 0
        last_snapshot_term = 0
        latest_configuration = [{Suffrage:Voter ID:147b2c71-1883-5e59-e10a-51d0c4e38735 Address:10.0.2.4:8300}]
        latest_configuration_index = 0
        num_peers = 0
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 2
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 97
        max_procs = 2
        os = linux
        version = go1.15.6
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 2
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 2
        members = 2
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1
        members = 1
        query_queue = 0
        query_time = 1

Log Fragments

Attached, but there appears to be nothing interesting in them.

client_logs.txt
server_logs.txt

@blake blake added theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies labels Mar 10, 2021
@jsosulska jsosulska added the type/bug Feature does not function as expected label Mar 10, 2021
@mikemorris mikemorris added the type/docs Documentation needs to be created/updated/clarified label Mar 22, 2021
@dnephin
Contributor

dnephin commented Nov 1, 2021

I think this might be the same underlying issue as #10871. If the cache is not updated, that would explain why we are getting an expired cert.

However, note that the goroutine that renews certs may not run often enough for a 1 hour TTL. I haven't confirmed whether that is the case yet. So we may need to either adjust that, or document and validate that the leaf TTL is at least 2h.
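If the validation route were taken, a pre-flight check on the configured TTL might look something like the sketch below. It is illustrative only: the function names are hypothetical, the two-hour floor is the value floated in this comment rather than a confirmed requirement, and only single-unit durations such as `1h`, `90m`, or `7200s` are handled.

```shell
# Convert a single-unit duration ("2h", "90m", "7200s") to seconds.
# Returns non-zero for anything else, including compound values like "1h30m".
duration_seconds() {
  local n unit
  n=${1%?}          # everything except the last character
  unit=${1#"$n"}    # the last character
  case "$n" in
    ''|*[!0-9]*) return 1 ;;
  esac
  case "$unit" in
    h) echo $(( $n * 3600 )) ;;
    m) echo $(( $n * 60 )) ;;
    s) echo $(( $n )) ;;
    *) return 1 ;;
  esac
}

# Succeeds when the TTL in $1 is at least two hours (7200 seconds).
# Returns 2 when the duration cannot be parsed.
ttl_is_safe() {
  local secs
  secs=$(duration_seconds "$1") || return 2
  [ "$secs" -ge 7200 ]
}
```

With this check, `ttl_is_safe 1h` fails for the TTL used in the reproduction, while `ttl_is_safe 2h` passes.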

@acpana
Contributor

acpana commented Dec 3, 2021

This is fixed as of #11693. The new 1.11 release and the latest 1.10 and 1.9 releases should include the fix. Please feel free to follow up as needed! 🎉

@acpana acpana closed this as completed Dec 3, 2021