Consul connect: Consul agent fails to renew leaf certificate #5239

Closed
thanapolr opened this issue Jan 21, 2019 · 6 comments
Labels: theme/connect (Anything related to Consul Connect, Service Mesh, Side Car Proxies)


thanapolr commented Jan 21, 2019

Steps to reproduce:

  • Create 3 Consul 1.4.0 servers, 3 Vault 1.0.2 servers, and 2 Consul 1.4.0 agents using Vagrant
  • Enable Consul Connect on the servers using either the Consul CA or the Vault CA

-- configuration for using consul CA

connect {
    enabled = true
}
  • Set up consul agent-1 with the following configuration to register a service with a sidecar proxy and upstreams
services = [
    {
        id = "dummy1"
        name = "dummy1"
        port = 8080
        connect {
            sidecar_service {
                proxy {
                    upstreams = [
                        {
                            destination_name = "dummy2"
                            local_bind_port = 10002
                        }
                    ]
                }
            }
        }
    }
]
  • Set up consul agent-2 with the following configuration to register a service with a sidecar proxy and upstreams
services = [
    {
        id = "dummy2"
        name = "dummy2"
        port = 8080
        connect {
            sidecar_service {
                proxy {
                    upstreams = [
                        {
                            destination_name = "dummy1"
                            local_bind_port = 10001
                        }
                    ]
                }
            }
        }
    }
]
  • Both agents use an ACL token with a service write policy. We use the master token here, but the issue also occurs with other tokens that have a service write policy. The ACL configuration is as follows:
{
  "primary_datacenter": "local",
  "acl": {
    "enabled": true,
    "tokens": {
      "default": "{{ consul-master-token }}"
    }
  }
}
  • When consul agent-1 and agent-2 are started, they receive valid leaf certificates and can communicate through the proxies using mTLS.
  • If we then change the date on agent-1 to make the certificate expire, or simply let it expire, the agent cannot renew or obtain a new valid leaf certificate and the services can no longer communicate. The following error appears in the logs:
[ERR] http: Request GET /v1/agent/connect/ca/leaf/dummy1?index=11992410, error: No known Consul servers from=127.0.0.1:43086
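
To confirm that a failure like this coincides with certificate expiry, the validity window of the leaf can be checked directly. The following is a minimal Python sketch, not part of the original report. It assumes the JSON body returned by the leaf endpoint carries RFC 3339 `ValidAfter` and `ValidBefore` fields; the helper names and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339(value: str) -> datetime:
    """Parse a timestamp such as '2019-01-21T00:00:00Z' into an aware datetime."""
    # Older Python versions reject a trailing 'Z' in fromisoformat,
    # so map it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def leaf_status(leaf: dict, now: datetime) -> str:
    """Return 'not yet valid', 'valid', or 'expired' for a leaf response body."""
    not_before = parse_rfc3339(leaf["ValidAfter"])
    not_after = parse_rfc3339(leaf["ValidBefore"])
    if now < not_before:
        return "not yet valid"
    if now >= not_after:
        return "expired"
    return "valid"


def time_remaining(leaf: dict, now: datetime) -> timedelta:
    """Time left until expiry; negative once the certificate has expired."""
    return parse_rfc3339(leaf["ValidBefore"]) - now


# Hypothetical response body; only the two validity fields matter here.
sample_leaf = {
    "Service": "dummy1",
    "ValidAfter": "2019-01-21T00:00:00Z",
    "ValidBefore": "2019-01-24T00:00:00Z",
}

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(leaf_status(sample_leaf, now), time_remaining(sample_leaf, now))
```

Passing `now` explicitly makes it possible to simulate the clock change described in the last step without touching the system date.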

thanapolr commented Jan 21, 2019

The error also occurs when I use the Vault CA.

configuration for using Vault CA

connect {
    enabled = true
    ca_provider = "vault"
    ca_config {
        address = "http://vault-1:8200"
        token = "[vault_token]"
        root_pki_path = "pki"
        intermediate_pki_path = "pki_int"
    }
}
# Enable secrets engine
path "sys/mounts/*" {
  capabilities = [ "create", "read", "update", "delete", "list" ]
}

# List enabled secrets engine
path "sys/mounts" {
  capabilities = [ "read", "list" ]
}

# Work with pki secrets engine
path "pki*" {
  capabilities = [ "create", "read", "update", "delete", "list", "sudo" ]
}
  • Create vault role for consul connect
vault write pki_int/roles/leaf-cert allow_subdomain=true allowed_domains=testhost.local key_type=ec key_bits=224 require_cn=false use_csr_sans=false ttl=1h max_ttl=1h
  • The error is slightly different:
[ERR] http: Request GET /v1/agent/connect/ca/leaf/dummy1?index=43912, error: rpc error making call: error issuing cert: read tcp 10.0.2.15:34498->172.20.160.12:8200: read: connection reset by peer from=127.0.0.1:46900

rboyer added the theme/connect label Jan 29, 2019
elocnatsirt commented

I am seeing this same error using the Vault CA and Connect Injector for K8s. After ~3.65 days, services are no longer able to communicate with each other and require a full restart/new injection to function as a Connect service.

Is there a fix for this?

dclfan commented Mar 26, 2019

> I am seeing this same error using the Vault CA and Connect Injector for K8s. After ~3.65 days, services are no longer able to communicate with each other and require a full restart/new injection to function as a Connect service.
>
> Is there a fix for this?

I had been having this issue with Connect and the Consul CA since the initial release. 1.4.2 has fixed it for me: it has been running around the clock since install, about 1 month ago. Before that, it never ran for longer than a week.

elocnatsirt commented

> I had been having this issue with Connect and the Consul CA since the initial release. 1.4.2 has fixed it for me: it has been running around the clock since install, about 1 month ago. Before that, it never ran for longer than a week.

After upgrading the servers to 1.4.4, everything seems to be running smoothly so far, and I am no longer getting the errors after a few days using the Vault CA. I will report back if I notice any issues, but upgrading seems to have fixed this for me.

hanshasselberg (Member) commented

Thank you everybody for reporting and updating this issue. Is anybody still experiencing this issue with Consul >= 1.4.2?

kyhavlov self-assigned this May 16, 2019
kyhavlov (Contributor) commented

After investigating this, it seems this issue was fixed by #4480, which was one of a few different fixes around leaf certificates in 1.4.1. There were also a couple of other subtle issues that could be hit and that could exacerbate the situation seen here. I'm going to close this issue since the problem is fixed in newer Consul versions, but if anyone still sees this behavior on a newer version (>= 1.4.2), it would be worth opening a new issue, since that would probably have a different underlying cause.
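
For anyone triaging a fleet against the threshold mentioned above, a version check can be sketched as follows. This is an illustrative Python helper, not something from the Consul codebase: the function names are hypothetical, and it assumes version strings of the form `1.4.2`, optionally with a leading `v` or a suffix such as `-dev` or `+ent`.

```python
def parse_version(text: str) -> tuple:
    """Turn a version string like 'v1.4.2-dev' into a tuple of ints, e.g. (1, 4, 2)."""
    core = text.strip().lstrip("v")
    # Drop any pre-release or build suffix; only the numeric core is compared.
    for separator in ("-", "+"):
        core = core.split(separator, 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_threshold(version: str, minimum: tuple = (1, 4, 2)) -> bool:
    """True if the version is at or above the threshold suggested in this thread."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    for candidate in ("1.4.0", "1.4.2", "v1.4.4"):
        print(candidate, meets_threshold(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `1.10.0` as older than `1.4.2`.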
