Client agent not starting when auto_encrypt.tls enabled #6398

Closed
luanbon opened this issue Aug 26, 2019 · 11 comments
Labels
theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication

Comments

luanbon commented Aug 26, 2019

Overview of the Issue

I am deploying a new server cluster on Azure using a virtual machine scale set, with 3 server nodes set up according to the documentation (HashiCorp Learn Guide): cloud auto-join configured for the scale set, gossip encryption, and TLS encryption. My servers are up and running.

Additionally, I am trying to run a client agent with auto_encrypt.tls = true, but I am running into problems.
When the client starts, the following error is displayed:

Aug 26 20:34:44 consul-ui consul[19188]: ==> Starting Consul agent...
Aug 26 20:34:44 consul-ui consul[19188]: Version: 'v1.5.3'
Aug 26 20:34:44 consul-ui consul[19188]: Node ID: '7585ef50-fba4-4aca-1fd1-30b8561dcab3'
Aug 26 20:34:44 consul-ui consul[19188]: Node name: 'consul-ui'
Aug 26 20:34:44 consul-ui consul[19188]: Datacenter: 'dc1' (Segment: '')
Aug 26 20:34:44 consul-ui consul[19188]: Server: false (Bootstrap: false)
Aug 26 20:34:44 consul-ui consul[19188]: Client Addr: [0.0.0.0] (HTTP: -1, HTTPS: 8501, gRPC: -1, DNS: 8600)
Aug 26 20:34:44 consul-ui consul[19188]: Cluster Addr: 10.1.2.4 (LAN: 8301, WAN: 8302)
Aug 26 20:34:44 consul-ui consul[19188]: Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: true, Auto-Encrypt-TLS: true
Aug 26 20:34:44 consul-ui consul[19188]: ==> Log data will now stream in as it occurs:
Aug 26 20:34:44 consul-ui consul[19188]: ==> Error starting agent: VerifyIncoming set, and no Cert/Key pair provided!
Aug 26 20:34:44 consul-ui consul[19188]: 2019/08/26 20:34:44 [INFO] agent: Exit code: 1
Aug 26 20:34:44 consul-ui consul[19188]: agent: Exit code: 1
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Failed with result 'exit-code'.
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Service hold-off time over, scheduling restart.
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Scheduled restart job, restart counter is at 5.
Aug 26 20:34:44 consul-ui systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Start request repeated too quickly.
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Failed with result 'exit-code'.
Aug 26 20:34:44 consul-ui systemd[1]: Failed to start "HashiCorp Consul - A service mesh solution".

Note that with verify_incoming and verify_outgoing set to false and ports.http set to 8500 in the client configuration, the client runs successfully.
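
A minimal sketch of the client settings described above, showing only the keys that differ from the failing configuration. It assumes every other key stays as in the client configuration posted below; the reporter did not post this variant, so the exact file is an assumption.

```hcl
# Variant the reporter describes as starting successfully
# (assumption: all other keys are unchanged from the failing config).
verify_incoming = false
verify_outgoing = false
ports = {
    http = 8500   # plain HTTP API re-enabled
}
auto_encrypt = {
    tls = true
}
```

With verify_incoming off, the startup check that produced the "no Cert/Key pair provided" error is not triggered, at the cost of serving the API over unencrypted HTTP.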

Reproduction Steps

Steps to reproduce this issue, eg:

  1. Create a cluster with 1 client node and 3 server nodes
  2. Enable gossip encryption and RPC communication with TLS
  3. Configure each server as below; each .hcl configuration is a separate file (consul, server, agent, tls)
  4. Configure the client as below
  5. View error

Consul info / configuration for both Client and Server

Client Configuration
server = false
datacenter = "dc1"
data_dir = "/opt/consul"
encrypt = "ommited"
retry_join = ["provider=azure subscription_id=ommited tenant_id=ommited client_id=ommited secret_access_key=ommited resource_group=consul vm_scale_set=consul"]
acl = {
    tokens = {
        agent = "ommited"
    }
}
enable_syslog = true
leave_on_terminate = true
log_level = "INFO"
verify_incoming = true
#verify_outgoing = false
ca_file = "/etc/consul.d/consul-agent-ca.pem"
ports = {
    http = -1
    https = 8501
}
auto_encrypt = {
    tls = true
}
ui = true
client_addr = "0.0.0.0"
enable_script_checks = false
disable_remote_exec = true

Client folder files (/etc/consul.d)

-rw-rw-r--  1 consul     consul     1245 Aug 26 14:55 consul-agent-ca.pem
-rw-r-----  1 consul     consul      785 Aug 26 20:34 consul.hcl
-rw-rw-r--  1 azure-user azure-user  227 Aug 26 15:12 dc1-cli-consul-0-key.pem
-rw-rw-r--  1 azure-user azure-user 1082 Aug 26 15:12 dc1-cli-consul-0.pem

Server Configuration (the same on 3 nodes)
#consul.hcl

datacenter = "dc1"
data_dir = "/datadisks/disk1/consul"
encrypt = "ommited"
retry_join = ["provider=azure subscription_id=ommited tenant_id=ommited client_id=ommited secret_access_key=ommited resource_group=consul vm_scale_set=consul"]
performance {
  raft_multiplier = 1
}

#server.hcl

server = true
bootstrap_expect = 3
log_level = "INFO"


# agent.hcl
acl = {
    enabled = true
    default_policy = "deny"
    enable_token_persistence = true
    tokens = {
        agent = "ommited"
    }
}

# tls.hcl
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true
auto_encrypt = {
    allow_tls = true
}
ca_file = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/dc1-server-consul-0.pem"
key_file = "/etc/consul.d/dc1-server-consul-0-key.pem"
ports = {
    http = -1,
    https = 8501
}

Server folder files (/etc/consul.d)

-rw-r-----  1 consul     consul      169 Aug 23 21:52 agent.hcl
-rw-rw-r--  1 consul     consul     1245 Aug 23 22:45 consul-agent-ca.pem
-rw-r-----  1 consul     consul      407 Aug 23 20:54 consul.hcl
-rw-rw-r--  1 azure-user azure-user  227 Aug 23 23:33 dc1-cli-consul-0-key.pem
-rw-rw-r--  1 azure-user azure-user 1078 Aug 23 23:33 dc1-cli-consul-0.pem
-rw-r-----  1 consul     consul      227 Aug 23 22:47 dc1-server-consul-0-key.pem
-rw-r-----  1 consul     consul     1139 Aug 23 22:47 dc1-server-consul-0.pem
-rw-r-----  1 consul     consul       54 Aug 23 20:53 server.hcl
-rw-r-----  1 consul     consul      313 Aug 26 19:47 tls.hcl

consul info (server)

agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = a42ded47
        version = 1.5.3
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 1
        leader = true
        leader_addr = 10.1.0.4:8300
        server = true
raft:
        applied_index = 31765
        commit_index = 31765
        fsm_pending = 0
        last_contact = 0
        last_log_index = 31765
        last_log_term = 64
        last_snapshot_index = 16825
        last_snapshot_term = 11
        latest_configuration = [{Suffrage:Voter ID:6b29900f-bbc2-95eb-6a17-629d74c5c487 Address:10.1.0.4:8300} {Suffrage:Voter ID:41fb9c98-7695-76b2-bf25-9658fb806ae0 Address:10.1.0.6:8300} {Suffrage:Voter ID:62e57d8a-0d74-21d3-de58-9b35a91a0827 Address:10.1.0.7:8300}]
        latest_configuration_index = 31420
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 64
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 110
        max_procs = 2
        os = linux
        version = go1.12.1
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 16
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 76
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 38
        members = 3
        query_queue = 0
        query_time = 1

Operating system and Environment details

Azure Virtual Machine Scale Set, Ubuntu 18.04 LTS

Log Fragments

Identical to the agent startup log shown in the overview above.

@hanshasselberg hanshasselberg added the type/bug Feature does not function as expected label Aug 27, 2019
@hanshasselberg
Member

@luanbon thanks for reporting. I think this is a bug - it should work since you provided an extra CA that could be used to verify the connection. I will look into it.

@hanshasselberg hanshasselberg added the theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication label Aug 27, 2019
@hanshasselberg hanshasselberg added this to the 1.6.1 milestone Aug 27, 2019
@yurii-fisakov

I can confirm this. Same issue on the 1.6.0 version

@jrrdev

jrrdev commented Sep 7, 2019

Got the same bug in the 1.6.0 release.
This seems to be related to verify_incoming in the client configuration, because at boot the agent doesn't have a certificate yet (see https://github.com/hashicorp/consul/blob/master/tlsutil/config.go#L329).

A workaround is to set verify_incoming to false in the client configuration. My configuration is:

  • Client configuration:
{
  "verify_server_hostname": true,
  "ca_file": "/etc/consul.d/consul-agent-ca.pem",
  "ports": {
    "http": -1,
    "https": 8501
  },
  "auto_encrypt": {
    "tls": true
  },
  "connect": {
    "enabled": true,
    "ca_config": {
      "private_key_type": "ec",
      "private_key_bits": 256
    }
  }
}

Note: the connect stanza is a workaround until 1.6.1 is released, see #6391

  • Server configuration:
{
  "verify_incoming": true,
  "verify_outgoing": true,
  "verify_server_hostname": true,
  "ca_file": "/etc/consul.d/consul-agent-ca.pem",
  "cert_file": "/etc/consul.d/dc1-server-consul-0.pem",
  "key_file": "/etc/consul.d/dc1-server-consul-0-key.pem",
  "ports": {
    "http": -1,
    "https": 8501
  },
  "auto_encrypt": {
    "allow_tls": true
  }
}
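
For readers using HCL files, as the original report does, the client-side workaround above might translate roughly as follows. This is a sketch: `verify_incoming = false` is written out explicitly although the JSON simply omits it (false is the default), and the 1.6.0-specific `connect` stanza is left out.

```hcl
# HCL rendering of the client workaround
# (assumption: a direct translation of the JSON client configuration above).
verify_incoming        = false   # explicit here; the JSON relies on the default
verify_server_hostname = true
ca_file                = "/etc/consul.d/consul-agent-ca.pem"
ports = {
    http  = -1
    https = 8501
}
auto_encrypt = {
    tls = true
}
```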

From my understanding of the encryption doc, there is no point in setting "verify_incoming": true in a client's configuration anyway, because this check is only performed on servers, right?
Or could it introduce a vulnerability, for example if you aren't using ACLs and someone calls the API on the agent without a valid TLS cert?

@hanshasselberg
Member

From my understanding of the encryption doc, there is no point in setting "verify_incoming": true in a client's configuration anyway, because this check is only performed on servers, right?

That's not correct: clients also need verify_incoming for their API endpoints. They are insecure otherwise.

@otto-dev

Confirmed in 1.6.1

@mathplusyou

@i0rek Can you confirm that this fix has been included in Consul Enterprise Pro v1.6.1? Or point me to someone who can? I'm still receiving the error described by @luanbon. Thanks!

@peimanja

peimanja commented Nov 13, 2019

Same issue here in 1.6.1

@hanshasselberg
Member

Thanks for the patience everybody. I have made up my mind on how to approach this issue now.

This issue is not a bug, contrary to what I thought before; it is exactly how it is supposed to work. verify_incoming enforces a TLS connection, which cannot be established because there is a CA but no cert.
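
To illustrate the check described above: it should be satisfied once the client has its own cert/key pair next to the CA. A minimal sketch, assuming a client certificate was created and distributed by hand; the two certificate file names are hypothetical (the reporter's directory listing shows only CLI certs), and this sidesteps auto_encrypt rather than making it work.

```hcl
# Manually provisioned client certificate (hypothetical file names).
verify_incoming = true
ca_file   = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/dc1-client-consul-0.pem"      # assumed name
key_file  = "/etc/consul.d/dc1-client-consul-0-key.pem"  # assumed name
```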

There is a related PR #6489 which configures auto_encrypt certs for listeners on clients as well. This will enable setting up (insecure) HTTPS connections to the client's https endpoint.

The missing piece here is the ability to export auto_encrypt certs, which could then be used to query client HTTPS endpoints. Only then does it make sense to enable verify_incoming_https. Right now it can never work, because there is no way to export such a cert. Corresponding issue: #6791.

Do you have any thoughts or questions? Would that work for you?

@hanshasselberg hanshasselberg added waiting-reply Waiting on response from Original Poster or another individual in the thread and removed type/bug Feature does not function as expected labels Nov 14, 2019
@stale stale bot removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Nov 18, 2019
@hanshasselberg hanshasselberg self-assigned this Nov 18, 2019
@hanshasselberg
Member

I created another PR for this: #6811 which also has the doc changes you rightfully mentioned.

I would also like to ask everyone to go to #6811 and tell me about your use case for verify_incoming on Consul clients, because we were wondering in which cases it is necessary to turn it on, apart from having been told to by the docs.

Thanks!

@hanshasselberg
Member

Closing now. Feel free to chime in on #6811 or create a new issue if there is something you would like us to address/consider.

@ghost

ghost commented Jan 25, 2020

Hey there,

This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days.

If you are still experiencing problems, or still have questions, feel free to open a new one 👍.

@ghost ghost locked and limited conversation to collaborators Jan 25, 2020