unable to use many services #146

Closed
chen23 opened this issue Nov 29, 2020 · 7 comments · Fixed by #151
Labels
bug Something isn't working

Comments

chen23 commented Nov 29, 2020

Describe the bug

When configured with 100 services, the client does not work reliably.

Versions

Consul Terraform Sync

 0.1.0-techpreview1 (2862363)
Compatible with Terraform ~>0.13.0

Consul Version

Consul 1.8.6

Terraform Version

Terraform v0.13.5

Configuration File(s)

Config file:
driver "terraform" {
  log = true
}
consul {
  address = "10.1.20.51:8500"
}
log_level = "trace"
task {
  name = "ManyApps"
  description = "Testing many apps"
  source = "../../manyapps"
  services = ["app001","app002","app003","app004","app005","app006","app007","app008","app009","app010","app011","app012","app013","app014","app015","app016","app017","app018","app019","app020","app021","app022","app023","app024","app025","app026","app027","app028","app029","app030","app031","app032","app033","app034","app035","app036","app037","app038","app039","app040","app041","app042","app043","app044","app045","app046","app047","app048","app049","app050","app051","app052","app053","app054","app055","app056","app057","app058","app059","app060","app061","app062","app063","app064","app065","app066","app067","app068","app069","app070","app071","app072","app073","app074","app075","app076","app077","app078","app079","app080","app081","app082","app083","app084","app085","app086","app087","app088","app089","app090","app091","app092","app093","app094","app095","app096","app097","app098","app099","app100"]
}

Terraform Configuration Files Generated by Consul-Terraform-Sync

main.tf:
locals {

  # Create a map of service names to instance IDs
  service_ids = transpose({
    for id, s in var.services : id => [s.name]
  })

  # Group service instances by name
  grouped = { for name, ids in local.service_ids :
    name => [
      for id in ids : var.services[id]
    ]
  }

}

resource "local_file" "appoutput" {
  for_each = local.grouped
  content = jsonencode(each.value)
  filename = "/tmp/${each.key}.json"
}
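To make the `locals` above easier to follow: `var.services` is keyed by service instance ID, `transpose` flips that into a map from service name to instance IDs, and `grouped` then swaps each ID back for its full instance object. A rough Python equivalent of that data flow is sketched here. It assumes each value in `services` is a dict with a `name` key (mirroring `s.name`), sorts IDs only to keep the output deterministic, and uses made-up instance data in the usage lines.

```python
from collections import defaultdict


def transpose(m: dict[str, list[str]]) -> dict[str, list[str]]:
    """Swap keys and values of a map of lists, in the spirit of Terraform's transpose()."""
    out: dict[str, list[str]] = defaultdict(list)
    for key, values in m.items():
        for value in values:
            out[value].append(key)
    # Sorted only so this sketch gives deterministic output.
    return {k: sorted(v) for k, v in out.items()}


def group_by_name(services: dict[str, dict]) -> dict[str, list[dict]]:
    """Mirror local.service_ids and local.grouped from main.tf."""
    # id => [s.name], then transpose to name => [ids]
    service_ids = transpose({sid: [s["name"]] for sid, s in services.items()})
    # name => [full instance objects]
    return {name: [services[sid] for sid in ids] for name, ids in service_ids.items()}


if __name__ == "__main__":
    # Hypothetical instance data; only the first two IDs appear in the sample service.
    services = {
        "dc1-Production-app001-1": {"name": "app001", "port": 443},
        "dc1-Production-app001-2": {"name": "app001", "port": 8443},
        "app002": {"name": "app002", "port": 80},
    }
    grouped = group_by_name(services)
    print({name: len(insts) for name, insts in grouped.items()})
    # prints {'app001': 2, 'app002': 1}
```

Because `local_file.appoutput` iterates over the keys of `grouped`, the task writes one `/tmp/<service>.json` file per service name, not per instance.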
Sample service:
[
  {
    "Node": {
      "ID": "",
      "Node": "192.168.128.1",
      "Address": "192.168.128.1",
      "Datacenter": "dc1",
      "TaggedAddresses": null,
      "Meta": {
        "external-node": "true",
        "external-probe": "false"
      },
      "CreateIndex": 458821,
      "ModifyIndex": 458821
    },
    "Service": {
      "ID": "dc1-Production-app001-1",
      "Service": "app001",
      "Tags": [],
      "Address": "",
      "Meta": {},
      "Port": 443,
      "Weights": {
        "Passing": 1,
        "Warning": 1
      },
      "EnableTagOverride": false,
      "Proxy": {
        "MeshGateway": {},
        "Expose": {}
      },
      "Connect": {},
      "CreateIndex": 488242,
      "ModifyIndex": 488242
    },
    "Checks": [
      {
        "Node": "192.168.128.1",
        "CheckID": "http-check",
        "Name": "http-check",
        "Status": "passing",
        "Notes": "",
        "Output": "HTTP GET https://192.168.128.1: 200 OK Output: ",
        "ServiceID": "",
        "ServiceName": "",
        "ServiceTags": [],
        "Type": "",
        "Definition": {
          "Interval": "30s",
          "HTTP": "https://192.168.128.1",
          "TLSSkipVerify": true,
          "Header": {
            "Host": [
              "app001.example.com"
            ],
            "x-monitored-by": [
              "eric"
            ]
          },
          "Method": "HEAD"
        },
        "CreateIndex": 442435,
        "ModifyIndex": 491595
      }
    ]
  },
  {
    "Node": {
      "ID": "",
      "Node": "192.168.160.1",
      "Address": "192.168.160.1",
      "Datacenter": "dc1",
      "TaggedAddresses": null,
      "Meta": {
        "external-node": "true",
        "external-probe": "false"
      },
      "CreateIndex": 463773,
      "ModifyIndex": 463773
    },
    "Service": {
      "ID": "dc1-Production-app001-2",
      "Service": "app001",
      "Tags": [],
      "Address": "",
      "Meta": {},
      "Port": 8443,
      "Weights": {
        "Passing": 1,
        "Warning": 1
      },
      "EnableTagOverride": false,
      "Proxy": {
        "MeshGateway": {},
        "Expose": {}
      },
      "Connect": {},
      "CreateIndex": 488243,
      "ModifyIndex": 488243
    },
    "Checks": [
      {
        "Node": "192.168.160.1",
        "CheckID": "http-check",
        "Name": "http-check",
        "Status": "passing",
        "Notes": "",
        "Output": "HTTP GET https://192.168.160.1: 200 OK Output: ",
        "ServiceID": "",
        "ServiceName": "",
        "ServiceTags": [],
        "Type": "",
        "Definition": {
          "Interval": "30s",
          "HTTP": "https://192.168.160.1",
          "TLSSkipVerify": true,
          "Header": {
            "Host": [
              "app001.example.com"
            ],
            "x-monitored-by": [
              "eric"
            ]
          },
          "Method": "HEAD"
        },
        "CreateIndex": 463773,
        "ModifyIndex": 491647
      }
    ]
  },
  {
    "Node": {
      "ID": "",
      "Node": "192.168.176.1",
      "Address": "192.168.176.1",
      "Datacenter": "dc1",
      "TaggedAddresses": null,
      "Meta": {
        "external-node": "true",
        "external-probe": "false"
      },
      "CreateIndex": 463774,
      "ModifyIndex": 463774
    },
    "Service": {
      "ID": "dc1-Production-app001-3",
      "Service": "app001",
      "Tags": [],
      "Address": "",
      "Meta": {},
      "Port": 9443,
      "Weights": {
        "Passing": 1,
        "Warning": 1
      },
      "EnableTagOverride": false,
      "Proxy": {
        "MeshGateway": {},
        "Expose": {}
      },
      "Connect": {},
      "CreateIndex": 488244,
      "ModifyIndex": 488244
    },
    "Checks": [
      {
        "Node": "192.168.176.1",
        "CheckID": "http-check",
        "Name": "http-check",
        "Status": "passing",
        "Notes": "",
        "Output": "HTTP GET https://192.168.176.1: 200 OK Output: ",
        "ServiceID": "",
        "ServiceName": "",
        "ServiceTags": [],
        "Type": "",
        "Definition": {
          "Interval": "30s",
          "HTTP": "https://192.168.176.1",
          "TLSSkipVerify": true,
          "Header": {
            "Host": [
              "app001.example.com"
            ],
            "x-monitored-by": [
              "eric"
            ]
          },
          "Method": "HEAD"
        },
        "CreateIndex": 463774,
        "ModifyIndex": 491886
      }
    ]
  }
]
Terraform Module

See above.

Task Variable Files

n/a

Expected Behavior

Ability to run with 100 services

Actual Behavior

After a restart, the client can no longer run.

Steps to Reproduce

  1. Create a directory "nia".
  2. In the "nia" directory, create a config.hcl with the config above.
  3. In the "nia" directory, create a directory "manyapps" and place the main.tf from above in it.
  4. Register 100 services with Consul.
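Typing out 100 names in the task's `services` list is error-prone. A small Python helper can generate the same zero-padded names (`app001` to `app100`), the `services = [...]` line for the task block, and a registration body of the shape used by the bash script later in this thread (`name` and `id` set to the same value). The helper names are hypothetical, and nothing here talks to Consul.

```python
import json


def service_names(count: int, prefix: str = "app") -> list[str]:
    """Zero-padded names like app001..app100, as used in the task config."""
    return [f"{prefix}{i:03d}" for i in range(1, count + 1)]


def services_hcl_line(names: list[str]) -> str:
    """Render the task's `services` attribute.

    A JSON array of strings is also valid HCL list syntax, so json.dumps
    with compact separators yields the same form as the config above.
    """
    return "services = " + json.dumps(names, separators=(",", ":"))


def registration_payload(name: str) -> str:
    """JSON body for PUT /v1/agent/service/register (fake service, id == name)."""
    return json.dumps({"name": name, "id": name})


if __name__ == "__main__":
    print(services_hcl_line(service_names(3)))
    # prints: services = ["app001","app002","app003"]
```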

Additional Context

Running the command:

~/consul-terraform-sync -config-file manyapps.hcl -once true 
...
2020/11/29 12:15:03.045172 [INFO] (ctrl) executing all tasks once through
2020/11/29 12:15:03.045176 [TRACE] (ctrl) checking dependencies changes for task ManyApps
2020/11/29 12:15:03.162235 [TRACE] (ctrl) checking dependencies changes for task ManyApps
2020/11/29 12:15:03.191518 [TRACE] (ctrl) checking dependencies changes for task ManyApps
2020/11/29 12:15:03.220197 [DEBUG] (ctrl) change detected for task ManyApps
2020/11/29 12:15:03.220593 [TRACE] (ctrl) template for task "ManyApps" rendered: {DidRender:false WouldRender:true}
2020/11/29 12:15:03.220608 [INFO] (ctrl) executing task ManyApps
2020/11/29 12:15:03.220612 [TRACE] (driver.terraform) initializing workspace 'ManyApps'
2020/11/29 12:15:03.507295 [INFO] running Terraform command: /home/ubuntu/nia/terraform init -no-color -force-copy -input=false -lock-timeout=0s -backend=true -get=true -get-plugins=true -lock=true -upgrade=false -verify-plugins=true
Initializing modules...

Initializing the backend...

Error: Failed to get existing workspaces: Get "http://10.1.20.51:8500/v1/kv/consul-terraform-sync/terraform-env:?keys=&separator=%2F": read tcp 10.1.20.50:49892->10.1.20.51:8500: read: connection reset by peer


2020/11/29 12:15:03.554208 [ERR] (driver.terraform) error initializing workspace, skipping apply for 'ManyApps'
2020/11/29 12:15:03.554261 [ERR] (cli) error running controller in Once mode: could not apply changes for task ManyApps: error tf-init for 'ManyApps': attempt #1 failed '
Error: Failed to get existing workspaces: Get "http://10.1.20.51:8500/v1/kv/consul-terraform-sync/terraform-env:?keys=&separator=%2F": read tcp 10.1.20.50:49892->10.1.20.51:8500: read: connection reset by peer
2020/11/29 12:15:25.539187 [INFO] 0.1.0-techpreview1 (2862363)
2020/11/29 12:15:25.539270 [DEBUG] &Config{LogLevel:trace, InspectMode:false, Syslog:&SyslogConfig{Enabled:false, Facility:LOCAL0, Name:}, Consul:&ConsulConfig{Address:10.1.20.51:8500, Auth:&AuthConfig{Enabled:false, Username:, Password:}, KVNamespace:, KVPath:consul-terraform-sync/, TLS:&TLSConfig{CACert:, CAPath:, Cert:, Enabled:false, Key:, ServerName:, Verify:true}, Token:, Transport:&TransportConfig{DialKeepAlive:30s, DialTimeout:30s, DisableKeepAlives:false, MaxIdleConnsPerHost:3, TLSHandshakeTimeout:10s}}, Driver:&DriverConfig{Terraform:&TerraformConfig{Log:true, PersistLog:false, Path:/home/ubuntu/nia, WorkingDir:/home/ubuntu/nia/sync-tasks, Backend:map[consul:map[address:10.1.20.51:8500 gzip:true path:consul-terraform-sync/terraform]], RequiredProviders:map[]}}, Tasks:{&TaskConfig{Name:ManyApps, Description:Testing many apps, Providers:[], Services:[app001 app002 app003 app004 app005 app006 app007 app008 app009 app010 app011 app012 app013 app014 app015 app016 app017 app018 app019 app020 app021 app022 app023 app024 app025 app026 app027 app028 app029 app030 app031 app032 app033 app034 app035 app036 app037 app038 app039 app040 app041 app042 app043 app044 app045 app046 app047 app048 app049 app050 app051 app052 app053 app054 app055 app056 app057 app058 app059 app060 app061 app062 app063 app064 app065 app066 app067 app068 app069 app070 app071 app072 app073 app074 app075 app076 app077 app078 app079 app080 app081 app082 app083 app084 app085 app086 app087 app088 app089 app090 app091 app092 app093 app094 app095 app096 app097 app098 app099 app100], Source:../../manyapps, VarFiles:[], Version:, BufferPeriod:&BufferPeriodConfig{Enabled:false, Min:5s, Max:20s}}}, Services:{}, Providers:{}, BufferPeriod:&BufferPeriodConfig{Enabled:true, Min:5s, Max:20s}}
2020/11/29 12:15:25.539284 [INFO] (cli) setting up controller: readwrite
2020/11/29 12:15:25.539289 [INFO] (ctrl) setting up Terraform driver
2020/11/29 12:15:25.539292 [INFO] (ctrl) retrieved 0 Terraform handlers
2020/11/29 12:15:25.540377 [ERR] (cli) error setting up controller: Get "http://10.1.20.51:8500/v1/status/leader": EOF

chen23 added the bug (Something isn't working) label Nov 29, 2020
findkim (Contributor) commented Dec 1, 2020

Hi @chen23, thanks for submitting a bug report. This behavior is not ideal. I was able to reproduce both errors you included: the first errors while initializing the task and exits, and on subsequent restarts Consul Terraform Sync errors with an EOF when querying Consul for the cluster leader.

My setup uses the manyapps.hcl config file you provided, with a task monitoring 100 services, and registers those fake services with Consul using this bash script:

#!/bin/bash

for i in $(seq -f "%03g" 1 100)
do
	echo $i
	curl -X PUT localhost:8500/v1/agent/service/register \
	--data '{ "name": "app'"$i"'", "id": "app'"$i"'" }'
done

The errors look related to the DDoS-style issues reported against Consul when using Consul Template (hashicorp/consul#7259). Changing the Consul agent configuration limits { http_max_conns_per_client = 400 } (docs) does change the behavior of Consul Terraform Sync. A current, though not ideal, workaround for release 0.1.0-techpreview1 is to set http_max_conns_per_client = 0 on the agent to disable the limit.
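To reason about `http_max_conns_per_client`, it can help to model it as a counter of open connections per client IP that refuses new connections once the cap is reached. The class here is a toy model written for this discussion, not Consul's implementation. Depending on the Consul version, a refusal reaches CTS either as a 429 or as a dropped connection (EOF / connection reset), and the rule that 0 disables the cap is taken from the workaround described in this comment.

```python
from collections import Counter


class PerClientConnLimiter:
    """Toy model of a per-client-IP cap on concurrent HTTP connections."""

    def __init__(self, max_conns_per_client: int = 200) -> None:
        # 200 is the default quota mentioned in this thread; 0 disables the cap.
        self.max_conns = max_conns_per_client
        self.open: Counter = Counter()

    def try_open(self, client_ip: str) -> bool:
        """Return False if the agent would refuse this new connection."""
        if self.max_conns and self.open[client_ip] >= self.max_conns:
            return False
        self.open[client_ip] += 1
        return True

    def close(self, client_ip: str) -> None:
        """Release one connection for this client, if any are open."""
        if self.open[client_ip] > 0:
            self.open[client_ip] -= 1


if __name__ == "__main__":
    agent = PerClientConnLimiter()
    # One long-lived blocking query per monitored service, all from the CTS host.
    accepted = sum(agent.try_open("10.1.20.50") for _ in range(250))
    print(accepted)  # prints 200
```

Because every long-lived request from the CTS host counts against the same IP, it is the number of monitored services, not the request rate, that pushes CTS toward the cap in this model.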

However, this doesn't resolve the underlying problem: connections appear to be held open, preventing new HTTP calls to the Consul agent. That is my finding from a first pass at this bug; next, I'll look into why connections are lingering.

chen23 (Author) commented Dec 2, 2020

@findkim thank you for confirming. I was able to workaround the issue by setting http_max_conns_per_client = 0.

findkim (Contributor) commented Dec 8, 2020

Oops, my PR summary said "doesn't fix 146", and GitHub probably parsed that as "fix 146" and automatically closed this issue. Reopening!

findkim (Contributor) commented Dec 11, 2020

I dug a bit more into this and wanted to share some of my findings and conclusions.

Agent connection limits

The underlying monitoring logic long-polls for service changes over TCP connections using Consul blocking queries. The way Consul-Terraform-Sync uses this mechanism effectively requires one TCP connection to the agent per monitored service, so it quickly reaches the agent's limit. Because these are blocking queries, the load per connection is low, so I have fewer reservations about raising the limit than I initially did. I added documentation on this that will be included in the next release (hashicorp/consul#9371).
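The blocking-query pattern works roughly like this: each request carries the last seen `X-Consul-Index` as `index`, and the agent holds the request open until the service changes or the wait time expires. This is a simplified model, not CTS or hcat code. `fetch` is a hypothetical callable standing in for the HTTP call (returning the body and the `X-Consul-Index` header value) so the loop can be exercised without a running agent. The query string matches the requests in the agent logs later in this thread (`index=...&passing=1`).

```python
from typing import Callable
from urllib.parse import urlencode

# Hypothetical fetch signature for this sketch: takes a URL and returns
# (response_body, value_of_X-Consul-Index_header).
Fetch = Callable[[str], tuple[str, int]]


def blocking_query_url(addr: str, service: str, index: int, wait: str = "") -> str:
    """Build a Consul health blocking query like the ones in the agent logs."""
    params: dict[str, object] = {"index": index, "passing": 1}
    if wait:
        params["wait"] = wait  # e.g. "5m"; without it the default maximum wait applies
    return f"http://{addr}/v1/health/service/{service}?{urlencode(params)}"


def watch(addr: str, service: str, fetch: Fetch, rounds: int) -> list[str]:
    """Long-poll one service; return the bodies seen whenever the index changes."""
    index = 0
    changes: list[str] = []
    for _ in range(rounds):
        # Blocks until the service changes or the wait expires; over HTTP/1.1
        # this occupies one TCP connection for that whole time.
        body, new_index = fetch(blocking_query_url(addr, service, index))
        if new_index < index:
            new_index = 0  # index went backwards: start over from scratch
        if new_index != index:
            changes.append(body)
        index = new_index
    return changes
```

Since each call can block for the full wait time, a task watching N services keeps N of these requests in flight at once, which over HTTP/1.1 means N TCP connections to the agent.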

I haven't looked much into the Consul 1.9 streaming feature and how it could improve this use case, but it is a possible direction for reducing the number of open connections.

EOF / 429 errors from Consul when restarting CTS

Depending on the Consul version, you may observe an EOF or a 429 error when the agent limits CTS's long-running TCP connections. This was peculiar, because those TCP connections should have been closed properly when CTS shut down.

A few connection cleanups are in place (#151 and hashicorp/hcat#31), but I am still observing ghost TCP connections stuck after CTS has shut down:

$ netstat -anp TCP | grep 8500
tcp4       0      0  127.0.0.1.8500         127.0.0.1.49567        CLOSE_WAIT 
tcp4       0      0  127.0.0.1.8500         127.0.0.1.49566        CLOSE_WAIT 
tcp4       0      0  127.0.0.1.8500         127.0.0.1.49565        CLOSE_WAIT 
tcp4       0      0  127.0.0.1.49567        127.0.0.1.8500         FIN_WAIT_2 
tcp4       0      0  127.0.0.1.49566        127.0.0.1.8500         FIN_WAIT_2 
tcp4       0      0  127.0.0.1.49565        127.0.0.1.8500         FIN_WAIT_2 
…
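In this output the client ports (for example 49567) are in FIN_WAIT_2, meaning the client has sent its FIN and had it acknowledged, while the agent side is in CLOSE_WAIT, meaning it has received that FIN but has not yet closed its own end. A small parser can tally connection states by side while you wait for them to clear. It assumes the column layout shown here (macOS `netstat -an`, where the port follows a `.`) and also accepts the `host:port` form Linux prints. The function names are illustrative.

```python
from collections import Counter


def port_of(addr: str) -> str:
    """Port from '127.0.0.1.8500' (macOS) or '127.0.0.1:8500' (Linux)."""
    return addr.replace(":", ".").rsplit(".", 1)[-1]


def tcp_states(netstat_output: str, port: str = "8500") -> Counter:
    """Count (side, state) pairs for TCP connections involving `port`.

    side is "agent" when `port` is the local port, otherwise "client".
    """
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, ellipses and non-TCP rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if port_of(local) == port:
            counts[("agent", state)] += 1
        elif port_of(foreign) == port:
            counts[("client", state)] += 1
    return counts
```

For example, pass it the text captured from `netstat -an` on the agent host; once the CLOSE_WAIT and FIN_WAIT_2 counts drop to zero, the lingering connections are gone.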

The Consul agent also keeps processing those blocking query requests and holds onto their TCP connections after the client closes. About 5 minutes later, the Consul logs show the requests completing, which matches the default maximum wait of 5 minutes for blocking queries:

2020-12-01T18:09:35.402-0600 [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/app088?index=365&passing=1 from=127.0.0.1:65047 latency=5m3.53462249s
2020-12-01T18:09:35.420-0600 [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/app029?index=306&passing=1 from=127.0.0.1:65081 latency=5m3.547135907s
2020-12-01T18:09:35.587-0600 [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/app039?index=316&passing=1 from=127.0.0.1:65019 latency=5m3.721325843s
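The ~5m3s latencies in these log lines are consistent with how Consul bounds blocking queries: the default maximum wait is 5 minutes, and Consul's blocking-query documentation says it adds a small random jitter of up to wait/16 to spread out wake-ups. This short check parses the Go-style durations from the logs (only the `XmY.Zs` form that appears here) and confirms they fall inside that window. `parse_latency` is a hypothetical helper, not a general Go duration parser.

```python
import re
from datetime import timedelta

DEFAULT_WAIT = timedelta(minutes=5)


def max_blocking_latency(wait: timedelta = DEFAULT_WAIT) -> timedelta:
    """Upper bound on a blocking query's hold time: wait plus up to wait/16 jitter."""
    return wait + wait / 16


def parse_latency(value: str) -> timedelta:
    """Parse durations shaped like '5m3.53462249s' (minutes + seconds only)."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if m is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(minutes=int(m.group(1)), seconds=float(m.group(2)))


if __name__ == "__main__":
    for latency in ["5m3.53462249s", "5m3.547135907s", "5m3.721325843s"]:
        held = parse_latency(latency)
        # Each line ends with True: between 5m and 5m18.75s.
        print(latency, DEFAULT_WAIT <= held <= max_blocking_latency())
```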

Seeing the client side stuck in FIN_WAIT_2 leads me to believe this potential Consul bug is in play: hashicorp/consul#8524, where the Consul API server does not complete the TCP connection closure (it never closes its own side, leaving it in CLOSE_WAIT), or the server doesn't acknowledge clients cancelling blocking queries.

So unfortunately, I think restarting CTS with a high number of services may run into agent limits until hashicorp/consul#8524 is resolved. In the meantime, either restart the agent to release those connections, or wait ~5 minutes for the agent to reap them before starting CTS again.

findkim (Contributor) commented Jan 11, 2021

I've been racking my brain over why 100 services would require 200 connections.

I just realized that my repro script, which registers services with curl, leaves TCP connections open for about a minute. The 100 curl registrations therefore took up 100 connections of the default per-client quota of 200 on the Consul agent, so running a task with 100 services always hit the limit while those curl connections were still open. Once I accounted for the curl requests I had just made, I was not able to reproduce the 429 errors from Consul on first run (the restart errors still occur, since they are caused by a different bug).
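The arithmetic behind this is small enough to spell out. With the default per-client quota of 200, about 100 lingering curl connections plus one blocking query per service for 100 services use up the entire quota from that host, so the next request from it has nowhere to go. The numbers come from this comment; treating the curl connections as a fixed 100, rather than connections that expire after about a minute, is a simplification.

```python
DEFAULT_HTTP_MAX_CONNS_PER_CLIENT = 200  # default quota cited in this thread


def headroom(limit: int, in_use: dict[str, int]) -> int:
    """Connections still available to one client IP under the agent's limit."""
    return limit - sum(in_use.values())


usage = {
    "curl registrations still open": 100,
    "CTS blocking queries (one per service)": 100,
}
print(headroom(DEFAULT_HTTP_MAX_CONNS_PER_CLIENT, usage))  # prints 0
```

Without the curl connections the same task leaves 100 connections of headroom, which fits the observation that the first-run 429s disappear once those requests are accounted for.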

@chen23 do you recall how you registered the 100 services with Consul? Were they registered from the same host shortly before running Consul-Terraform-Sync?

chen23 (Author) commented Jan 12, 2021

@findkim I just retried my test and am still getting a similar result. This time only 3 of the 100 services were defined, and it still threw an error. I'm still running an older version of Consul (1.8.6). My test environment has 2 servers:

$ consul members
Node   Address          Status  Type    Build  Protocol  DC   Segment
node1  10.1.20.50:8301  alive   server  1.8.6  2         dc1  <all>
node2  10.1.20.51:8301  alive   server  1.8.6  2         dc1  <all>

Here are the errors that I see:

$ consul-terraform-sync  -version
2021/01/12 21:44:06 [DEBUG] (cli) version flag was given, exiting now
 0.1.0-techpreview1 (2862363)
Compatible with Terraform ~>0.13.0
...
$ consul-terraform-sync  -version
2021/01/12 21:43:09 [DEBUG] (cli) version flag was given, exiting now
 0.1.0-techpreview1 (2862363)
Compatible with Terraform ~>0.13.0
$ consul-terraform-sync -config-file manyapps.hcl -once true
2021/01/12 21:43:20.078754 [INFO] 0.1.0-techpreview1 (2862363)
2021/01/12 21:43:20.078836 [DEBUG] &Config{LogLevel:trace, InspectMode:false, Syslog:&SyslogConfig{Enabled:false, Facility:LOCAL0, Name:}, Consul:&ConsulConfig{Address:10.1.20.50:8500, Auth:&AuthConfig{Enabled:false, Username:, Password:}, KVNamespace:, KVPath:consul-terraform-sync/, TLS:&TLSConfig{CACert:, CAPath:, Cert:, Enabled:false, Key:, ServerName:, Verify:true}, Token:, Transport:&TransportConfig{DialKeepAlive:30s, DialTimeout:30s, DisableKeepAlives:false, MaxIdleConnsPerHost:3, TLSHandshakeTimeout:10s}}, Driver:&DriverConfig{Terraform:&TerraformConfig{Log:true, PersistLog:false, Path:/home/ubuntu/nia, WorkingDir:/home/ubuntu/nia/sync-tasks, Backend:map[consul:map[address:10.1.20.50:8500 gzip:true path:consul-terraform-sync/terraform]], RequiredProviders:map[]}}, Tasks:{&TaskConfig{Name:ManyApps, Description:Testing many apps, Providers:[], Services:[app001 app002 app003 app004 app005 app006 app007 app008 app009 app010 app011 app012 app013 app014 app015 app016 app017 app018 app019 app020 app021 app022 app023 app024 app025 app026 app027 app028 app029 app030 app031 app032 app033 app034 app035 app036 app037 app038 app039 app040 app041 app042 app043 app044 app045 app046 app047 app048 app049 app050 app051 app052 app053 app054 app055 app056 app057 app058 app059 app060 app061 app062 app063 app064 app065 app066 app067 app068 app069 app070 app071 app072 app073 app074 app075 app076 app077 app078 app079 app080 app081 app082 app083 app084 app085 app086 app087 app088 app089 app090 app091 app092 app093 app094 app095 app096 app097 app098 app099 app100], Source:../../manyapps, VarFiles:[], Version:, BufferPeriod:&BufferPeriodConfig{Enabled:false, Min:5s, Max:20s}}}, Services:{}, Providers:{}, BufferPeriod:&BufferPeriodConfig{Enabled:true, Min:5s, Max:20s}}
2021/01/12 21:43:20.078849 [INFO] (cli) setting up controller: readwrite
2021/01/12 21:43:20.078855 [INFO] (ctrl) setting up Terraform driver
2021/01/12 21:43:20.078858 [INFO] (ctrl) retrieved 0 Terraform handlers
2021/01/12 21:43:20.157447 [INFO] (cli) initializing controller
2021/01/12 21:43:20.157524 [INFO] (ctrl) initializing driver
2021/01/12 21:43:20.157529 [INFO] (ctrl) initializing all tasks
2021/01/12 21:43:31.448773 [INFO] (driver.terraform) skipping install, terraform 0.13.5 already exists at path /home/ubuntu/nia/terraform
2021/01/12 21:43:31.448795 [DEBUG] (ctrl) initializing task "ManyApps"
2021/01/12 21:43:31.449722 [DEBUG] (templates.tftmpl) creating main.tf in root module for task "ManyApps": /home/ubuntu/nia/sync-tasks/ManyApps/main.tf
2021/01/12 21:43:31.454140 [DEBUG] (templates.tftmpl) creating variables.tf in root module for task "ManyApps": /home/ubuntu/nia/sync-tasks/ManyApps/variables.tf
2021/01/12 21:43:31.457674 [DEBUG] (templates.tftmpl) creating terraform.tfvars.tmpl in root module for task "ManyApps": /home/ubuntu/nia/sync-tasks/ManyApps/terraform.tfvars.tmpl
2021/01/12 21:43:31.462109 [TRACE] (task) creating terraform cli client for task 'ManyApps'
2021/01/12 21:43:31.462131 [INFO] (client.terraformcli) Terraform logging is set, Terraform logs will output with Sync logs
2021/01/12 21:43:31.462145 [TRACE] (client.terraformcli) created Terraform CLI client &TerraformCLI{WorkingDir:/home/ubuntu/nia/sync-tasks/ManyApps, WorkSpace:ManyApps, VarFiles:[]}
2021/01/12 21:43:31.462499 [INFO] (ctrl) driver initialized
2021/01/12 21:43:31.462511 [INFO] (cli) running controller in Once mode
2021/01/12 21:43:31.462518 [INFO] (ctrl) executing all tasks once through
2021/01/12 21:43:31.462528 [TRACE] (ctrl) checking dependencies changes for task ManyApps
2021/01/12 21:43:31.589843 [TRACE] (ctrl) checking dependencies changes for task ManyApps
2021/01/12 21:43:31.626437 [TRACE] (ctrl) checking dependencies changes for task ManyApps
2021/01/12 21:43:31.651346 [TRACE] (ctrl) checking dependencies changes for task ManyApps
2021/01/12 21:43:31.670508 [DEBUG] (ctrl) change detected for task ManyApps
2021/01/12 21:43:31.673204 [TRACE] (ctrl) template for task "ManyApps" rendered: {DidRender:true WouldRender:true}
2021/01/12 21:43:31.673219 [INFO] (ctrl) executing task ManyApps
2021/01/12 21:43:31.673223 [TRACE] (driver.terraform) initializing workspace 'ManyApps'
2021/01/12 21:43:31.781451 [INFO] running Terraform command: /home/ubuntu/nia/terraform init -no-color -force-copy -input=false -lock-timeout=0s -backend=true -get=true -get-plugins=true -lock=true -upgrade=false -verify-plugins=true
Initializing modules...

...
module.ManyApps.local_file.appoutput["app002"]: Creating...
module.ManyApps.local_file.appoutput["app001"]: Creating...
module.ManyApps.local_file.appoutput["app001"]: Creation complete after 0s [id=5d6c24e7e4b00efc7f98d4bb0aa4fa01c03c68ce]
module.ManyApps.local_file.appoutput["app003"]: Creating...
module.ManyApps.local_file.appoutput["app002"]: Creation complete after 0s [id=18b0a07c0475b0c42d07ad7947b7135e04af893b]
module.ManyApps.local_file.appoutput["app003"]: Creation complete after 0s [id=08f9f7f8da9f3db295ff2f6e5bd931345c82f62b]

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
2021/01/12 21:43:34.238690 [INFO] (ctrl) task completed ManyApps
2021/01/12 21:43:34.238706 [INFO] (ctrl) all tasks completed once
2021/01/12 21:43:34.238721 [INFO] (cli) controller in Once mode has completed
2021/01/12 21:43:34.238740 [INFO] (cli) graceful shutdown

...
2021/01/12 21:43:41.031918 [TRACE] (driver.terraform) initializing workspace 'ManyApps'
2021/01/12 21:43:41.111135 [INFO] running Terraform command: /home/ubuntu/nia/terraform init -no-color -force-copy -input=false -lock-timeout=0s -backend=true -get=true -get-plugins=true -lock=true -upgrade=false -verify-plugins=true
Initializing modules...

Initializing the backend...

Error: Failed to get existing workspaces: Get "http://10.1.20.50:8500/v1/kv/consul-terraform-sync/terraform-env:?keys=&separator=%2F": EOF


2021/01/12 21:43:41.153810 [ERR] (driver.terraform) error initializing workspace, skipping apply for 'ManyApps'
2021/01/12 21:43:41.153863 [ERR] (cli) error running controller in Once mode: could not apply changes for task ManyApps: error tf-init for 'ManyApps': attempt #1 failed '
Error: Failed to get existing workspaces: Get "http://10.1.20.50:8500/v1/kv/consul-terraform-sync/terraform-env:?keys=&separator=%2F": EOF
...

findkim (Contributor) commented Feb 24, 2021

Hi @chen23, thank you for your patience; we appreciate your interest in pushing CTS to run at scale. I wanted to let you know that we have a solution in place that will be part of the next release, 0.1.0-beta.

We identified a way to make the TCP connections that Consul Terraform Sync establishes with the local Consul agent more efficient. CTS now supports using HTTP/2 to make multiple blocking query requests over the same connection (hashicorp/hcat#37, #207). With this option enabled, CTS no longer needs one TCP connection per monitored service.
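The difference between the two transports comes down to how many blocking queries can share a connection. Under HTTP/1.1 each in-flight request holds its own connection; HTTP/2 multiplexes requests as streams on one connection, up to the server's advertised SETTINGS_MAX_CONCURRENT_STREAMS (the HTTP/2 specification recommends this be no smaller than 100). This function is a back-of-the-envelope estimate that assumes a client fills each connection up to that limit; the default of 100 streams is an assumption, not a value taken from Consul or CTS.

```python
import math


def connections_needed(services: int, http2: bool, max_concurrent_streams: int = 100) -> int:
    """Estimate TCP connections for one long-lived blocking query per service."""
    if services <= 0:
        return 0
    if not http2:
        # HTTP/1.1: one outstanding request per connection.
        return services
    # HTTP/2: many streams per connection, capped by the server's setting.
    return math.ceil(services / max_concurrent_streams)


if __name__ == "__main__":
    for use_http2 in (False, True):
        label = "HTTP/2" if use_http2 else "HTTP/1.1"
        print(label, connections_needed(100, http2=use_http2))
```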

There are now 2 options for running CTS with a large number of services.

  1. Increasing or disabling the Consul agent http_max_conns_per_client as mentioned earlier in this ticket.
  2. Configure CTS to communicate with Consul over HTTP/2

The potential bug affecting option 1 (hashicorp/consul#8524) is likely still in play, but running CTS at scale is no longer blocked on a fix for it, so we suggest that operators configure CTS to use HTTP/2. I'm going to close this out. Please reopen if you find that the changes in the master branch or our upcoming release do not resolve your issue.

Enable HTTP/2

There are a few steps that are required to enable HTTP/2.

  1. The local Consul agent needs to have TLS and HTTPS enabled
    ports {
      // enable TLS for the HTTP API by assigning a port number > 0.
      // 8501 is the recommended port for HTTPS.
      https = 8501
    }
    
    // example TLS configuration for agent certs
    cert_file = "consul-agent-ca.pem"
    key_file = "consul-agent-ca-key.pem"
  2. Configure Consul Terraform Sync to use HTTP/2
    consul {
      address = "localhost:8501" // configured HTTPS port for the local Consul agent
    
      tls {
        enabled = true
        verify = false
        // you can optionally set verify = true if you want to also configure certs for secure communication.
        // https://consul.io/docs/nia/configuration#tls
      }
    }
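Once both sides are configured, one way to confirm that HTTP/2 is on offer is to check which protocol the agent's HTTPS port negotiates through TLS ALPN. This Python sketch is an assumption-laden check, not part of CTS: it offers `h2` and `http/1.1`, skips certificate verification to match `verify = false` in the CTS example (suitable for a test setup only), and assumes HTTPS port 8501 as in the agent example.

```python
import socket
import ssl
from typing import Optional


def make_tls_context(verify: bool = False) -> ssl.SSLContext:
    """TLS client context offering HTTP/2 first, then HTTP/1.1."""
    ctx = ssl.create_default_context()
    if not verify:
        # Mirrors `verify = false` in the CTS tls block; test setups only.
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_protocol(host: str, port: int = 8501, verify: bool = False) -> Optional[str]:
    """Connect to the agent's HTTPS port and return the ALPN protocol it selected."""
    ctx = make_tls_context(verify)
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Calling `negotiated_protocol("localhost")` against the agent should return `"h2"` if it accepted HTTP/2; `"http/1.1"` or `None` would suggest it did not.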
