Intermittent net/http: TLS handshake timeout error when downloading providers #16448

Closed
brikis98 opened this issue Oct 25, 2017 · 14 comments
Labels
bug v0.10 Issues (primarily bugs) reported against v0.10 releases v0.11 Issues (primarily bugs) reported against v0.11 releases

Comments

@brikis98
Contributor

brikis98 commented Oct 25, 2017

Terraform Version

Terraform v0.10.7

Terraform Configuration Files

This happens with just about any configuration.

Expected Behavior

I can run terraform init without errors.

Actual Behavior

I get intermittent errors for downloading plugins that look like this:

Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...

Error installing provider "template": error fetching checksums: Get https://releases.hashicorp.com/terraform-provider-template/1.0.0/terraform-provider-template_1.0.0_SHA256SUMS: net/http: TLS handshake timeout.

Note that the particular plugin or file that fails changes randomly.

Steps to Reproduce

  1. terraform init

Important Factoids

This happens more often when working with many modules in parallel, such as on a CI server running many automated tests. Is releases.hashicorp.com failing under concurrent load? Or is it intentionally throttling requests?

Either way, this makes automated tests involving Terraform very brittle.

I enabled the plugin cache to reduce the number of necessary downloads, but I still see these errors on a regular basis.

@jbardin
Member

jbardin commented Oct 25, 2017

Hi @brikis98,

Sorry this is causing an issue for you. I have also seen this before, but so far only with VMs on extremely oversubscribed hosts.

The default TLS handshake timeout is 10 seconds, which is quite a long time to establish the connection. A minor fix I have coming soon will better re-use connections, reducing the number of handshakes that need to be done.

I have a feeling that extending the timeout might not help much either, as I think this is partly the CDN servers' reaction to the extremely slow clients. We need to reproduce this and trace the failing handshakes to be certain.

@jbardin jbardin added the bug label Oct 25, 2017
@brikis98
Contributor Author

Well, if it helps debug the issue, this happens most often when we run tests in CircleCI, which I believe has a ~24-core machine, so there could be as many as a couple dozen of these init calls happening in parallel from various tests.

Many CDNs have throttling built in (DoS protection); any chance this is the cause here?

@brikis98
Contributor Author

Update: For those struggling with this same issue, as a workaround, I'm doing the following:

  1. Enable the provider plugin cache.
  2. Persist that cache between builds using CircleCI caching.

@apparentlymart
Contributor

apparentlymart commented Oct 26, 2017

Thanks for sharing that @brikis98! I wasn't previously familiar with CircleCI caching.

From a quick read of what that feature does, it may also work to have CircleCI cache the contents of .terraform/plugins since, after an initial terraform init, that directory should contain all of the plugins for that particular config.

It looks like the mechanism requires using the checksum of some files as a key, which may be tricky in Terraform since the entire config is consulted to decide which plugins to install. However, that could perhaps be worked around by having a separate providers.tf file that contains a provider block for each of the providers you use (including version constraints), and then using just that file as the cache key.

The output of terraform providers might make a reasonable thing to hash to get an overview of the providers used across the whole config, though it may change more often than necessary if e.g. modules are refactored while retaining the same plugin versions.
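The cache-key idea can be sketched in Go. This hashes whatever bytes describe the required providers (for example a dedicated providers.tf, or saved output of terraform providers) into a stable key. The function name and the terraform-plugins- prefix are illustrative assumptions; in practice CircleCI can compute such a checksum itself with its {{ checksum "..." }} cache-key template, and this version only makes the mechanism explicit.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// pluginCacheKey derives a cache key from the bytes that describe which
// providers a configuration needs. Any change to those bytes, such as a
// new provider or a changed version constraint, yields a different key.
func pluginCacheKey(providerSpec []byte) string {
	sum := sha256.Sum256(providerSpec)
	return "terraform-plugins-" + hex.EncodeToString(sum[:])
}

func main() {
	// "providers.tf" is the hypothetical dedicated file described above.
	data, err := os.ReadFile("providers.tf")
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not read providers.tf:", err)
		os.Exit(1)
	}
	fmt.Println(pluginCacheKey(data))
}
```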

If you're running terraform init on every run as part of your automation anyway (which I would recommend) then it shouldn't hurt to let the cache persist between runs even if the dependencies do change, since terraform init is able to manage the .terraform/plugins dir automatically and clean up any plugins that are no longer used.

I imagine that using Terraform's caching mechanism and caching the .terraform/plugins directory are functionally equivalent, since CircleCI caching is immutable. Caching the local plugin dir is perhaps more straightforward, though, since it doesn't require any unusual configuration within Terraform itself, and Terraform is able to remove items from its local dir when they are no longer needed, which prevents the cache from growing indefinitely.

@jbardin
Member

jbardin commented Jan 25, 2018

Hi @brikis98,

Have you had a chance to try out 0.11.2 on CircleCI? That release enabled the DualStack dialer by default for HTTP requests, so Terraform can still contact the release servers on a network with a broken IPv6 configuration. Looking at the CircleCI docs, it seems that they don't have complete IPv6 support yet, so I'm guessing it could be related.

@brikis98
Contributor Author

@jbardin We are updating all of our repos to 0.11 now, so I'll let you know once we complete that process!

@crouchjay

crouchjay commented Feb 6, 2018

I have been getting a similar error and I am unable to get rid of it.

2018/02/06 15:53:10 [ERR] Checkpoint error: Get https://checkpoint-api.hashicorp.com/v1/check/terraform?arch=amd64&os=darwin&signature=56982404-9b8e-0f76-67e5-26bb3a8299d2&version=0.11.3: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

This ends up in a TLS handshake timeout error.

@apparentlymart
Contributor

Hi @crouchjay! The request you see failing there is the one that powers the upgrade and security bulletin checks. This particular request is not required for correct Terraform operation, so you could choose to disable it (using the settings described on the page I linked) if you don't mind Terraform not warning you about new versions being available.

If you're still seeing an error like that on 0.11.2 or newer then I'd welcome you to open a new issue describing that, since some details are different for that request (it's in a separate library, subject to different timeouts, etc) but the fix we applied for dual-stack dialing should've applied to that call as well and so that would suggest that you've encountered a new problem which we can investigate further in a new issue.

@tperelle

tperelle commented Jul 2, 2018

Hi,
I'm using a recent version of Terraform and am trying to create droplets on DigitalOcean:

% terraform version
Terraform v0.11.7
+ provider.digitalocean v0.1.3
+ provider.template v1.0.0

But I often have the same issue:

Error: Error refreshing state: 2 error(s) occurred:

* digitalocean_droplet.ucp_master: 2 error(s) occurred:

* digitalocean_droplet.ucp_master[2]: digitalocean_droplet.ucp_master.2: Error retrieving droplet: Get https://api.digitalocean.com/v2/droplets/100173689: net/http: TLS handshake timeout
* digitalocean_droplet.ucp_master[1]: digitalocean_droplet.ucp_master.1: Error retrieving droplet: Get https://api.digitalocean.com/v2/droplets/100173687: net/http: TLS handshake timeout
* digitalocean_droplet.ucp_worker: 1 error(s) occurred:

* digitalocean_droplet.ucp_worker: digitalocean_droplet.ucp_worker: Error retrieving droplet: Get https://api.digitalocean.com/v2/droplets/100173688: net/http: TLS handshake timeout

Sometimes it works, but it's very annoying.

@apparentlymart
Contributor

Hi @tperelle! Sorry that isn't working as expected.

Those particular requests are coming from the digitalocean provider itself, so if something needs to be fixed for that it'd need to be done in the provider's own repository. Would you mind opening an issue for this over there? It's possible that the maintainers of that provider would just need to make a similar change to that from #16805 (upgrading the cleanhttp dependency), which is what we changed to make this work better for Terraform Core.

@apparentlymart
Contributor

Hi all,

Further to my previous comment, I just wanted to sum up a few different causes we've seen for this kind of issue for future reference:

  • On a host with both IPv4 and IPv6 connectivity, Terraform versions prior to v0.11.2 will prefer IPv6. This can be problematic on systems where the IPv6 connection is slower or is actually inoperable in practice. From v0.11.2 onwards, Terraform implements RFC 6555 to mitigate this problem.
  • Some environments have either explicit or transparent HTTP proxies that are required for outbound access. Occasionally we've seen reports that poorly-performing or misconfigured proxies have led to timeout and TLS-related issues. In this case, there is no known Terraform-specific workaround and so working with the administrator of that proxy is the primary path to resolution.
  • Some users run Terraform on WiFi networks with "captive portal" intercepts which can cause confusion. There are several different approaches to intercepting outgoing traffic to redirect to a captive portal, including DNS intercepts and HTTP-level intercepts, and some of these can lead to Terraform appearing to timeout or have TLS handshake issues due to the interference of that system.

Since this particular issue was within CircleCI I'm not sure if these solutions apply there, so for the moment I'm going to leave this one open. It is possible that the IPv6 connectivity issue was affecting CircleCI, in which case Terraform should behave correctly there from v0.11.2 onwards.

@jakauppila

jakauppila commented Jan 23, 2019

Using Terraform v0.11.11, is there any way as a user to adjust what that net/http TLS handshake value is?

I need to talk to our proxy admins about performance, but our connections through it are taking ~10.1-10.3 seconds to respond, so naturally Terraform bombs with the timeout.


Error downloading modules: Error loading modules: Failed to request discovery document: Get https://registry.terraform.io/.well-known/terraform.json: net/http: TLS handshake timeout

@hashibot hashibot added v0.10 Issues (primarily bugs) reported against v0.10 releases v0.11 Issues (primarily bugs) reported against v0.11 releases labels Aug 22, 2019
@danieldreier
Contributor

@brikis98 have you continued to have issues like this running a recent version of terraform on CircleCI?

I have tried to simulate the following conditions with terraform 0.12.17 using Apple's Network Link Conditioner, installing the cloudflare and github providers from a trivial main.tf.

Scenario: base case, no delays injected
Result: Success, time 0:05

Scenario: simulated 3G wireless network with 780kbps down, 330kbps up, and 100ms delay
Result: Success, time 9:47

Scenario: 250ms delay on TCP and DNS requests, 0% packet loss
Result: Success, time 1:30

Scenario: 250ms delay on TCP and DNS requests, 20% packet loss
Result: Success, time 7:36

Scenario: 500ms delay on TCP and DNS requests, 0% packet loss
Result: Success, time 3:01

Scenario: 1000ms delay on TCP and DNS requests, 0% packet loss
Result: Success, time 3:14

Based on my testing (especially the test case with packet loss) and the lack of recent updates to this issue, I am inclined to think that the improvements made in 0.12 have sufficiently mitigated this, such that terraform is usable in slow network conditions. I'm going to close this out for now because I'm pretty confident that the 0.12 improvements @apparentlymart described have resolved this. If you're still seeing these types of issues, feel free to re-open or file a new issue linked to this one. I'm definitely interested in hearing about people's experiences using terraform on slow or intermittent networks.

@ghost

ghost commented Mar 28, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Mar 28, 2020
8 participants