Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update default transport configurations #164

Merged
merged 1 commit into from
Jan 11, 2021
Merged

Update default transport configurations #164

merged 1 commit into from
Jan 11, 2021

Conversation

findkim
Copy link
Contributor

@findkim findkim commented Jan 8, 2021

http.MaxIdleConnsPerHost was default to a very low value that caused TCP connections to not be reused across hcat dependency monitoring. Only ~1 connections were being reused from the initial set of hcat requests to Consul, and the remaining dependencies required establishing a new connection.

Changes:

  • DefaultMaxIdleConnsPerHost: CTS default changed from 1 to 100 (the transport default is 2 and setting -1 disables keep alives altogether). This increases the # of reusable TCP connections for service monitoring. From 1:2 connections per service now to 1:1.
  • DefaultMaxIdleConns default changed from 100 to 0, which disables the the limit. The previous limit not only caps all the requests to Consul but also requests to other hosts, which is not needed at this time.
  • DefaultIdleConnTimeout: from 90s to 5s. hcat monitoring of dependencies should theoretically have 1 connection per 1 dependency for blocking queries. Any idle connections not immediately used would likely never be reused, so close it out earlier.

For example, 100 services required about 200 connections at startup (normal request and then blocking query).

This is the primary bug in #146 when monitoring 100 services always errored with EOF/429 when the Consul agent has a default http_max_conns_per_client=200. This should now allow up to 198 services by default without resulting in connection limits.

See comment below for updates on ^

Operators can increase consul.transport.max_idle_conns_per_host value to reduce the delay between detecting updates and task execution when monitoring 200 services or more.

MaxIdleConnsPerHost was default to a very low value that caused
TCP connections to not be reused across hcat dependency monitoring.
Only ~1 connections were being reused from the initial set of hcat
requests to Consul, and the remaining dependencies required
establishing a new connection. 100 services required about 200
connections at startup (normal request and then blocking query).
@findkim findkim added the bug Something isn't working label Jan 8, 2021
@findkim
Copy link
Contributor Author

findkim commented Jan 8, 2021

TODO: write an e2e test with a running Consul agent for http_max_conns_per_client (Consul) + max_idle_conns_per_host (CTS)

@findkim
Copy link
Contributor Author

findkim commented Jan 11, 2021

Update: I had a bug within my on repro steps, so this does not address why 100 services caused 200 connections. It was my overlook of curl also contributing to the limit. Once I had waited for the curl connections to close out, I ran CTS with 100 services and it successfully executed prior to these changes.

These changes do however improve the TCP connection reusability of hcat/consul client. This is not something I was able to write an e2e test for since the transport and dialing code is housed in hcat.

Inserting code around the dialing provided insight on how many times a connection was established (here)

+var dialCount uint64

 func newTransport(i *CreateClientInput) (*http.Transport, error) {
        // This transport will attempt to keep connections open to the server.
+       dialer := net.Dialer{
+               Timeout:   i.TransportDialTimeout,
+               KeepAlive: i.TransportDialKeepAlive,
+       }
        transport := &http.Transport{
                Proxy: http.ProxyFromEnvironment,
-               Dial: (&net.Dialer{
-                       Timeout:   i.TransportDialTimeout,
-                       KeepAlive: i.TransportDialKeepAlive,
-               }).Dial,
+               Dial:  dialer.Dial,
+               DialContext: func(ctx context.Context, network, address string) (net.Conn, error) {
+                       atomic.AddUint64(&dialCount, 1)
+                       log.Printf("[INFO] Dialing %s %s: %d", network, address, atomic.LoadUint64(&dialCount))
+                       return dialer.DialContext(ctx, network, address)
+               },
                DisableKeepAlives:   i.TransportDisableKeepAlives,
                MaxIdleConns:        i.TransportMaxIdleConns,
                IdleConnTimeout:     i.TransportIdleConnTimeout,
                MaxIdleConnsPerHost: i.TransportMaxIdleConnsPerHost,
                TLSHandshakeTimeout: i.TransportTLSHandshakeTimeout,
        }

Prior to changes: 100 services resulted in 190 dials.
After changes: 100 services resulted in 100 dials.

@findkim findkim requested a review from a team January 11, 2021 20:13
@findkim findkim added enhancement New feature or request and removed bug Something isn't working labels Jan 11, 2021
@findkim findkim added this to the v0.1.0 Beta milestone Jan 11, 2021
Copy link
Contributor

@eikenb eikenb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

One related question.. would you suggest changes to hcat's default settings in relation to this? Seems like the defaults would be improved if changed to reflect your findings.

@findkim
Copy link
Contributor Author

findkim commented Jan 11, 2021

Thanks for the review @eikenb!

I don't recall hcat having default options and relies on the callers' defaults. But yeah adding default options I think would be valuable. I would imagine this is affecting Consul Template as well (that's where I'm guilty of copy pasta'ing the config defaults from)

@findkim findkim merged commit 795b542 into master Jan 11, 2021
@findkim findkim deleted the conn-limit-bug branch January 11, 2021 22:48
eikenb added a commit to hashicorp/consul-template that referenced this pull request Jan 6, 2023
See this ticket for some details...
hashicorp/consul-terraform-sync#164

Summary: http.MaxIdleConnsPerHost was default to a very low value that
caused TCP connections to not be reused across dependency monitoring.
Only ~1 connections were being reused from the initial set of hcat
requests to Consul, and the remaining dependencies required establishing
a new connection.
eikenb added a commit to hashicorp/consul-template that referenced this pull request Jan 6, 2023
See this ticket for some details...
hashicorp/consul-terraform-sync#164

Summary: http.MaxIdleConnsPerHost was default to a very low value that
caused TCP connections to not be reused across dependency monitoring.
Only ~1 connections were being reused from the initial set of hcat
requests to Consul, and the remaining dependencies required establishing
a new connection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants