-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update default transport configurations #164
Conversation
MaxIdleConnsPerHost was default to a very low value that caused TCP connections to not be reused across hcat dependency monitoring. Only ~1 connections were being reused from the initial set of hcat requests to Consul, and the remaining dependencies required establishing a new connection. 100 services required about 200 connections at startup (normal request and then blocking query).
TODO: write an e2e test with a running Consul agent for http_max_conns_per_client (Consul) + max_idle_conns_per_host (CTS) |
Update: I had a bug within my on repro steps, so this does not address why 100 services caused 200 connections. It was my overlook of curl also contributing to the limit. Once I had waited for the curl connections to close out, I ran CTS with 100 services and it successfully executed prior to these changes. These changes do however improve the TCP connection reusability of hcat/consul client. This is not something I was able to write an e2e test for since the transport and dialing code is housed in hcat. Inserting code around the dialing provided insight on how many times a connection was established (here) +var dialCount uint64
func newTransport(i *CreateClientInput) (*http.Transport, error) {
// This transport will attempt to keep connections open to the server.
+ dialer := net.Dialer{
+ Timeout: i.TransportDialTimeout,
+ KeepAlive: i.TransportDialKeepAlive,
+ }
transport := &http.Transport{
Proxy: http.ProxyFromEnvironment,
- Dial: (&net.Dialer{
- Timeout: i.TransportDialTimeout,
- KeepAlive: i.TransportDialKeepAlive,
- }).Dial,
+ Dial: dialer.Dial,
+ DialContext: func(ctx context.Context, network, address string) (net.Conn, error) {
+ atomic.AddUint64(&dialCount, 1)
+ log.Printf("[INFO] Dialing %s %s: %d", network, address, atomic.LoadUint64(&dialCount))
+ return dialer.DialContext(ctx, network, address)
+ },
DisableKeepAlives: i.TransportDisableKeepAlives,
MaxIdleConns: i.TransportMaxIdleConns,
IdleConnTimeout: i.TransportIdleConnTimeout,
MaxIdleConnsPerHost: i.TransportMaxIdleConnsPerHost,
TLSHandshakeTimeout: i.TransportTLSHandshakeTimeout,
} Prior to changes: 100 services resulted in 190 dials. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
One related question.. would you suggest changes to hcat's default settings in relation to this? Seems like the defaults would be improved if changed to reflect your findings.
Thanks for the review @eikenb! I don't recall hcat having default options and relies on the callers' defaults. But yeah adding default options I think would be valuable. I would imagine this is affecting Consul Template as well (that's where I'm guilty of copy pasta'ing the config defaults from) |
See this ticket for some details... hashicorp/consul-terraform-sync#164 Summary: http.MaxIdleConnsPerHost was default to a very low value that caused TCP connections to not be reused across dependency monitoring. Only ~1 connections were being reused from the initial set of hcat requests to Consul, and the remaining dependencies required establishing a new connection.
See this ticket for some details... hashicorp/consul-terraform-sync#164 Summary: http.MaxIdleConnsPerHost was default to a very low value that caused TCP connections to not be reused across dependency monitoring. Only ~1 connections were being reused from the initial set of hcat requests to Consul, and the remaining dependencies required establishing a new connection.
http.MaxIdleConnsPerHost
was default to a very low value that caused TCP connections to not be reused across hcat dependency monitoring. Only ~1 connections were being reused from the initial set of hcat requests to Consul, and the remaining dependencies required establishing a new connection.Changes:
DefaultMaxIdleConnsPerHost
: CTS default changed from 1 to 100 (the transport default is 2 and setting -1 disables keep alives altogether). This increases the # of reusable TCP connections for service monitoring. From 1:2 connections per service now to 1:1.DefaultMaxIdleConns
default changed from 100 to 0, which disables the the limit. The previous limit not only caps all the requests to Consul but also requests to other hosts, which is not needed at this time.DefaultIdleConnTimeout
: from 90s to 5s. hcat monitoring of dependencies should theoretically have 1 connection per 1 dependency for blocking queries. Any idle connections not immediately used would likely never be reused, so close it out earlier.For example, 100 services required about 200 connections at startup (normal request and then blocking query).This is the primary bug in #146 when monitoring 100 services always errored with EOF/429 when the Consul agent has a defaulthttp_max_conns_per_client=200
. This should now allow up to 198 services by default without resulting in connection limits.See comment below for updates on ^
Operators can increase
consul.transport.max_idle_conns_per_host
value to reduce the delay between detecting updates and task execution when monitoring 200 services or more.