net/http: make Transport's idle connection management aware of DNS changes? #23427
/cc @tombergan
Possible signals (If / Action):
I guess the use of "?" in the title was due to implementation complexity/propriety of addressing this in the standard library. Please let me know if I've gotten something wrong or if there are other issues I'm missing.
@meirf To answer your question about the "?" in the title: I don't remember why. I deleted it from the title. I only dig into the Go src code from time to time, mostly in the
tl;dr: (1) Idle connections' awareness of DNS changes appears to be absent from the API. (2) The easy part is forcing a redial via closure. (3) IMO the hard part is figuring out what signal the code should use to redial.

@szuecs Based on my reading of your most recent comment, you think idle connections' awareness of DNS changes is already promised by the current API. I (and bradfitz, based on his title) do not agree that it is. I think we agree that currently the only way to be aware of a DNS change is to force a redial by not keeping connections alive, but IdleConnTimeout won't force a keep-alive connection to be closed unless your situation meets its definition. If you have a keep-alive connection and it never becomes idle, by definition it will never close itself due to IdleConnTimeout. The cumulative amount of time a keep-alive connection is idle doesn't matter; the only way the connection will be closed due to IdleConnTimeout is if it stays idle for IdleConnTimeout consecutive/contiguous time.

It's clear that all of these would be helpful to you in redialing and therefore connecting to the new IP. This indirectly results in DNS awareness (and possibly a lot of inefficiency). But in terms of the standard library, there is no promise of direct DNS awareness.

Briefly touching on implementation, I don't think DNS awareness in the standard library would use CloseIdleConnections, since that blindly closes all keep-alive connections, and I'd guess we'd want DNS change detection to be done per connection.
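For illustration, a minimal sketch of the blunt "force a redial" workaround discussed above: a goroutine that periodically calls Transport.CloseIdleConnections, so the next request has to dial again and therefore re-resolve DNS. The interval and transport settings below are arbitrary placeholders, not a recommendation:

```go
package main

import (
	"net/http"
	"time"
)

// newRedialingClient returns a client whose idle keep-alive connections are
// dropped every interval, so the next request must dial again and therefore
// re-resolve DNS. This closes *all* idle connections, which is exactly the
// bluntness criticized above.
func newRedialingClient(interval time.Duration) *http.Client {
	tr := &http.Transport{
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     90 * time.Second,
	}
	go func() {
		for range time.Tick(interval) {
			tr.CloseIdleConnections()
		}
	}()
	return &http.Client{Transport: tr}
}

func main() {
	client := newRedialingClient(30 * time.Second)
	resp, err := client.Get("https://example.com/")
	if err == nil {
		resp.Body.Close()
	}
}
```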
@meirf OK, I was not sure whether the questions were directed at me. As far as I understand now, you did not ask me. :)
The existing implementation appears to have no way to set a connection max lifetime. A max lifetime allows refreshing connection lifecycle concerns, such as DNS resolution, for those that need it. When set to a non-zero value, the connection will be closed after the duration has passed. This change is backwards compatible. Fixes golang#23427
https://golang.org/cl/341329 has a possible approach that @bradfitz and I came up with. The idea is to add a

Change https://golang.org/cl/341329 mentions this issue:
I like the simplicity of the extra hook. I think I prefer the deadline approach used for the idle timeout, purely because then I know the client is not going to keep the connection around, so no usage attempt is needed to prompt the connection to close. Would you need to add similar functionality in http2's pool?
Are you targeting 1.18, @neild? That hook looks like a really nice addition.
@neild For me personally, thanks for working on it!
@neild One of the reasons I am all for adding a MaxConnLifespan to the Transport and socket wait logic is that a given HTTP service which brokers connections to other services can start truncating connections without requiring a new connection/request to be made to a service. In theory this frees up connections for any backing services as horizontal scale-out (runtime replication) events occur, and potentially reduces the number of idle connections at any given time across the gamut of services. We've implemented this logic before in other protocols and saw clear performance gains as well as the DNS re-resolution benefits.

Please consider the PR I have out there as well. It's my first one, so I don't mind it ultimately being closed because it has style issues. I would be ecstatic to see the capability in the standard lib even if my name is not attached. 😄 Gonna resolve the conflict in the api/next.txt file again in a sec.

Edit: I am also aware that there is already an idle timeout setting, but I assume the connection reuse logic is FIFO. In such a queuing strategy we'd need a solid "idle timeout" period with no new requests for the idle connections to trend towards zero. Depending on traffic patterns this can be quite a rare event.
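As a rough illustration of the max-lifetime idea (a sketch, not the PR's implementation), a similar effect can be approximated today with a custom DialContext that hard-closes the underlying net.Conn after a chosen lifespan; the Transport then discards the broken connection and dials (and re-resolves DNS) on a later request. Note that closing mid-request can fail an in-flight request, which is part of why a proper Transport-level setting would be nicer:

```go
package main

import (
	"context"
	"net"
	"net/http"
	"time"
)

// lifetimeConn wraps a net.Conn and stops its kill timer when the
// connection is closed normally.
type lifetimeConn struct {
	net.Conn
	timer *time.Timer
}

func (c *lifetimeConn) Close() error {
	c.timer.Stop()
	return c.Conn.Close()
}

// dialWithMaxLifetime returns a DialContext that force-closes each
// connection maxLife after it was established.
func dialWithMaxLifetime(maxLife time.Duration) func(context.Context, string, string) (net.Conn, error) {
	d := &net.Dialer{Timeout: 30 * time.Second, KeepAlive: 30 * time.Second}
	return func(ctx context.Context, network, addr string) (net.Conn, error) {
		conn, err := d.DialContext(ctx, network, addr)
		if err != nil {
			return nil, err
		}
		t := time.AfterFunc(maxLife, func() { conn.Close() })
		return &lifetimeConn{Conn: conn, timer: t}, nil
	}
}

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			DialContext:     dialWithMaxLifetime(5 * time.Minute), // illustrative lifespan
			IdleConnTimeout: 90 * time.Second,
		},
	}
	resp, err := client.Get("https://example.com/")
	if err == nil {
		resp.Body.Close()
	}
}
```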
@neild Assuming http2 is the most impacted by this issue, since it multiplexes requests over a single connection instead of dialing again, and given that ClientConnPool is already an interface and can be set on the http2 transport: what if the clientConnPool type were exposed, so people can implement their own logic on top of it?

type myClientConnPool struct {
	http2.ClientConnPool // some existing pool implementation to delegate to
}

func (p *myClientConnPool) GetClientConn(req *http.Request, addr string) (*http2.ClientConn, error) {
	// do things here, e.g. decide whether the cached connection is too old
	return p.ClientConnPool.GetClientConn(req, addr)
}

func (p *myClientConnPool) MarkDead(c *http2.ClientConn) {
	p.ClientConnPool.MarkDead(c)
}

I think that the connection pool type can benefit other issues like #37373; that last issue may require some additional changes since the conn pool is keyed by

EDIT: I see my mistake, the connection pool used in the std lib is
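For reference, the exported ClientConnPool interface lives in golang.org/x/net/http2 (not the bundled copy inside net/http), and an http2.Transport built from that package accepts a custom pool via its ConnPool field. A short usage sketch, building on the myClientConnPool snippet above and assuming you already have some ClientConnPool implementation to wrap (the default pool is not exported):

```go
import (
	"net/http"

	"golang.org/x/net/http2"
)

// newHTTP2Client wires a wrapped connection pool into an http2.Transport.
func newHTTP2Client(base http2.ClientConnPool) *http.Client {
	t2 := &http2.Transport{
		ConnPool: &myClientConnPool{ClientConnPool: base},
	}
	return &http.Client{Transport: t2}
}
```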
@rohitkeshwani07 In case of a high TTL, redialing probably won't help because of the DNS cache in the host/resolver. If the main concern is DNS change, then it's better to use the TTL because:

But if the main problem resides on the client side (e.g. a firewall drops long-standing connections), then it makes sense to take a client-side approach, like configuring a max connection lifetime or the hooks proposed by @neild.
Hello again! I still really like adding logic to the transport layers so system resources can be closed in advance of trying to pull a connection from the pool, should a deadline be reached: #46714. This behavior is more ideal for me simply because then I don't have system nodes keeping connections alive just so the pool layer on top can intercept an established connection candidate, see it is too old, and then kill it. May as well let it take care of itself in that regard and short-circuit earlier.

Any way I can help ensure an approach of some kind makes it into the standard lib? 😄 I think it would be very powerful to get the standard http kit up to par with, or allow for the creation of, the same knobs we have in https://pkg.go.dev/database/sql
I wonder how much of this problem is purely one of load-balancing concepts people may be trying to implement. If a host record resolves to 3 IP nodes and we suddenly add or remove one, then respectively we need to ensure the new node starts taking requests as soon as we know about it so load spreads (no need to close connections until we approach the max connection limit or idle timeouts happen), or remove connections tied to the deregistering (potentially already deregistered) IP address. The latter case of removing a node is trivial and fairly self-correcting before any DNS TTL and refresh cycle would inform us, on average. By the same token, idle connection timeouts can create "randomly hot" nodes over time just by the probability of FIFO rotating through connections cyclically.

A max lifespan is a fairly nuclear solution to this problem when perhaps Envoy proxy should be the full-featured solution for balancing out-of-app. People wanting a solution in the same binary probably really want LIFO connection queuing under a round-robin (or some distance/efficiency-aware weighted function) connection broker over LIFO queues partitioned by IP. This would allow any "logically extra" connections the service can live without to die off just by virtue of the connection timeouts. If DNS appears to remove a host, we can assume the DNS source has determined that the host should be phased out, and schedule the relevant connections to die as soon as their transactions finish, or immediately inform the broker to kill that pool of trusted connections. As a DNS entry is discovered for a hostname, a LIFO queue would be created and added to the connection broker. Load would spread almost immediately across the upstream offering, old connections no longer getting any use on the other nodes would start to time out and die off, and expensive new connection flows would continue to be avoided.

Connection max lifetime/reuse is likely not the feature people need or want. (If you do need/want it, cool, my PR offers it!) It just looks like what we're reaching for in this conversation, when what we really want is to observe and schedule DNS resolution and use that to inform a connection pooling strategy, where we can change the implementation of popping and pushing to a queue we maintain externally that is LIFO or FIFO or anything else, but note we need to know the connection's target IP to achieve these goals. It's worth discussing in more detail by persons probably smarter than myself. Typically when users of one of my libs want agency, I will happily give them the simplest interface they need to achieve their goals and default the behavior to off. Perhaps the maintainers are willing to do similar here?
Thanks @josephcopenhaver for your answer. A lot of it makes sense to me. The problem is not proxy-specific; it is more HTTP-upstream-specific.
Load balancing is not the reason why I'm interested in this issue. My use case: a long-running workload that has a steady cadence of requests to an HTTP/1.1 service
That is not a safe assumption. For example, DNS responses sent by an AWS ALB will cycle through a subset of the existing IP addresses. Just because an IP does not appear in a subsequent DNS query does not mean that the corresponding instance no longer accepts traffic. In fact, that IP will most likely reappear in a following DNS response.
Yes and no. Years of "best practices" have conditioned everyone to expect that setting a low DNS TTL will allow service operators to induce HTTP clients to shift traffic with minimum delay. Many browsers and software SDKs behave that way (even though there's nothing in the HTTP specification that mandates keep-alives honor DNS TTLs), but |
okhttp suggested doing this on the server side and sending
@szuecs While it is possible to solve this server-side, sometimes the server-side application is owned by a third party, so it's not within our control to make changes to it. Our solution for working around the lack of a client-side connection-lifetime timeout is:
@awnumar This is also possible, and I think it would be beneficial to have it on the client, but I also wanted to show that others have taken server-side approaches in this context. The chosen solution depends on the context.
@szuecs Just contributing my 2 cents here: I recently attempted the solution suggested by the OkHttp authors, and it does work, but not with OkHttp, because OkHttp doesn't actually seem to run name resolution again until it gets an error from the server (e.g. a 503). The server-side approach still seems to require some proper handling on the client to achieve awareness of DNS record changes.
@andrebrait Thanks! We also added zalando/skipper#3246 to fix this, similar to other proxies. OkHttp seems to be unfortunate in handling this case. :(
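For anyone exploring the server-side route in a Go server, a minimal sketch of one possible approach (assumed here for illustration, not taken from skipper): record each connection's start time with Server.ConnContext and set a "Connection: close" header on responses once the connection has lived past a maximum age, which makes net/http close the connection after the response so well-behaved clients reconnect and re-resolve DNS:

```go
package main

import (
	"context"
	"net"
	"net/http"
	"time"
)

type connStartKey struct{}

// maxConnAge asks the client to reconnect once the underlying connection is
// older than maxAge, so it will re-resolve DNS on reconnect.
func maxConnAge(maxAge time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if start, ok := r.Context().Value(connStartKey{}).(time.Time); ok && time.Since(start) > maxAge {
			w.Header().Set("Connection", "close")
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	srv := &http.Server{
		Addr: ":8080",
		// Stamp each new connection with its start time.
		ConnContext: func(ctx context.Context, c net.Conn) context.Context {
			return context.WithValue(ctx, connStartKey{}, time.Now())
		},
		Handler: maxConnAge(5*time.Minute, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("ok"))
		})),
	}
	srv.ListenAndServe()
}
```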
What version of Go are you using (go version)?

go version go1.9 linux/amd64

and

go version go1.9.2 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?

% go env
GOARCH="amd64"
GOBIN="/home/sszuecs/go/bin"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/sszuecs/go"
GORACE=""
GOROOT="/usr/share/go"
GOTOOLDIR="/usr/share/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build505089582=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
What did you do?
I am running go run main.go and change /etc/hosts to point www.google.de to 127.0.0.1.

Depending on the order, it either fails to switch DNS after IdleConnTimeout or it never queries DNS again. DNS will be looked up again if you first point the name to 127.0.0.1 and afterwards comment out the entry in /etc/hosts. The problem is that if you want to change your target load balancers via DNS lookup, this will not happen. The workaround is commented in the code, which reliably performs the DNS failover. The problem seems to be that IdleConnTimeout is bigger than the time.Sleep duration in the code, which you can also change to see that this works. In the case of an edge proxy with a high number of requests, the case IdleConnTimeout < time-until-next-request will never happen.
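The main.go referred to above is not included here; a minimal reconstruction of the described setup (request loop, IdleConnTimeout, and the commented-out goroutine workaround) might look roughly like this, with the durations chosen only to illustrate the ordering problem:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	tr := &http.Transport{
		IdleConnTimeout: 30 * time.Second, // larger than the sleep below, so the idle conn is always reused
	}
	client := &http.Client{Transport: tr}

	// Workaround (commented out): force-close idle connections periodically
	// so new requests must dial again and re-resolve DNS.
	// go func() {
	// 	for range time.Tick(10 * time.Second) {
	// 		tr.CloseIdleConnections()
	// 	}
	// }()

	for {
		resp, err := client.Get("https://www.google.de/")
		if err != nil {
			fmt.Println("request failed:", err)
		} else {
			resp.Body.Close()
			fmt.Println("status:", resp.Status)
		}
		time.Sleep(5 * time.Second) // shorter than IdleConnTimeout, so the conn never expires
	}
}
```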
What did you expect to see?
I want to see that IdleConnTimeout will reliably close idle connections, such that DNS is queried again for new connections, similar to the goroutine case in the code. We need to be able to slowly fade traffic.
What did you see instead?
If you start the application with the /etc/hosts entry not set, and then change it, the request will never fail, so a new DNS lookup is never made.