Overview of the Issue
When a Consul client cancels a blocking HTTP query, the server does not close the TCP connection correctly.
The TCP connection stays in the FIN-WAIT-2 state until its tcp_fin_timeout expires.
FIN-WAIT-2 means that the client has sent its FIN, has received the server's ACK, and is now waiting for the server to send its own FIN.
When many blocking queries are cancelled and retried at the same time, the server's http_max_conns_per_client limit can be hit, and further client queries fail.
I expect the server to close the HTTP connection completely when a blocking query is aborted, so that no TCP connections are left when the client program terminates.
We ran into this issue in our grpc-consul-resolver (https://github.com/simplesurance/grpcconsulresolver).
A test case opened and closed many gRPC client connections within a short timeframe.
When a gRPC connection is closed, the grpcconsulresolver's blocking query is cancelled by cancelling its context.
The TCP connections piled up until the Consul server stopped accepting further queries.
This issue could also be triggered in a production environment, for example when many applications are redeployed in parallel and all of them cancel their Consul blocking queries.
It could also be exploited as a denial-of-service (DoS) attack.
Reproduction Steps
With a Go client using the consul package
Run a Consul agent (consul agent -dev -enable-script-checks -node=web -ui).
Run a Go program that creates a new service entry, queries the service entry once to get the wait index, and then, in a loop, performs a blocking query with that wait index which is aborted via its context.
After some time, the Consul query fails with an EOF error and the program terminates: the Consul server has reached its http_max_conns_per_client limit.
netstat or ss -atupn '( dport = :8500 )' shows many TCP connections in the FIN-WAIT-2 state.
With Curl
Run a Consul agent (consul agent -dev -enable-script-checks -node=web -ui).
Register a service entry: curl --request PUT --data '{ "Address": "localhost", "Node": "web", "Service": { "Service": "testentry" } }' http://localhost:8500/v1/catalog/register
Run ss -atupn '( dport = :8500 )' (or run it via watch to monitor it).
In the output of ss -atupn '( dport = :8500 )', a FIN-WAIT-2 TCP connection appears for every aborted curl query.
Operating system and Environment details
/proc/sys/net/ipv4/tcp_fin_timeout is 60.