
[v22.3.x] net: Explicitly set and reduce TCP keepalive on the kafka API #11775

Conversation

vbotbuildovich (Collaborator) commented:

Backport of PR #11496

We have seen RP "leak" client connections in different scenarios ([1] [2]).

One of those cases is running in cloudv2 on AWS. The AWS load balancer
in use, which distributes bootstrap server connections across all
brokers, "drops" connections after 350s. This means that when the
client eventually disconnects, the LB no longer forwards the RST/FIN to
the RP brokers, so RP thinks those connections are still alive.
Nodes/VMs that simply crash lead to similar situations.

Redpanda currently has no application-level "connection reaper" that
closes inactive connections.

However, the issue above eventually gets resolved by TCP keepalive. We
already enable TCP keepalive but don't set any of its parameters
explicitly, which means we use the Linux defaults (or whatever is
configured).

The Linux default (and what cloudv2 uses) is a keepalive idle timeout
of 7200 seconds, so it takes a bit more than two hours for those
connections to get cleaned up.
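
For reference, these kernel-wide defaults live under
/proc/sys/net/ipv4/ and apply whenever a socket enables SO_KEEPALIVE
without overriding the per-socket parameters. A minimal Python sketch
(the helper name is made up) to inspect them:

```python
# Sketch: read the system-wide TCP keepalive defaults that apply when a socket
# enables SO_KEEPALIVE without overriding the per-socket parameters (Linux).
def read_kernel_keepalive_defaults():
    paths = {
        "idle_seconds": "/proc/sys/net/ipv4/tcp_keepalive_time",      # typically 7200
        "interval_seconds": "/proc/sys/net/ipv4/tcp_keepalive_intvl", # typically 75
        "probes": "/proc/sys/net/ipv4/tcp_keepalive_probes",          # typically 9
    }
    return {name: int(open(path).read()) for name, path in paths.items()}

print(read_kernel_keepalive_defaults())
```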

This PR makes all three TCP keepalive parameters configurable and
explicitly sets them on Kafka connections. As part of that we also
lower the values so that keepalive triggers much earlier (the
socket-level mapping is sketched after the list of defaults below).

The new defaults (in RP) are:
 - Idle timeout: 120s (vs 7200s Linux default)
 - Interval: 60s (vs 75s Linux default)
 - Probes: 3 (vs 9 Linux default)
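
Under the hood these three values map onto the per-socket options
TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. Redpanda sets them from
its C++/Seastar server code; the Python sketch below (function name
invented) only illustrates the equivalent socket calls with the new
defaults:

```python
import socket

def apply_kafka_keepalive(sock: socket.socket,
                          idle_s: int = 120,
                          interval_s: int = 60,
                          probes: int = 3) -> None:
    """Enable TCP keepalive and override the three per-socket parameters (Linux)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first keepalive probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Number of unanswered probes before the kernel drops the connection.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```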

As a result, on idle connections we send a TCP keepalive (just a TCP
packet without data) every 2 minutes. For a very large set of idle
connections, say 30k, this works out to about 250 packets per second,
which shouldn't be an issue.

On dead connections we send the first TCP keepalive after 2 minutes,
then 2 more probes at one-minute intervals, and finally close the
connection after a total of 5 minutes of idle time.
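
A quick sanity check of those numbers (plain arithmetic, not Redpanda
code):

```python
idle_s, interval_s, probes = 120, 60, 3

# Dead peer: first probe after idle_s, then `probes` unanswered probes spaced
# interval_s apart before the kernel resets the connection.
dead_after_s = idle_s + probes * interval_s           # 120 + 3 * 60 = 300 s = 5 min

# Healthy but idle fleet: one keepalive packet per connection every idle_s seconds.
idle_connections = 30_000
probe_packets_per_second = idle_connections / idle_s  # 30000 / 120 = 250 pkt/s

print(dead_after_s, probe_packets_per_second)
```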

Testing keepalive is slightly tricky as we need to convince the client
to stop responding to the keepalive packets. Given this is done
implicitly by the kernel, there is no easy switch to turn that off.

We use an iptables rule that drops all outgoing packets from the
client, which means no TCP keepalive responses reach RP and RP
subsequently RSTs the connection. To make sure we don't drop any other
packets, and in case we leak the rule for any reason, we create a
random group and use the iptables owner module to apply the rule to
that group only.
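
A rough sketch of that setup (the group name, helper name and exact
commands are illustrative, not the actual test code):

```python
import secrets
import subprocess

def block_client_egress() -> str:
    """Create a throwaway group and drop all outgoing packets from processes
    running under it, so keepalive probes from the broker go unanswered and
    the broker eventually RSTs the connection. Scoping the rule to the group
    keeps other traffic unaffected even if the rule is leaked."""
    group = f"keepalive-test-{secrets.token_hex(4)}"
    subprocess.run(["groupadd", group], check=True)
    subprocess.run(
        ["iptables", "-A", "OUTPUT",
         "-m", "owner", "--gid-owner", group,
         "-j", "DROP"],
        check=True,
    )
    # Run the Kafka client under this group afterwards,
    # e.g. `sg <group> -c '<client command>'`.
    return group
```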

[1] Issue redpanda-data/cloudv2#6713
[2] Issue redpanda-data/core-internal#411

(cherry picked from commit a90cb32)
@vbotbuildovich vbotbuildovich added this to the v22.3.x-next milestone Jun 29, 2023
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label Jun 29, 2023
@StephanDollberg (Member) commented:

/ci-repeat 1

@StephanDollberg StephanDollberg marked this pull request as ready for review July 4, 2023 18:01
@StephanDollberg StephanDollberg merged commit cceefd9 into redpanda-data:v22.3.x Jul 7, 2023
@BenPope BenPope modified the milestones: v22.3.x-next, v22.3.23 Aug 17, 2023