io: add connection backoff #3191
src/flb_io.c
Outdated
@@ -63,9 +63,56 @@
#include <fluent-bit/flb_coro.h>
#include <fluent-bit/flb_http_client.h>

/* Increase backoff time of an upstream */
void flb_io_backoff_upstream(struct flb_upstream *u)
Maybe declare static instead of adding to the header?
(Same with flb_io_in_backoff.)
@@ -342,6 +393,10 @@ static FLB_INLINE ssize_t net_io_read_async(struct flb_coro *co,
int flb_io_net_write(struct flb_upstream_conn *u_conn, const void *data,
Maybe also handle backoff in flb_io_net_read?
Thanks for this contribution! Minor changes are requested.
src/flb_upstream.c
Outdated
@@ -65,6 +65,18 @@ struct flb_config_map upstream_net[] = {
"before it is retired."
},

{
FLB_CONFIG_MAP_TIME, "net.initial_backoff", "0s",
Can you rename the property to something like net.backoff_init?
src/flb_upstream.c
Outdated
},

{
FLB_CONFIG_MAP_TIME, "net.max_backoff", "0s",
same as before: net.backoff_max
include/fluent-bit/flb_upstream.h
Outdated
@@ -83,6 +83,10 @@ struct flb_upstream {
#endif

struct mk_list _head;

/* Backoff state. */
time_t next_attempt_time;
please prefix the variable with backoff_...
include/fluent-bit/flb_upstream.h
Outdated
/* Backoff state. */
time_t next_attempt_time;
int last_backoff_seconds;
do a prefix: backoff...
Signed-off-by: Alexander Kabakaev <[email protected]>
@edsiper, thanks for the quick review! The suggested changes are implemented. PTAL
@kabakaev can you please fix the conflicts so we can do the final review/merge?
Hi @kabakaev, can you please review the requested changes?
In normal operation, fluent-bit reuses TCP connections, hence new messages are flushed without sending a TCP SYN. But if an output connection cannot be established, then each flb_io_net_write() call will trigger connection setup and will send a series of TCP SYN packets (one per thread?).
The actual issue is described in #3103.
We observed this issue when hundreds of fluent-bit agents tried to send logs via forward to a set of receiving fluent-bits, which were all down due to a config error. The receiving FLB was hosted behind an OpenStack load balancer, a Linux stateful firewall and a traefik ingress controller.
Apart from high load, the flood of SYN packets may exhaust the connection tracking table, impacting the whole network infrastructure.
Fixes #3103.
This PR is inspired by the gRPC backoff implementation.
Backoff is disabled by default.
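For illustration only, here is a minimal sketch of how a per-upstream backoff of this kind could be tracked. It is not the code from this PR: the struct and function names are hypothetical, the delay is simply doubled on each failure (gRPC additionally applies jitter, which is left out here), and an initial value of 0 is assumed to mean "disabled", matching the "0s" defaults in the config map.

```c
#include <time.h>

/* Hypothetical backoff state (illustrative names, not the PR's fields). */
struct backoff_state {
    int    init_seconds;      /* first delay; 0 disables backoff */
    int    max_seconds;       /* upper bound for the delay */
    int    last_seconds;      /* delay chosen after the latest failure */
    time_t next_attempt_time; /* earliest time a new connect is allowed */
};

/* Call after a failed connection attempt: grow the delay, arm the timer. */
static void backoff_increase(struct backoff_state *b, time_t now)
{
    int cap;

    if (b->init_seconds <= 0) {
        return; /* backoff disabled */
    }

    /* Assumption: a max below the initial value is treated as the initial value. */
    cap = b->max_seconds > b->init_seconds ? b->max_seconds : b->init_seconds;

    if (b->last_seconds <= 0) {
        b->last_seconds = b->init_seconds;
    }
    else if (b->last_seconds >= cap - b->last_seconds) {
        /* doubling would reach or exceed the cap; checked without overflow */
        b->last_seconds = cap;
    }
    else {
        b->last_seconds *= 2;
    }

    b->next_attempt_time = now + b->last_seconds;
}

/* Returns 1 while new connection attempts should be skipped. */
static int backoff_active(const struct backoff_state *b, time_t now)
{
    return b->init_seconds > 0 && now < b->next_attempt_time;
}

/* Call after a successful connection: clear the backoff state. */
static void backoff_reset(struct backoff_state *b)
{
    b->last_seconds = 0;
    b->next_attempt_time = 0;
}
```

With an initial value of 1 second and a maximum of 8 seconds, consecutive failures in this sketch yield delays of 1, 2, 4, 8, 8, ... seconds. A write path would check backoff_active() before trying to connect and fail fast while it returns 1, so no SYN is sent during the wait.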
If enabled, backoff will limit the number of TCP SYN packets during an output destination outage (raw data):
This chart shows the rate of TCP SYN packets. The data is collected by tcpdump as described in the How to test section below.
Testing
Example of backoff configuration is given below.
Valgrind output is uploaded to my gist.
Documentation
Documentation for this feature is submitted as docs PR491.
How to test
Compile this version:
Collect SYN packets without backoff
Simulate connection timeout and run tcpdump on a separate console:
and start fluent-bit without backoff settings:
Collect SYN packets with initial backoff of 1 second
Simulate connection timeout and run tcpdump on a separate console:
and start fluent-bit with backoff settings:
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Alexander Kabakaev [email protected], Daimler TSS GmbH, imprint