engine: fix grace period #4355

Merged
merged 2 commits into from
Nov 20, 2021

edsiper (Member) commented Nov 20, 2021

In the current implementation, when a shutdown is requested through
a SIGTERM (or caught from ctrl-c), Fluent Bit starts the shutdown
process and gives a grace period to wait for the active tasks to finish.

The problem with that approach is that the grace period was not being
respected: it was renewed on every check, so the service kept waiting
until all the pending tasks were flushed.

This patch changes the behavior so that the grace period is now treated
(as it should be) as the 'maximum time to wait' for pending tasks. If the
tasks do not finish flushing within the grace period, the service stops
right away.

In addition, when no tasks exist, the patch makes the service stop before
the grace period expires, since there is no need to continue waiting.
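
The intended behavior can be sketched as a small simulation. This is not the code from the patch: `seconds_until_stop`, `tasks_pending`, and the `NEVER` marker are hypothetical names, and the one-second tick is an assumption made only for illustration. The point is that the deadline is fixed when shutdown is requested and never renewed, and that the wait ends early once no tasks remain.

```c
/* Hypothetical marker: tasks never finish (e.g. an unresponsive remote host). */
#define NEVER -1

/*
 * Hypothetical helper: reports whether tasks are still pending `now`
 * seconds after the shutdown request. `finish_at` is the second at which
 * the last task completes (0 means no tasks were active at all).
 */
int tasks_pending(int now, int finish_at)
{
    if (finish_at == NEVER) {
        return 1;
    }
    return now < finish_at;
}

/*
 * Returns how many seconds pass between the shutdown request and the
 * actual stop. The grace period is an upper bound that is never renewed,
 * and the loop exits early as soon as no tasks are pending.
 */
int seconds_until_stop(int grace, int finish_at)
{
    int elapsed = 0;

    while (tasks_pending(elapsed, finish_at) && elapsed < grace) {
        elapsed++; /* one simulated timer tick (assumed to be one second) */
    }
    return elapsed;
}
```

With a grace period of 5 seconds, `seconds_until_stop(5, 0)` returns 0 (no tasks, stop immediately), `seconds_until_stop(5, 2)` returns 2 (tasks finish early), and `seconds_until_stop(5, NEVER)` returns 5 (tasks never finish, so the service stops when the grace period expires).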

Common tests:

1. Exit before grace period

fluent-bit -i cpu -o stdout -f 1

Hit ctrl-c; the service will stop before the grace period expires (at second 1 or 2).

2. Force active tasks by using an unresponsive remote network address

fluent-bit -i cpu -o stdout -m '*' -o http://192.168.3.4:3833 -m '*' -f 1

On SIGTERM, the tasks will still be active because of the unresponsive
network; when the shutdown is forced, the grace period is respected and
the service stops at the right moment.

Signed-off-by: Eduardo Silva <[email protected]>

l2dy (Contributor) commented Dec 11, 2021

Finally! Thank you.

@lecaros lecaros added this to the Fluent Bit v1.8.12 milestone Jan 21, 2022