engine: enforce and reduce shutdown time
In the current implementation, when a shutdown is requested through
a SIGTERM (or a ctrl-c is caught), Fluent Bit initiates the shutdown
process and gives a grace period to wait for the active tasks to finish.

The problem with this approach is that the grace period was not being
respected: it was renewed every time it expired, always waiting to flush
all the pending tasks.

This patch changes the behavior so that the grace period is now treated
(as it should be) as the 'maximum time to wait' for pending tasks. If the
tasks have not finished flushing within the grace period, the service
will stop right away.

In addition, when no tasks exist, the patch makes the service stop before
the grace period ends, since there is no need to continue waiting.
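
The rule described above can be sketched as a small standalone model. This
is not Fluent Bit code: the names shutdown_state, shutdown_tick and
seconds_until_stop are hypothetical, and the sketch assumes a 1-second
timer tick and a single counter that mirrors config->grace_count.

```c
#include <assert.h>

/* Hypothetical model of the shutdown timer; not the Fluent Bit API. */
struct shutdown_state {
    int grace;        /* maximum seconds to wait (mirrors config->grace) */
    int grace_count;  /* 1-second ticks elapsed since the stop request */
};

enum tick_result { KEEP_WAITING, STOP_NOW };

/*
 * Called once per timer tick. Keep waiting only while tasks are still
 * running AND the grace period has not been used up; otherwise stop.
 */
static enum tick_result shutdown_tick(struct shutdown_state *st,
                                      int running_tasks)
{
    st->grace_count++;
    if (running_tasks > 0 && st->grace_count < st->grace) {
        return KEEP_WAITING;
    }
    return STOP_NOW;
}

/*
 * Simulate a shutdown where tasks stay active for 'busy_seconds' ticks.
 * Returns the tick (in seconds) at which the service stops.
 */
static int seconds_until_stop(int grace, int busy_seconds)
{
    struct shutdown_state st = { grace, 0 };

    assert(grace > 0);
    for (;;) {
        int tick = st.grace_count + 1;
        int running = (tick <= busy_seconds) ? 1 : 0;

        if (shutdown_tick(&st, running) == STOP_NOW) {
            return st.grace_count;
        }
    }
}
```

With a grace period of 5, seconds_until_stop(5, 0) models the idle case
and stops on the first tick, while seconds_until_stop(5, 100) models tasks
that never finish and stops at second 5, never later.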

Common tests:

 1. Exit before grace period:

    fluent-bit -i cpu -o stdout -f 1

    hit ctrl-c, the service will stop before the grace period ends (at second 1 or 2).

 2. Force active tasks by using an unresponsive remote network address:

    fluent-bit -i cpu -o stdout -m '*' -o http://192.168.3.4:3833 -m '*' -f 1

    on SIGTERM, the tasks will still be active because of the unresponsive
    network; when the shutdown is forced, the grace period will be respected
    and the service will stop at the right moment.

Signed-off-by: Eduardo Silva <[email protected]>
edsiper committed Nov 21, 2021
1 parent e4ea603 commit 8af7269
Showing 1 changed file with 28 additions and 7 deletions.
35 changes: 28 additions & 7 deletions src/flb_engine.c
@@ -695,26 +695,42 @@ int flb_engine_start(struct flb_config *config)
if (event->type == FLB_ENGINE_EV_CORE) {
ret = flb_engine_handle_event(event->fd, event->mask, config);
if (ret == FLB_ENGINE_STOP) {
if (config->grace_count == 0) {
flb_warn("[engine] service will shutdown in max %u seconds",
config->grace);
}

/*
* We are preparing to shutdown, we give a graceful time
* of 'config->grace' seconds to process any pending event.
*/
event = &config->event_shutdown;
event->mask = MK_EVENT_EMPTY;
event->status = MK_EVENT_NONE;

/*
* Configure a timer of 1 second, on expiration the code will
* jump into the FLB_ENGINE_SHUTDOWN condition where it will
* check if the grace period has finished, or if there are
* any remaining tasks.
*
* If no tasks exists, there is no need to wait for the maximum
* grace period.
*/
config->shutdown_fd = mk_event_timeout_create(evl,
config->grace,
1,
0,
event);
flb_warn("[engine] service will stop in %u seconds", config->grace);
}
else if (ret == FLB_ENGINE_SHUTDOWN) {
flb_info("[engine] service stopped");
if (config->shutdown_fd > 0) {
mk_event_timeout_destroy(config->evl,
&config->event_shutdown);
}

/* Increase the grace counter */
config->grace_count++;

/*
* Grace period has finished, but we need to check if there is
* any pending running task. A running task is associated to an
@@ -723,13 +739,18 @@ int flb_engine_start(struct flb_config *config)
* wait again for the grace period and re-check again.
*/
ret = flb_task_running_count(config);
if (ret > 0) {
flb_warn("[engine] shutdown delayed, grace period has "
"finished but some tasks are still running.");
flb_task_running_print(config);
if (ret > 0 && config->grace_count < config->grace) {
if (config->grace_count == 1) {
flb_task_running_print(config);
}
flb_engine_exit(config);
}
else {
if (ret > 0) {
flb_task_running_print(config);
}
flb_info("[engine] service has stopped (%i pending tasks)",
ret);
ret = config->exit_status_code;
flb_engine_shutdown(config);
config = NULL;