From ee2f139a33e04eb02e581b6940a2e2e63532d684 Mon Sep 17 00:00:00 2001
From: Lynette Miles
Date: Fri, 8 Nov 2024 08:56:21 -0800
Subject: [PATCH 1/3] admin: schedule: Updating for style and consistency

Signed-off-by: Lynette Miles
---
 administration/scheduling-and-retries.md | 86 ++++++++++++++----------
 1 file changed, 50 insertions(+), 36 deletions(-)

diff --git a/administration/scheduling-and-retries.md b/administration/scheduling-and-retries.md
index eb865096d..4cabf1e8b 100644
--- a/administration/scheduling-and-retries.md
+++ b/administration/scheduling-and-retries.md
@@ -2,30 +2,39 @@

-[Fluent Bit](https://fluentbit.io) has an Engine that helps to coordinate the data ingestion from input plugins and calls the _Scheduler_ to decide when it is time to flush the data through one or multiple output plugins. The Scheduler flushes new data at a fixed time of seconds and the _Scheduler_ retries when asked.
+[Fluent Bit](https://fluentbit.io) has an engine that helps to coordinate the data
+ingestion from input plugins. The engine calls the _scheduler_ to decide when it's time to
+flush the data through one or multiple output plugins. The scheduler flushes new data
+at a fixed time of seconds and retries when asked.

-Once an output plugin gets called to flush some data, after processing that data it can notify the Engine three possible return statuses:
+When an output plugin gets called to flush some data, after processing that data it
+can notify the engine using these possible return statuses:

-* OK
-* Retry
-* Error
+- `OK`: Data successfully processed and flushed.
+- `Retry`: If a retry is requested, the engine asks the scheduler to retry flushing
+  that data. The scheduler decides how many seconds to wait before retry.
+- `Error`: An unrecoverable error occurred and the engine shouldn't try to flush that data again.

-If the return status was **OK**, it means it was successfully able to process and flush the data. If it returned an **Error** status, it means that an unrecoverable error happened and the engine should not try to flush that data again. If a **Retry** was requested, the _Engine_ will ask the _Scheduler_ to retry to flush that data, the Scheduler will decide how many seconds to wait before that happens.
+## Configure wait time for retry

-## Configuring Wait Time for Retry
+The scheduler provides two configuration options called `scheduler.cap` and
+`scheduler.base` which can be set in the Service section. These determine the waiting
+time before a retry happens.

-The Scheduler provides two configuration options called **scheduler.cap** and **scheduler.base** which can be set in the Service section.
+| Key | Description | Default |
+| --- | ------------| --------------|
+| `scheduler.cap` | Set a maximum retry time in seconds. Supported in v1.8.7 or greater. | `2000` |
+| `scheduler.base` | Set a base of exponential backoff. Supported in v1.8.7 or greater. | `5` |

-| Key | Description | Default Value |
-| -- | ------------| --------------|
-| scheduler.cap | Set a maximum retry time in seconds. The property is supported from v1.8.7. | 2000 |
-| scheduler.base | Set a base of exponential backoff. The property is supported from v1.8.7. | 5 |
+The `scheduler.base` determines the lower bound of time and the `scheduler.cap`
+determines the upper bound for each retry.

-These two configuration options determine the waiting time before a retry will happen.
+Fluent Bit uses an exponential backoff and jitter algorithm to determine the waiting
+time before a retry. The waiting time is a random number between a configurable upper
+and lower bound. For a detailed explanation of the exponential backoff and jitter algorithm, see
+[Exponential Backoff And Jitter](https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/).

-Fluent Bit uses an exponential backoff and jitter algorithm to determine the waiting time before a retry.
-
-The waiting time is a random number between a configurable upper and lower bound.
+For example:

 For the Nth retry, the lower bound of the random number will be:

@@ -35,23 +44,26 @@ The upper bound will be:

 `min(base * (Nth power of 2), cap)`

-Given an example where `base` is set to 3 and `cap` is set to 30.
-
-1st retry: The lower bound will be 3, the upper bound will be 3 * 2 = 6. So the waiting time will be a random number between (3, 6).
+For example:

-2nd retry: the lower bound will be 3, the upper bound will be 3 * (2 * 2) = 12. So the waiting time will be a random number between (3, 12).
+When `base` is set to 3 and `cap` is set to 30:

-3rd retry: the lower bound will be 3, the upper bound will be 3 * (2 * 2 * 2) = 24. So the waiting time will be a random number between (3, 24).
+First retry: The lower bound will be 3. The upper bound will be `3 * 2 = 6`.
+The waiting time will be a random number between (3, 6).

-4th retry: the lower bound will be 3, since 3 * (2 * 2 * 2 * 2) = 48 > 30, the upper bound will be 30. So the waiting time will be a random number between (3, 30).
+Second retry: The lower bound will be 3. The upper bound will be `3 * (2 * 2) = 12`.
+The waiting time will be a random number between (3, 12).

-Basically, the **scheduler.base** determines the lower bound of time between each retry and the **scheduler.cap** determines the upper bound.
+Third retry: The lower bound will be 3. The upper bound will be `3 * (2 * 2 * 2) = 24`.
+The waiting time will be a random number between (3, 24).

-For a detailed explanation of the exponential backoff and jitter algorithm, please check this [blog](https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/).
+Fourth retry: The lower bound will be 3. Because `3 * (2 * 2 * 2 * 2) = 48` > `30`,
+the upper bound will be 30. The waiting time will be a random number between (3, 30).
-### Example
+### Wait time example

-The following example configures the **scheduler.base** as 3 seconds and **scheduler.cap** as 30 seconds.
+The following example configures the `scheduler.base` as `3` seconds and
+`scheduler.cap` as `30` seconds.

 ```text
 [SERVICE]
@@ -64,26 +76,29 @@ The following example configures the **scheduler.base** as 3 seconds and **sched

 The waiting time will be:

-| Nth retry | waiting time range (seconds) |
-| --- | --- |
+| Nth retry | Waiting time range (seconds) |
+| --- | --- |
 | 1 | (3, 6) |
 | 2 | (3, 12) |
 | 3 | (3, 24) |
 | 4 | (3, 30) |

-## Configuring Retries
+## Configure retries

-The Scheduler provides a simple configuration option called **Retry\_Limit**, which can be set independently on each output section. This option allows us to disable retries or impose a limit to try N times and then discard the data after reaching that limit:
+The scheduler provides a configuration option called `Retry_Limit`, which can be set
+independently on each output section. This option lets you disable retries or
+impose a limit to try N times and then discard the data after reaching that limit:

 | | Value | Description |
 | :--- | :--- | :--- |
-| Retry\_Limit | N | Integer value to set the maximum number of retries allowed. N must be >= 1 \(default: 1\) |
-| Retry\_Limit | `no_limits` or `False` | When Retry\_Limit is set to `no_limits` or`False`, means that there is not limit for the number of retries that the Scheduler can do. |
-| Retry\_Limit | no\_retries | When Retry\_Limit is set to no\_retries, means that retries are disabled and Scheduler would not try to send data to the destination if it failed the first time. |
+| `Retry_Limit` | N | Integer value to set the maximum number of retries allowed. N must be >= 1 (default: `1`) |
+| `Retry_Limit` | `no_limits` or `False` | When set, there's no limit for the number of retries that the scheduler can do. |
+| `Retry_Limit` | `no_retries` | When set, retries are disabled and the scheduler doesn't try to send data to the destination if it failed the first time. |

-### Example
+### Retry example

-The following example configures two outputs where the HTTP plugin has an unlimited number of while the Elasticsearch plugin have a limit of 5 retries:
+The following example configures two outputs where the HTTP plugin has an unlimited
+number of while the Elasticsearch plugin have a limit of `5` retries:

 ```text
 [OUTPUT]
@@ -99,4 +114,3 @@ The following example configures two outputs where the HTTP plugin has an unlimi
     Logstash_Format On
     Retry_Limit 5
 ```
-
From 8e7a081ab0df90968f5bf8bd7c9543fba27e0627 Mon Sep 17 00:00:00 2001
From: esmerel <6818907+esmerel@users.noreply.github.com>
Date: Fri, 8 Nov 2024 09:43:14 -0800
Subject: [PATCH 2/3] Apply suggestions from code review

Co-authored-by: Craig Norris <112565517+cnorris-cs@users.noreply.github.com>
Signed-off-by: esmerel <6818907+esmerel@users.noreply.github.com>
---
 administration/scheduling-and-retries.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/administration/scheduling-and-retries.md b/administration/scheduling-and-retries.md
index 4cabf1e8b..ec8d7e88f 100644
--- a/administration/scheduling-and-retries.md
+++ b/administration/scheduling-and-retries.md
@@ -5,7 +5,7 @@
 [Fluent Bit](https://fluentbit.io) has an engine that helps to coordinate the data
 ingestion from input plugins. The engine calls the _scheduler_ to decide when it's time to
 flush the data through one or multiple output plugins. The scheduler flushes new data
-at a fixed time of seconds and retries when asked.
+at a fixed number of seconds, and retries when asked.
 When an output plugin gets called to flush some data, after processing that data it
 can notify the engine using these possible return statuses:
@@ -17,14 +17,14 @@ can notify the engine using these possible return statuses:

 ## Configure wait time for retry

-The scheduler provides two configuration options called `scheduler.cap` and
-`scheduler.base` which can be set in the Service section. These determine the waiting
+The scheduler provides two configuration options, called `scheduler.cap` and
+`scheduler.base`, which can be set in the Service section. These determine the waiting
 time before a retry happens.

 | Key | Description | Default |
 | --- | ------------| --------------|
-| `scheduler.cap` | Set a maximum retry time in seconds. Supported in v1.8.7 or greater. | `2000` |
-| `scheduler.base` | Set a base of exponential backoff. Supported in v1.8.7 or greater. | `5` |
+| `scheduler.cap` | Set a maximum retry time in seconds. Supported in v1.8.7 or later. | `2000` |
+| `scheduler.base` | Set a base of exponential backoff. Supported in v1.8.7 or later. | `5` |

 The `scheduler.base` determines the lower bound of time and the `scheduler.cap`
 determines the upper bound for each retry.
@@ -97,8 +97,8 @@ impose a limit to try N times and then discard the data after reaching that limi

 ### Retry example

-The following example configures two outputs where the HTTP plugin has an unlimited
-number of while the Elasticsearch plugin have a limit of `5` retries:
+The following example configures two outputs, where the HTTP plugin has an unlimited
+number of retries, and the Elasticsearch plugin has a limit of `5` retries:

 ```text
 [OUTPUT]
From 6a928747f89337eb60889c4e5ab4e1cfdaf219ad Mon Sep 17 00:00:00 2001
From: esmerel <6818907+esmerel@users.noreply.github.com>
Date: Fri, 8 Nov 2024 09:43:29 -0800
Subject: [PATCH 3/3] Update administration/scheduling-and-retries.md

Co-authored-by: Craig Norris <112565517+cnorris-cs@users.noreply.github.com>
Signed-off-by: esmerel <6818907+esmerel@users.noreply.github.com>
---
 administration/scheduling-and-retries.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/administration/scheduling-and-retries.md b/administration/scheduling-and-retries.md
index ec8d7e88f..d5d7496b1 100644
--- a/administration/scheduling-and-retries.md
+++ b/administration/scheduling-and-retries.md
@@ -86,7 +86,7 @@ The waiting time will be:
 ## Configure retries

 The scheduler provides a configuration option called `Retry_Limit`, which can be set
-independently on each output section. This option lets you disable retries or
+independently for each output section. This option lets you disable retries or
 impose a limit to try N times and then discard the data after reaching that limit:

 | | Value | Description |
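
Review note on the retry wait bounds documented in patch 1/3: the sketch below is not part of this patch series and is not Fluent Bit's implementation. It only restates the documented bounds in Python so the `(3, 6)`, `(3, 12)`, `(3, 24)`, `(3, 30)` table can be checked by hand. It assumes the lower bound is `base` (as in the worked example) and the upper bound is `min(base * (Nth power of 2), cap)`. The function names `retry_wait_bounds` and `sample_wait` are hypothetical, and drawing the wait uniformly between the bounds is an assumption about how the "random number" is chosen.

```python
import random


def retry_wait_bounds(base, cap, nth_retry):
    """Return (lower, upper) wait bounds in seconds for the Nth retry.

    Assumption: lower bound is `base`, upper bound is
    min(base * 2**N, cap), as described in the documentation text.
    """
    if nth_retry < 1:
        raise ValueError("nth_retry must be >= 1")
    lower = base
    upper = min(base * (2 ** nth_retry), cap)
    return lower, upper


def sample_wait(base, cap, nth_retry, rng=random):
    """Pick a wait time between the bounds.

    Assumption: a uniform draw; the real scheduler may choose differently.
    """
    lower, upper = retry_wait_bounds(base, cap, nth_retry)
    return rng.uniform(lower, upper)


if __name__ == "__main__":
    # base=3 and cap=30 match the values used in the documented example.
    for n in range(1, 5):
        print(n, retry_wait_bounds(3, 30, n))
```

With `base` set to 3 and `cap` set to 30, the bounds for retries 1 through 4 match the waiting time table in the documentation, including the fourth retry where `cap` clamps the upper bound.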