Remove FLB_ERROR from flush to ensure there is never log loss #315

Merged 3 commits on Mar 7, 2023
Changes from 2 commits
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# Changelog

+## 1.9.2
+* Bug - Fixed log loss that could occur when log group creation or retention policy API calls fail. (#314)
+
## 1.9.1
* Enhancement - Added different base user agent for Linux and Windows

8 changes: 8 additions & 0 deletions README.md
@@ -83,6 +83,14 @@ This plugin uses the AWS SDK Go, and uses its [default credential provider chain
* `FLB_LOG_LEVEL`: Set the log level for the plugin. Valid values are: `debug`, `info`, and `error` (case insensitive). Default is `info`. **Note**: Setting log level in the Fluent Bit Configuration file using the Service key will not affect the plugin log level (because the plugin is external).
* `SEND_FAILURE_TIMEOUT`: Allows you to configure a timeout if the plugin can not send logs to CloudWatch. The timeout is specified as a [Golang duration](https://golang.org/pkg/time/#ParseDuration), for example: `5m30s`. If the plugin has failed to make any progress for the given period of time, then it will exit and kill Fluent Bit. This is useful in scenarios where you want your logging solution to fail fast if it has been misconfigured (i.e. network or credentials have not been set up to allow it to send to CloudWatch).

+### Retries and Buffering
+
+Buffering and retries are managed by the Fluent Bit core engine, not by the plugin. Whenever the plugin encounters an error, it returns `FLB_RETRY` to the engine, which schedules a retry. This means that a failed log group creation, log stream creation, or log retention policy call consumes a retry.
+
+* [Fluent Bit upstream documentation on Retries](https://docs.fluentbit.io/manual/administration/scheduling-and-retries)
+* [Fluent Bit upstream documentation on buffering](https://docs.fluentbit.io/manual/administration/buffering-and-storage)
+* [FireLens OOMKill prevention example for buffering](https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/oomkill-prevention)
+
### Templating Log Group and Stream Names

A template in the form of `$(variable)` can be set in `log_group_name` or `log_stream_name`. `variable` can be a map key name in the log message. To access sub-values in the map use the form `$(variable['subkey'])`. Also, it can be replaced with special values to insert the tag, ECS metadata or a random string in the name.
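
Reviewer note on the "Retries and Buffering" text added in this README hunk: the point that setup calls "can consume a retry" may be easier to see with a toy model. The sketch below is not the Fluent Bit scheduler; `deliver`, `newFlush`, `retryLimit`, and the `flushResult` type are hypothetical names introduced only for illustration. It assumes the engine gives a chunk one initial attempt plus a fixed number of retries, and that a flush fails either at setup (for example log group creation) or at send.

```go
package main

import "fmt"

// flushResult is a local stand-in for the plugin's return code
// (hypothetical; not the fluent-bit-go constants).
type flushResult int

const (
	resultOK flushResult = iota
	resultRetry
)

// deliver is a toy model of an engine retry loop: one initial attempt plus
// up to retryLimit retries. retryLimit is an assumed parameter.
func deliver(flush func() flushResult, retryLimit int) (attempts int, delivered bool) {
	for attempts = 1; attempts <= retryLimit+1; attempts++ {
		if flush() == resultOK {
			return attempts, true
		}
	}
	return retryLimit + 1, false
}

// newFlush builds a fake flush that fails setup setupFailures times and
// then fails the send sendFailures times before succeeding.
func newFlush(setupFailures, sendFailures int) func() flushResult {
	setupDone := false
	return func() flushResult {
		if !setupDone {
			if setupFailures > 0 {
				setupFailures--
				return resultRetry // setup failure costs one attempt
			}
			setupDone = true
		}
		if sendFailures > 0 {
			sendFailures--
			return resultRetry
		}
		return resultOK
	}
}

func main() {
	// One setup failure and one send failure fit inside two retries.
	attempts, ok := deliver(newFlush(1, 1), 2)
	fmt.Println(attempts, ok) // 3 true

	// Two setup failures leave no budget for the send failure.
	attempts, ok = deliver(newFlush(2, 1), 2)
	fmt.Println(attempts, ok) // 3 false
}
```

Under these assumptions, a user who sets a low retry limit may want to leave headroom for setup calls, since they share the same budget as the send itself.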
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-1.9.1
+1.9.2
2 changes: 1 addition & 1 deletion cloudwatch/cloudwatch.go
@@ -386,7 +386,7 @@ func (output *OutputPlugin) AddEvent(e *Event) int {

 	if err := output.createLogGroup(e); err != nil {
 		logrus.Error(err)
-		return fluentbit.FLB_ERROR
+		return fluentbit.FLB_RETRY
 	}

output.groups[e.group] = struct{}{}
2 changes: 1 addition & 1 deletion fluent-bit-cloudwatch.go
@@ -212,7 +212,7 @@ func FLBPluginFlushCtx(ctx, data unsafe.Pointer, length C.int, tag *C.char) int
// Return options:
//
// output.FLB_OK = data have been processed.
-// output.FLB_ERROR = unrecoverable error, do not try this again.
+// output.FLB_ERROR = unrecoverable error, do not try this again. Never returned by flush.
// output.FLB_RETRY = retry to flush later.
return output.FLB_OK
}