The Amazon S3 output plugin allows to ingest your records into the S3 cloud object store.
The plugin can upload data to S3 using the multipart upload API or using S3 PutObject. Multipart is the default and is recommended; Fluent Bit will stream data in a series of 'parts'. This limits the amount of data it has to buffer on disk at any point in time. By default, every time 5 MiB of data have been received, a new 'part' will be uploaded. The plugin can create files up to gigabytes in size from many small chunks/parts using the multipart API. All aspects of the upload process are configurable using the configuration options.
The plugin allows you to specify a maximum file size, and a timeout for uploads. A file will be created in S3 when the max size is reached, or the timeout is reached- whichever comes first.
Records are stored in files in S3 as newline delimited JSON.
Key | Description | Default |
---|---|---|
region | The AWS region of you S3 bucket | us-east-1 |
bucket | S3 Bucket name | None |
json_date_format | Specifies the format of the date. Supported formats are double, iso8601 and epoch. | iso8601 |
total_file_size | Specifies the size of files in S3. Maximum size is 50G, minimum is 1M. | 100M |
upload_chunk_size | The size of each 'part' for multipart uploads. Max: 50M | 5,242,880 bytes |
upload_timeout | Whenever this amount of time has elapsed, Fluent Bit will complete an upload and create a new file in S3. For example, set this value to 60m and you will get a new file every hour. | 10m |
store_dir | Directory to locally buffer data before sending. When multipart uploads are used, data will only be buffered until the upload_chunk_size is reached. |
/tmp/fluent-bit/s3 |
s3_key_format | Format string for keys in S3. This option supports strftime time formatters and a syntax for selecting parts of the Fluent log tag using a syntax inspired by the rewrite_tag filter. Add $TAG in the format string to insert the full log tag; add $TAG[0] to insert the first part of the tag in the s3 key. The tag is split into “parts” using the characters specified with the s3_key_format_tag_delimiters option. See the in depth examples and tutorial in the documentation. |
/fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S |
s3_key_format_tag_delimiters | A series of characters which will be used to split the tag into 'parts' for use with the s3_key_format option. See the in depth examples and tutorial in the documentation. | . |
use_put_object | Use the S3 PutObject API, instead of the multipart upload API. | false |
role_arn | ARN of an IAM role to assume (ex. for cross account access). | None |
endpoint | Custom endpoint for the S3 API. | None |
sts_endpoint | Custom endpoint for the STS API. | None |
In Fluent Bit, all logs have an associated tag. The s3_key_format
option lets you inject the tag into the s3 key using the following syntax:
$TAG
=> the full tag$TAG[n]
=> the nth part of the tag (index starting at zero). This syntax is copied from the rewrite tag filter. By default, “parts” of the tag are separated with dots, but you can change this withs3_key_format_tag_delimiters
.
In the example below, assume the date is January 1st, 2020 00:00:00 and the tag associated with the logs in question is my_app_name-logs.prod
.
[OUTPUT]
Name s3
Match *
bucket my-bucket
region us-west-2
total_file_size 250M
s3_key_format $TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S
s3_key_format_tag_delimiters .-_
With the delimiters as . and -, the tag will be split into parts as follows:
$TAG[0]
= my_app_name$TAG[1]
= logs$TAG[2]
= prod
So the key in S3 will be prod/my_app_name/2020/01/01/00/00/00
.
The store_dir
is used to temporarily store data before it is uploaded. If Fluent Bit is stopped suddenly it will try to send all data and complete all uploads before it shuts down. If it can not send some data, on restart it will look in the store_dir
for existing data and will try to send it.
Multipart uploads are ideal for most use cases because they allow the plugin to upload data in small chunks over time. For example, 1 GB file can be created from 200 5MB chunks. While the file size in S3 will be 1 GB, only 5 MB will be buffered on disk at any one point in time.
There is one minor drawback to multipart uploads- the file and data will not be visible in S3 until the upload is completed with a CompleteMultipartUpload call. The plugin will attempt to make this call whenever Fluent Bit is shut down to ensure your data is available in s3. It will also store metadata about each upload in the store_dir
, ensuring that uploads can be completed when Fluent Bit restarts (assuming it has access to persistent disk and the store_dir
files will still be present on restart).
If you run Fluent Bit in an environment without persistent disk, or without the ability to restart Fluent Bit and give it access to the data stored in the store_dir
from previous executions- some considerations apply. This might occur if you run Fluent Bit on AWS Fargate.
In these situations, we recommend using the PutObject API, and sending data frequently, to avoid local buffering as much as possible. This will limit data loss in the event Fluent Bit is killed unexpectedly.
The following settings are recommended for this use case:
[OUTPUT]
Name s3
Match *
bucket your-bucket
region us-east-1
total_file_size 1M
upload_timeout 1m
use_put_object On
In order to send records into Amazon S3, you can run the plugin from the command line or through the configuration file.
The s3 plugin, can read the parameters from the command line through the -p argument (property), e.g:
$ fluent-bit -i cpu -o s3 -p bucket=my-bucket -p region=us-west-2 -p -m '*' -f 1
In your main configuration file append the following Output section:
[OUTPUT]
Name s3
Match *
bucket your-bucket
region us-east-1
store_dir /home/ec2-user/buffer
total_file_size 50M
upload_timeout 10m
An example that using PutObject instead of multipart:
[OUTPUT]
Name s3
Match *
bucket your-bucket
region us-east-1
store_dir /home/ec2-user/buffer
use_put_object On
total_file_size 10M
upload_timeout 10m
Amazon distributes a container image with Fluent Bit and this plugins.
github.com/aws/aws-for-fluent-bit
You can use our SSM Public Parameters to find the Amazon ECR image URI in your region:
aws ssm get-parameters-by-path --path /aws/service/aws-for-fluent-bit/
For more see the AWS for Fluent Bit github repo.