PostgreSQL is a popular, versatile open source database management system that supports SQL and can store both structured and unstructured data, such as JSON objects.
Because Fluent Bit is designed to work with JSON objects, the `pgsql` output plugin lets you send data to a PostgreSQL database and store it using the `JSONB` type.
PostgreSQL 9.4 or higher is required.
Based on the parameters set in the configuration file, the plugin creates the table defined by the `table` option in the database defined by the `database` option, hosted on the server defined by the `host` option. It connects as the PostgreSQL user defined by the `user` option, which must have the privileges required to create a table in that database.
NOTE: If you are not familiar with how PostgreSQL's user and grant system works, you might find it useful to read the recommended links in the "References" section at the bottom.
A typical installation consists of a self-contained database for Fluent Bit in which you can store the output of one or more pipelines. Whether you store them in the same table, in separate tables, or even in separate databases is your choice and depends on several factors, including workload, scalability, data protection, and security.
In this example, for the sake of simplicity, we use a single table called `fluentbit` in a database called `fluentbit` that is owned by the user `fluentbit`. Feel free to use different names. For security reasons, preferably do not use the `postgres` user (which has `SUPERUSER` privileges).
Generate a robust random password (for example, with `pwgen 20 1`) and store it safely. Then, as the `postgres` system user on the server where PostgreSQL is installed, execute:
createuser -P fluentbit
At the prompt, please provide the password that you previously generated.
As a result, the user `fluentbit` is created without superuser privileges.
If you prefer, you can use the SQL command `CREATE USER` directly instead of the `createuser` application.
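As a minimal sketch of the SQL alternative, assuming you run it as a superuser in `psql`; the password shown is a placeholder for the one generated earlier:

```sql
-- CREATE USER creates a role that can log in; it is not a superuser
-- unless SUPERUSER is requested explicitly.
-- 'replace-with-generated-password' is a placeholder, not a real value.
CREATE USER fluentbit WITH PASSWORD 'replace-with-generated-password';
```

Keep in mind that a password typed inside a SQL statement can end up in the client history or the server log; the `psql` meta-command `\password fluentbit` lets you set it interactively instead.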
As the `postgres` system user, run:
createdb -O fluentbit fluentbit
This creates a database called `fluentbit` owned by the `fluentbit` user. As a result, the `fluentbit` user can safely create the data table.
Alternatively, you can use the SQL command `CREATE DATABASE`.
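The SQL form could look like the following sketch, assuming the `fluentbit` role already exists and the statement is run by a role allowed to create databases:

```sql
-- Create the database and make the fluentbit role its owner.
CREATE DATABASE fluentbit OWNER fluentbit;
```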
Make sure that the `fluentbit` user can connect to the `fluentbit` database on the specified target host. This might require you to properly configure the `pg_hba.conf` file.
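To check the host-based authentication setup from SQL, something along these lines may help. This is only a sketch: it assumes a superuser session, and the `pg_hba_file_rules` view requires PostgreSQL 10 or higher.

```sql
-- Show the location of the pg_hba.conf file used by the server.
SHOW hba_file;

-- After editing the file, reload the configuration without a restart.
SELECT pg_reload_conf();

-- Inspect the rules as parsed from the file, including any parse errors.
SELECT line_number, type, database, user_name, address, auth_method, error
FROM pg_hba_file_rules;
```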
| Key | Description | Default |
|---|---|---|
| `Host` | Hostname or IP address of the PostgreSQL instance | - (127.0.0.1) |
| `Port` | PostgreSQL port | - (5432) |
| `User` | PostgreSQL username | - (current user) |
| `Password` | Password of the PostgreSQL user | - |
| `Database` | Name of the database to connect to | - (current user) |
| `Table` | Name of the table in which to store data | - |
| `Connection_Options` | Any valid PostgreSQL connection options | - |
| `Timestamp_Key` | Key in the JSON object containing the record timestamp | date |
| `Async` | Whether to use asynchronous (true) or synchronous (false) connections | false |
| `min_pool_size` | Minimum number of connections in async mode | 1 |
| `max_pool_size` | Maximum number of connections in async mode | 4 |
| `cockroachdb` | Set to true when connecting the plugin to CockroachDB | false |
Fluent Bit relies on libpq, the PostgreSQL native client API, written in C. For this reason, default values might be affected by environment variables and compilation settings. The values in parentheses in the table above are the most common defaults for each connection option.
For security reasons, it is advisable to follow the directives in the password file section of the PostgreSQL libpq documentation.
In your main configuration file, add the following section:

    [OUTPUT]
        Name                pgsql
        Match               *
        Host                172.17.0.2
        Port                5432
        User                fluentbit
        Password            YourCrazySecurePassword
        Database            fluentbit
        Table               fluentbit
        Connection_Options  -c statement_timeout=0
        Timestamp_Key       ts

The output plugin automatically creates a table with the name specified by the `table` configuration option, made up of the following fields:
tag TEXT
time TIMESTAMP WITHOUT TIME ZONE
data JSONB
As you can see, the timestamp does not contain any time zone information; it is therefore interpreted in the time zone used by the connection to PostgreSQL (`timezone` setting).
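The structure above corresponds roughly to the following definition. It is a sketch for orientation only: the exact statement issued by the plugin may differ, and the table name `fluentbit` is taken from the example configuration. The query additionally assumes, purely for illustration, that the stored timestamps are in UTC.

```sql
-- Approximate shape of the table the plugin creates.
CREATE TABLE IF NOT EXISTS fluentbit (
    tag  TEXT,
    time TIMESTAMP WITHOUT TIME ZONE,
    data JSONB
);

-- Treat the stored values as UTC (an assumption) so that they are
-- returned as time zone aware timestamps, shown in the session time zone.
SELECT f.tag,
       f.time AT TIME ZONE 'UTC' AS time_with_zone,
       f.data
FROM fluentbit AS f
ORDER BY f.time DESC
LIMIT 10;
```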
For more information on the `JSONB` data type in PostgreSQL, refer to the JSON types page in the official documentation, which explains how to index and query the objects (including `jsonpath`, introduced in PostgreSQL 12).
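As a brief illustration of indexing and querying, the sketch below assumes the `fluentbit` table from the example configuration and records that carry hypothetical `level` and `message` keys; adapt the key names to your own data.

```sql
-- GIN index on the whole document; it supports the containment (@>)
-- and jsonpath (@?, @@) operators used below.
CREATE INDEX fluentbit_data_idx ON fluentbit USING GIN (data);

-- Containment: records whose JSON includes {"level": "error"}.
SELECT f.time, f.data
FROM fluentbit AS f
WHERE f.data @> '{"level": "error"}';

-- The same filter expressed with jsonpath (PostgreSQL 12 or higher).
SELECT f.time, f.data
FROM fluentbit AS f
WHERE f.data @? '$.level ? (@ == "error")';

-- Extract a single field as text.
SELECT f.data ->> 'message' AS message
FROM fluentbit AS f
LIMIT 10;
```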
PostgreSQL 10 introduced support for declarative partitioning. To improve the vertical scalability of the database, you can partition your tables on time ranges (for example, on a monthly basis). PostgreSQL also supports subpartitions, which let you further partition your records by hash (version 11+), and default partitions (version 11+).
For more information on horizontal partitioning in PostgreSQL, please refer to the Table partitioning page in the official documentation.
If you are starting now, our recommendation at the moment is to choose the latest major version of PostgreSQL.
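A monthly range-partitioned layout could be sketched as follows. The partition names and date ranges are illustrative, and the sketch assumes the table is created before Fluent Bit starts and that the plugin leaves an existing table untouched.

```sql
-- Parent table partitioned by time range (PostgreSQL 10 or higher).
CREATE TABLE fluentbit (
    tag  TEXT,
    time TIMESTAMP WITHOUT TIME ZONE,
    data JSONB
) PARTITION BY RANGE (time);

-- One partition per month; the upper bound is exclusive.
CREATE TABLE fluentbit_2024_01 PARTITION OF fluentbit
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE fluentbit_2024_02 PARTITION OF fluentbit
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Catch-all for rows outside the defined ranges (PostgreSQL 11 or higher).
CREATE TABLE fluentbit_default PARTITION OF fluentbit DEFAULT;
```

Without a default partition, an insert whose timestamp falls outside every defined range fails, so new monthly partitions need to be created ahead of time.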
PostgreSQL is a powerful and extensible database engine. More expert users can take advantage of `BEFORE INSERT` triggers on the main table to re-route records to normalised tables, depending on the tags and content of the actual JSON objects.
For example, you can use Fluent Bit to send HTTP log records to the landing table defined in the configuration file. This table has a `BEFORE INSERT` trigger (a function written in `plpgsql`) that normalises the content of the JSON object and inserts the record into another table (with its own structure and partitioning model). This kind of trigger lets you discard the record from the landing table by returning `NULL`.
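The approach can be sketched as below. Everything specific here is hypothetical: the `http_log` table, the `http.` tag prefix, and the `method`, `path`, and `code` keys are assumptions made for the example, and the landing table is the plain (non-partitioned) `fluentbit` table from the example configuration.

```sql
-- Hypothetical normalised table for HTTP log records.
CREATE TABLE http_log (
    time   TIMESTAMP WITHOUT TIME ZONE,
    method TEXT,
    path   TEXT,
    status INTEGER
);

CREATE OR REPLACE FUNCTION fluentbit_route() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Re-route records whose tag starts with 'http.' (assumed convention).
    IF NEW.tag LIKE 'http.%' THEN
        INSERT INTO http_log (time, method, path, status)
        VALUES (NEW.time,
                NEW.data ->> 'method',
                NEW.data ->> 'path',
                (NEW.data ->> 'code')::INTEGER);
        RETURN NULL;  -- discard the row from the landing table
    END IF;
    RETURN NEW;       -- keep every other record in the landing table
END;
$$;

-- Fire the function for each row before it reaches the landing table.
CREATE TRIGGER fluentbit_route_before_insert
    BEFORE INSERT ON fluentbit
    FOR EACH ROW EXECUTE PROCEDURE fluentbit_route();
```

Returning `NULL` from a row-level `BEFORE INSERT` trigger skips the insert for that row, which is what keeps re-routed records out of the landing table.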
Here follows a list of useful resources from the PostgreSQL documentation: