administration: config: YAML: document all sections
Signed-off-by: Eduardo Silva <[email protected]>
edsiper committed Nov 11, 2024
1 parent 94a1c26 commit dcd5b11
Showing 11 changed files with 492 additions and 10 deletions.
11 changes: 9 additions & 2 deletions SUMMARY.md
@@ -49,15 +49,22 @@
## Administration

* [Configuring Fluent Bit](administration/configuring-fluent-bit/README.md)
* [YAML Configuration Sections](administration/configuring-fluent-bit/yaml/README.md)
* [Service](administration/configuring-fluent-bit/yaml/service-section.md)
* [Parsers](administration/configuring-fluent-bit/yaml/parsers-section.md)
* [Multiline Parsers](administration/configuring-fluent-bit/yaml/multiline-parsers-section.md)
* [Pipeline](administration/configuring-fluent-bit/yaml/pipeline-section.md)
* [Environment Variables](administration/configuring-fluent-bit/yaml/environment-variables-section.md)
* [Includes](administration/configuring-fluent-bit/yaml/includes-section.md)

* [Configuration File](administration/configuring-fluent-bit/yaml/configuration-file.md)
* [Classic mode](administration/configuring-fluent-bit/classic-mode/README.md)
* [Format and Schema](administration/configuring-fluent-bit/classic-mode/format-schema.md)
* [Configuration File](administration/configuring-fluent-bit/classic-mode/configuration-file.md)
* [Variables](administration/configuring-fluent-bit/classic-mode/variables.md)
* [Commands](administration/configuring-fluent-bit/classic-mode/commands.md)
* [Upstream Servers](administration/configuring-fluent-bit/classic-mode/upstream-servers.md)
* [Record Accessor](administration/configuring-fluent-bit/classic-mode/record-accessor.md)
* [YAML Configuration](administration/configuring-fluent-bit/yaml/README.md)
* [Configuration File](administration/configuring-fluent-bit/yaml/configuration-file.md)
* [Unit Sizes](administration/configuring-fluent-bit/unit-sizes.md)
* [Multiline Parsing](administration/configuring-fluent-bit/multiline-parsing.md)
* [Transport Security](administration/transport-security.md)
11 changes: 5 additions & 6 deletions administration/configuring-fluent-bit/README.md
@@ -1,14 +1,13 @@
# Configuring Fluent Bit

Fluent Bit supports these configuration formats:
Currently, Fluent Bit supports two configuration formats:

- [Classic mode](classic-mode/README.md)
- [YAML](yaml/README.md) (Fluent Bit 2.0 or greater)
* [YAML](yaml/README.md): standard configuration format as of v3.2.
* [Classic mode](classic-mode/README.md): to be deprecated at the end of 2025.

## CLI flags
## Command line interface

Fluent Bit also supports a CLI with various flags for the available configuration
options.
Fluent Bit exposes most of its features through the command line interface. Running with the `--help` (or `-h`) option prints a list of the available options:

```shell
$ docker run --rm -it fluent/fluent-bit --help
45 changes: 43 additions & 2 deletions administration/configuring-fluent-bit/yaml/README.md
@@ -1,3 +1,44 @@
# Fluent Bit YAML configuration
# Fluent Bit YAML Configuration

YAML configuration was introduced in Fluent Bit version 1.9 as experimental, and it has been production ready since Fluent Bit 2.0.
## Before You Get Started

Fluent Bit traditionally offered a `classic` configuration mode, a custom configuration format that we are gradually phasing out. While `classic` mode has served well for many years, it has several limitations. Its basic design only supports grouping sections with key-value pairs and lacks the ability to handle sub-sections or complex data structures like lists.

YAML, now a mainstream configuration format, has become essential in a cloud ecosystem where everything is configured this way. To minimize friction and provide a more intuitive experience for creating data pipelines, we strongly encourage users to transition to YAML. The YAML format enables features, such as processors, that are not possible to configure in `classic` mode.

As of Fluent Bit v3.2, you can configure everything in YAML.

## List of Available Sections

Configuring Fluent Bit with YAML introduces the following root-level sections:

| Section Name | Description |
|----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| `service` | Describes the global configuration for the Fluent Bit service. This section is optional; if not set, default values will apply. Only one `service` section can be defined. |
| `parsers` | Lists parsers to be used by components like inputs, processors, filters, or output plugins. You can define multiple `parsers` sections, which can also be loaded from external files included in the main YAML configuration. |
| `multiline_parsers` | Lists multiline parsers, functioning similarly to `parsers`. Multiple definitions can exist either in the root or in included files. |
| `pipeline` | Defines a pipeline composed of inputs, processors, filters, and output plugins. You can define multiple `pipeline` sections, but they will not operate independently. Instead, all components will be merged into a single pipeline internally. |
| `plugins` | Specifies the path to external plugins (.so files) to be loaded by Fluent Bit at runtime. |
| `upstream_servers` | Refers to a group of node endpoints that can be referenced by output plugins that support this feature. |
| `env` | Sets a list of environment variables for Fluent Bit. Note that system environment variables are available, while the ones defined in the configuration apply only to Fluent Bit. |
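
These sections can be combined in a single file. As an illustrative sketch (plugin names taken from the examples in this documentation):

```yaml
service:
  flush: 1
  log_level: info

parsers:
  - name: json
    format: json

pipeline:
  inputs:
    - name: random

  outputs:
    - name: stdout
      match: '*'
```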

## Section Documentation

To access detailed configuration guides for each section, use the following links:

- [Service Section documentation](service-section.md)
- Overview of global settings, configuration options, and examples.
- [Parsers Section documentation](parsers-section.md)
- Detailed guide on defining parsers and supported formats.
- [Multiline Parsers Section documentation](multiline-parsers-section.md)
- Explanation of multiline parsing configuration.
- [Pipeline Section documentation](pipeline-section.md)
- Details on setting up pipelines and using processors.
- [Plugins Section documentation](plugins-section.md)
- How to load external plugins.
- [Upstreams Section documentation](upstream-servers-section.md)
- Guide on setting up and using upstream nodes with supported plugins.
- [Environment Variables Section documentation](environment-variables-section.md)
- Information on setting environment variables and their scope within Fluent Bit.
- [Includes Section documentation](includes-section.md)
- Description on how to include external YAML files.
@@ -0,0 +1,61 @@
# Environment Variables Section

The `env` section allows you to define environment variables directly within the configuration file. These variables can then be used to dynamically replace values throughout your configuration using the `${VARIABLE_NAME}` syntax.

Values set in the `env` section are case-sensitive. However, as a best practice, we recommend using uppercase names for environment variables. The example below defines two variables, `FLUSH_INTERVAL` and `STDOUT_FMT`, which can be accessed in the configuration using `${FLUSH_INTERVAL}` and `${STDOUT_FMT}`:

```yaml
env:
FLUSH_INTERVAL: 1
STDOUT_FMT: 'json_lines'

service:
flush: ${FLUSH_INTERVAL}
log_level: info

pipeline:
inputs:
- name: random

outputs:
- name: stdout
match: '*'
format: ${STDOUT_FMT}
```
## Predefined Variables

Fluent Bit provides a set of predefined environment variables that can be used in your configuration:

| Name | Description |
|--|--|
| `${HOSTNAME}` | The system’s hostname. |
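
As a sketch of how a predefined variable can be used, the example below injects `${HOSTNAME}` into every record with the `record_modifier` filter (the filter choice is illustrative):

```yaml
pipeline:
  inputs:
    - name: random

  filters:
    - name: record_modifier
      match: '*'
      record: hostname ${HOSTNAME}

  outputs:
    - name: stdout
      match: '*'
```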

## External Variables

In addition to variables defined in the configuration file or the predefined ones, Fluent Bit can access system environment variables set in the user space. These external variables can be referenced in the configuration using the same `${VARIABLE_NAME}` pattern.

For example, to set the `FLUSH_INTERVAL` system environment variable to 2 and use it in your configuration:

```bash
export FLUSH_INTERVAL=2
```

In the configuration file, you can then access this value as follows:

```yaml
service:
flush: ${FLUSH_INTERVAL}
log_level: info
pipeline:
inputs:
- name: random
outputs:
- name: stdout
match: '*'
format: json_lines
```

This approach allows you to easily manage and override configuration values using environment variables, providing flexibility in various deployment environments.
32 changes: 32 additions & 0 deletions administration/configuring-fluent-bit/yaml/includes-section.md
@@ -0,0 +1,32 @@
# Includes Section

The `includes` section allows you to specify additional YAML configuration files to be merged into the current configuration. These files are identified as a list of filenames and can include relative or absolute paths. If no absolute path is provided, the file is assumed to be located in a directory relative to the file that references it.

This feature is useful for organizing complex configurations into smaller, manageable files and including them as needed.

## Usage

Below is an example demonstrating how to include additional YAML files using relative path references. This is the file system path structure:

```
├── fluent-bit.yaml
├── inclusion-1.yaml
└── subdir
└── inclusion-2.yaml
```

The content of `fluent-bit.yaml`:

```yaml
includes:
- inclusion-1.yaml
- subdir/inclusion-2.yaml
```

## Key Points

- Relative paths: if a path is not absolute, it is treated as relative to the file that includes it.
- Organized configurations: using the `includes` section helps keep your configuration modular and easier to maintain.

> Note: Ensure that the included files are formatted correctly and contain valid YAML configurations for seamless integration.
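
To illustrate, an included file such as `inclusion-1.yaml` (contents hypothetical) can carry root-level sections of its own, for example additional parsers, which are merged into the main configuration:

```yaml
# inclusion-1.yaml (hypothetical contents): defines extra parsers
# that are merged into the configuration that includes this file
parsers:
  - name: json
    format: json
```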
@@ -0,0 +1,26 @@
# Multiline Parsers

Multiline parsers are used to combine logs that span multiple events into a single, cohesive message. This is particularly useful for handling stack traces, error logs, or any log entry that contains multiple lines of information.

In YAML configuration, the syntax for defining multiline parsers differs slightly from the classic configuration format, introducing minor breaking changes, specifically in how the rules are defined.

Below is an example demonstrating how to define a multiline parser directly in the main configuration file, as well as how to include additional definitions from external files:

```yaml
multiline_parsers:
- name: multiline-regex-test
type: regex
flush_timeout: 1000
rules:
- state: start_state
regex: '/([a-zA-Z]+ \d+ \d+:\d+:\d+)(.*)/'
next_state: cont
- state: cont
regex: '/^\s+at.*/'
next_state: cont
```

The example above defines a multiline parser named `multiline-regex-test` that uses regular expressions to handle multi-event logs. The parser contains two rules: the first rule transitions from `start_state` to `cont` when a matching log entry is detected, and the second rule continues to match subsequent lines.
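
Once defined, a multiline parser is referenced by name from an input plugin. As a sketch (assuming the `tail` input's `multiline.parser` option, with an illustrative path), the parser above could be attached like this:

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/example.log
      multiline.parser: multiline-regex-test

  outputs:
    - name: stdout
      match: '*'
```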

For more detailed information on configuring multiline parsers, including advanced options and use cases, please refer to the Configuring Multiline Parsers section.

23 changes: 23 additions & 0 deletions administration/configuring-fluent-bit/yaml/parsers-section.md
@@ -0,0 +1,23 @@
# Parsers Section

Parsers enable Fluent Bit components to transform unstructured data into a structured internal representation. You can define parsers either directly in the main configuration file or in separate external files for better organization.

This page provides a general overview of how to declare parsers.

The main section name is `parsers`, and it allows you to define a list of parser configurations. The following example demonstrates how to set up two simple parsers:

```yaml
parsers:
- name: json
format: json

- name: docker
format: json
time_key: time
time_format: "%Y-%m-%dT%H:%M:%S.%L"
time_keep: true
```
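
Parsers declared here are referenced by name from other components. For example, following the `tail` input's `parser` option used elsewhere in this documentation, the `docker` parser above could be applied like this (the log path is illustrative):

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers.log
      parser: docker

  outputs:
    - name: stdout
      match: '*'
```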
You can define multiple `parsers` sections, either within the main configuration file or distributed across included files.

For more detailed information on parser options and advanced configurations, please refer to the [Configuring Parsers]() section.
149 changes: 149 additions & 0 deletions administration/configuring-fluent-bit/yaml/pipeline-section.md
@@ -0,0 +1,149 @@
# Pipeline Section

The `pipeline` section defines the flow of how data is collected, processed, and sent to its final destination. It encompasses the following core concepts:

| Name | Description |
|---|---|
| `inputs` | Specifies the name of the plugin responsible for collecting or receiving data. This component serves as the data source in the pipeline. Examples of input plugins include `tail`, `http`, and `random`. |
| `processors` | **Unique to YAML configuration**, processors are specialized plugins that handle data processing directly attached to input plugins. Unlike filters, processors are not dependent on tag or matching rules. Instead, they work closely with the input to modify or enrich the data before it reaches the filtering or output stages. Processors are defined within an input plugin section. |
| `filters` | Filters are used to transform, enrich, or discard events based on specific criteria. They allow matching tags using strings or regular expressions, providing a more flexible way to manipulate data. Filters run as part of the main event loop and can be applied across multiple inputs and filters. Examples of filters include `modify`, `grep`, and `nest`. |
| `outputs` | Defines the destination for processed data. Outputs specify where the data will be sent, such as to a remote server, a file, or another service. Each output plugin is configured with matching rules to determine which events are sent to that destination. Common output plugins include `stdout`, `elasticsearch`, and `kafka`. |

## Example Configuration

Here’s a simple example of a pipeline configuration:

```yaml
pipeline:
inputs:
- name: tail
path: /var/log/example.log
parser: json

processors:
logs:
- name: record_modifier
filters:
- name: grep
match: '*'
regex: key pattern

outputs:
- name: stdout
match: '*'
```

## Pipeline Processors

Processors operate on specific signals such as logs, metrics, and traces. They are attached to an input plugin and must specify the signal type they will process.

### Example of a Processor

In the example below, the `content_modifier` processor inserts or updates (upserts) the key `my_new_key` with the value `123` for all log records generated by the `tail` plugin. This processor is only applied to log signals:
```yaml
parsers:
- name: json
format: json

pipeline:
inputs:
- name: tail
path: /var/log/example.log
parser: json

processors:
logs:
- name: content_modifier
action: upsert
key: my_new_key
value: 123
filters:
- name: grep
match: '*'
regex: key pattern

outputs:
- name: stdout
match: '*'
```
Here is a more complete example with multiple processors:
```yaml
service:
log_level: info
http_server: on
http_listen: 0.0.0.0
http_port: 2021

pipeline:
inputs:
- name: random
tag: test-tag
interval_sec: 1
processors:
logs:
- name: modify
add: hostname monox
- name: lua
call: append_tag
code: |
function append_tag(tag, timestamp, record)
new_record = record
new_record["tag"] = tag
return 1, timestamp, new_record
end
outputs:
- name: stdout
match: '*'
processors:
logs:
- name: lua
call: add_field
code: |
function add_field(tag, timestamp, record)
new_record = record
new_record["output"] = "new data"
return 1, timestamp, new_record
end
```

You might have noticed that processors can be attached not only to inputs but also to outputs.
### How Are Processors Different from Filters?

While processors and filters are similar in that they can transform, enrich, or drop data from the pipeline, there is a significant difference in how they operate:

- Processors: run in the same thread as the input plugin when the input plugin is configured as threaded (`threaded: true`). This design provides better performance, especially in multi-threaded setups.
- Filters: run in the main event loop. When multiple filters are used, they can introduce performance overhead, particularly under heavy workloads.
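
As an illustrative sketch of the threaded case, the input below opts into its own thread with `threaded: true`, so its attached processor runs in that thread as well (the path and key names are examples):

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/example.log
      threaded: true
      processors:
        logs:
          - name: content_modifier
            action: upsert
            key: source
            value: app

  outputs:
    - name: stdout
      match: '*'
```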

## Running Filters as Processors

You can configure existing [Filters](https://docs.fluentbit.io/manual/pipeline/filters) to run as processors. There are no specific changes needed; you simply use the filter name as if it were a native processor.

### Example of a Filter Running as a Processor

In the example below, the `grep` filter is used as a processor to filter log events based on a pattern:
```yaml
parsers:
- name: json
format: json

pipeline:
inputs:
- name: tail
path: /var/log/example.log
parser: json

processors:
logs:
- name: grep
regex: log aa
outputs:
- name: stdout
match: '*'
```