From 353e701a29eb415b771b8205657d1e6945043438 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Fri, 8 Dec 2023 19:12:51 +0900
Subject: [PATCH] http_server: v2: Add v2 endpoint descriptions (#1193)

Signed-off-by: Hiroshi Hatake
---
 administration/monitoring.md | 57 ++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/administration/monitoring.md b/administration/monitoring.md
index a07824da1..da6897b67 100644
--- a/administration/monitoring.md
+++ b/administration/monitoring.md
@@ -92,9 +92,14 @@ Fluent Bit aims to expose useful interfaces for monitoring, as of Fluent Bit v0.
 | /api/v1/metrics/prometheus | Internal metrics per loaded plugin ready to be consumed by a Prometheus Server | Prometheus Text 0.0.4 |
 | /api/v1/storage | Get internal metrics of the storage layer / buffered data. This option is enabled only if in the `SERVICE` section the property `storage.metrics` has been enabled | JSON |
 | /api/v1/health | Fluent Bit health check result | String |
+| /api/v2/metrics | Internal metrics per loaded plugin | cmetrics' text format |
+| /api/v2/metrics/prometheus | Internal metrics per loaded plugin ready to be consumed by a Prometheus Server | Prometheus Text 0.0.4 |
+| /api/v2/reload | Execute hot reloading or get the status of hot reloading | JSON |
 
 ### Metric Descriptions
 
+#### For v1 metrics
+
 The following are detailed descriptions for the metrics outputted in prometheus format by `/api/v1/metrics/prometheus`.
 
 The following definitions are key to understand:
@@ -137,6 +142,58 @@ The following are detailed descriptions for the metrics outputted in JSON format
 | input_chunks.{plugin name}.chunks.busy | "Busy" chunks are chunks that are being processed/sent by outputs and are not eligible to have new data appended. | chunks |
 | input_chunks.{plugin name}.chunks.busy_size | The sum of the byte size of each chunk which is currently marked as busy. | bytes |
 
+#### For v2 metrics
+
+The following are detailed descriptions for the metrics output in Prometheus format by `/api/v2/metrics/prometheus` or `/api/v2/metrics`.
+
+The following definitions are key to understand:
+* record: a single message collected from a source, such as a single long line in a file.
+* chunk: Fluent Bit input plugin instances ingest log records and store them in chunks. A batch of records in a chunk are tracked together as a single unit; the Fluent Bit engine attempts to fit records into chunks of at most 2 MB, but the size can vary at runtime. Chunks are then sent to an output. An output plugin instance can either successfully send the full chunk to the destination and mark it as successful, or it can fail the chunk entirely if an unrecoverable error is encountered, or it can ask for the chunk to be retried.
+
+| Metric Name | Labels | Description | Type | Unit |
+|-------------|--------|-------------|------|------|
+| fluentbit\_input\_bytes\_total | name: the name or alias for the input instance | The number of bytes of log records that this input instance has successfully ingested | counter | bytes |
+| fluentbit\_input\_records\_total | name: the name or alias for the input instance | The number of log records this input has successfully ingested | counter | records |
+| fluentbit\_filter\_bytes\_total | name: the name or alias for the filter instance | The number of bytes of log records that this filter instance has successfully ingested | counter | bytes |
+| fluentbit\_filter\_records\_total | name: the name or alias for the filter instance | The number of log records this filter has successfully ingested | counter | records |
+| fluentbit\_filter\_added\_records\_total | name: the name or alias for the filter instance | The number of log records that have been added by the filter. This means they were added to the data pipeline. | counter | records |
+| fluentbit\_filter\_dropped\_records\_total | name: the name or alias for the filter instance | The number of log records that have been dropped by the filter. This means they were removed from the data pipeline. | counter | records |
+| fluentbit\_output\_dropped\_records\_total | name: the name or alias for the output instance | The number of log records that have been dropped by the output. This means they met an unrecoverable error or retries expired for their chunk. | counter | records |
+| fluentbit\_output\_errors\_total | name: the name or alias for the output instance | The number of chunks that have faced an error (either unrecoverable or retriable). This is the number of times a chunk has failed, and does not correspond with the number of error messages you see in the Fluent Bit log output. | counter | chunks |
+| fluentbit\_output\_proc\_bytes\_total | name: the name or alias for the output instance | The number of bytes of log records that this output instance has *successfully* sent. This is the total byte size of all unique chunks sent by this output. If a record is not sent due to some error, then it will not count towards this metric. | counter | bytes |
+| fluentbit\_output\_proc\_records\_total | name: the name or alias for the output instance | The number of log records that this output instance has *successfully* sent. This is the total record count of all unique chunks sent by this output. If a record is not successfully sent, it does not count towards this metric. | counter | records |
+| fluentbit\_output\_retried\_records\_total | name: the name or alias for the output instance | The number of log records that experienced a retry. Note that this is calculated at the chunk level: the count is increased when an entire chunk is marked for retry. An output plugin may or may not perform multiple actions that generate many error messages when uploading a single chunk. | counter | records |
+| fluentbit\_output\_retries\_failed\_total | name: the name or alias for the output instance | The number of times that retries expired for a chunk. Each plugin configures a Retry\_Limit which applies to chunks. Once the Retry\_Limit has been reached for a chunk it is discarded and this metric is incremented. | counter | chunks |
+| fluentbit\_output\_retries\_total | name: the name or alias for the output instance | The number of times this output instance requested a retry for a chunk. | counter | chunks |
+| fluentbit\_uptime | hostname: the hostname on which Fluent Bit is running | The number of seconds that Fluent Bit has been running. | counter | seconds |
+| fluentbit\_process\_start\_time\_seconds | hostname: the hostname on which Fluent Bit is running | The Unix Epoch time stamp for when Fluent Bit started. | gauge | seconds |
+| fluentbit\_build\_info | hostname: the hostname, version: the version of Fluent Bit, os: OS type | Build version information. The returned value originates from the Unix Epoch time stamp set when the config context is initialized. | gauge | seconds |
+| fluentbit\_hot\_reloaded\_times | hostname: the hostname on which Fluent Bit is running | The number of times Fluent Bit has been hot reloaded. | gauge | reloads |
+
+The following are detailed descriptions for the metrics collected by the storage layer.
+
+
+| Metric Name | Labels | Description | Type | Unit |
+|-------------|--------|-------------|------|------|
+| fluentbit\_input\_chunks.storage\_chunks | None | The total number of chunks of records that Fluent Bit is currently buffering | gauge | chunks |
+| fluentbit\_storage\_mem\_chunk | None | The total number of chunks that are buffered in memory at this time. Note that chunks can be both in memory and on the file system at the same time. | gauge | chunks |
+| fluentbit\_storage\_fs\_chunks | None | The total number of chunks saved to the filesystem. | gauge | chunks |
+| fluentbit\_storage\_fs\_chunks\_up | None | A chunk is "up" if it is in memory. So this is the count of chunks that are both in filesystem and in memory. | gauge | chunks |
+| fluentbit\_storage\_fs\_chunks\_down | None | The count of chunks that are "down" and thus are only in the filesystem. | gauge | chunks |
+| fluentbit\_storage\_fs\_chunks\_busy | None | The total number of chunks that are in a busy state. | gauge | chunks |
+| fluentbit\_storage\_fs\_chunks\_busy\_bytes | None | The total byte size of chunks that are in a busy state. | gauge | bytes |
+| | | | | |
+| fluentbit\_input\_storage\_overlimit | name: the name or alias for the input instance | Whether this input instance is over its configured Mem\_Buf\_Limit. | gauge | boolean |
+| fluentbit\_input\_storage\_memory\_bytes | name: the name or alias for the input instance | The size of memory that this input is consuming to buffer logs in chunks. | gauge | bytes |
+| | | | | |
+| fluentbit\_input\_storage\_chunks | name: the name or alias for the input instance | The current total number of chunks owned by this input instance. | gauge | chunks |
+| fluentbit\_input\_storage\_chunks\_up | name: the name or alias for the input instance | The current number of chunks that are "up" in memory for this input. Chunks that are "up" will also be in the filesystem layer if filesystem storage is enabled. | gauge | chunks |
+| fluentbit\_input\_storage\_chunks\_down | name: the name or alias for the input instance | The current number of chunks that are "down" in the filesystem for this input. | gauge | chunks |
+| fluentbit\_input\_storage\_chunks\_busy | name: the name or alias for the input instance | "Busy" chunks are chunks that are being processed/sent by outputs and are not eligible to have new data appended. | gauge | chunks |
+| fluentbit\_input\_storage\_chunks\_busy\_bytes | name: the name or alias for the input instance | The sum of the byte size of each chunk which is currently marked as busy. | gauge | bytes |
+| | | | | |
+| fluentbit\_output\_upstream\_total\_connections | name: the name or alias for the output instance | The sum of the connection counts of each output plugin. | gauge | connections |
+| fluentbit\_output\_upstream\_busy\_connections | name: the name or alias for the output instance | The sum of the busy connection counts of each output plugin. | gauge | connections |
 
 ### Uptime Example
 