troubleshooting-dashboard.json.template

{
  "name": "New Relic Fluent Bit output plugin troubleshooting",
  "description": null,
  "permissions": "PUBLIC_READ_WRITE",
  "pages": [
    {
      "name": "FB plugin instrumentation",
      "description": null,
      "widgets": [
        {
          "title": "",
          "layout": {
            "column": 1,
            "row": 1,
            "width": 4,
            "height": 9
          },
          "linkedEntityGuids": null,
          "visualization": {
            "id": "viz.markdown"
          },
          "rawConfiguration": {
            "text": "# What is this dashboard for\nThe purpose of this dashboard is to allow you to troubleshoot a malfunctioning Fluent Bit installation that uses the [New Relic Fluent Bit output plugin](https://github.com/newrelic/newrelic-fluent-bit-output).\n\nPlease note that the **metrics used by this dashboard must not be considered as a stable API: they can change its naming or dimensions at any time in newer plugin versions**. That is, **no critical alerts or long-term dashboard should be created out of them**.\n\nNote that **your plugin needs to be configured with `sendMetrics=true`** in order for the metrics used by this dashboard to be emitted.\n\n\n# Basic naming conventions\n- Fluent Bit aggregates logs in batches, also referred as **chunks**. Each chunk therefore contains an unknown amount of logs.\n- Chunks are received sequentially at the New Relic Fluent Bit output plugin, which takes care of reading the logs they contain and splitting them into the so-called New Relic *payloads*.\n- Each **payload** is a compressed stream of bytes that can be [at most 1MB long](https://docs.newrelic.com/docs/logs/log-api/introduction-log-api/#limits), and follows the [data format required by the Logs API](https://docs.newrelic.com/docs/logs/log-api/introduction-log-api/#json-content).\n\n\n# Error-detection graphs and recommended actions\n\nThe following are the main graphs used to detect potential problems in your log forwarding setup. Refer to each section to learn the recommended actions for each graph.\n\n## Payload packaging errors\nRepresents the percentage of Fluent Bit chunks that threw an error when they were attempted to be packaged as New Relic payloads. Such errors are never expected to happen. Therefore, **any value greater than 0% should be thoroughly investigated**.\n\nIf you find errors in this graph, please open a support ticket and include a  sample of your logs for further investigation.\n\n## Payload sending errors\nRepresents the percentage of New Relic payloads that threw an unexpected error when they were attempted to be sent to New Relic. Such errors can happen sporadically: timeouts due to poor network performance or sudden network changes can cause them from time to time. Observing **values greater than 0% can sometimes be normal, but any value above 10% should be considered as an annomalous situation and should be thoroughly investigated**.\n\nIf you find errors in this graph, please ensure that you don't have any weak spots in your network path to New Relic: are you using a proxy? Is it or any network hop introducing too much latency due to being saturated? If you can't find anything on you side, please open a support ticket and include as much information as possible from your network setup.\n\n## Payload send results\nRepresents the amount of API requests that were performed to send logs to New Relic. **Ideally, you should only observe 202 responses here**. Sometimes, intermediary CDN providers can introduce some errors (503 error codes) from time to time, in which case your logs will not be lost and reattempted to be sent.\n\nIf you find a considerable amount of non-202 responses in this graph, please open a customer support ticket.\n\n# Additional troubleshooting graphs\n\nThe following graphs include additional fine-grained information that will be useful for New Relic to troubleshoot your potential installation issues.\n\n## Average timings\nRepresents the average amount of time the plugin spent packaging the log payloads and sending them to New Relic, respectively.\n\n## Accumulated time per minute\nRepresents the amount of time per minute the plugin spent packaging the log payloads and sending them to New Relic, respectively.\n\n## Payload size\nRepresents the size in bytes of the individual compressed payloads sent to New Relic.\n\n## Payload packets per Fluent Bit chunk\nRepresents the amount of payloads sent to New Relic per each Fluent Bit chunk."
          }
        },
        {
          "title": "Payload packaging errors",
          "layout": {
            "column": 5,
            "row": 1,
            "width": 2,
            "height": 3
          },
          "linkedEntityGuids": null,
          "visualization": {
            "id": "viz.billboard"
          },
          "rawConfiguration": {
            "facet": {
              "showOtherSeries": false
            },
            "nrqlQueries": [
              {
                "accountIds": [
                  YOUR_ACCOUNT_ID
                ],
                "query": "FROM Metric SELECT percentage(count(`logs.fb.packaging.time`), WHERE hasError = true) AS 'packaging errors'"
              }
            ],
            "platformOptions": {
              "ignoreTimeRange": false
            },
            "thresholds": [
              {
                "alertSeverity": "CRITICAL",
                "value": 0
              }
            ]
          }
        },
        {
          "title": "Payload sending errors",
          "layout": {
            "column": 7,
            "row": 1,
            "width": 2,
            "height": 3
          },
          "linkedEntityGuids": null,
          "visualization": {
            "id": "viz.billboard"
          },
          "rawConfiguration": {
            "facet": {
              "showOtherSeries": false
            },
            "nrqlQueries": [
              {
                "accountIds": [
                  YOUR_ACCOUNT_ID
                ],
                "query": "FROM Metric SELECT percentage(count(`logs.fb.payload.send.time`), WHERE hasError = true) AS 'send errors'"
              }
            ],
            "platformOptions": {
              "ignoreTimeRange": false
            },
            "thresholds": [
              {
                "alertSeverity": "WARNING",
                "value": 0
              },
              {
                "alertSeverity": "CRITICAL",
                "value": 0.1
              }
            ]
          }
        },
        {
          "title": "Payload send results",
          "layout": {
            "column": 9,
            "row": 1,
            "width": 4,
            "height": 3
          },
          "linkedEntityGuids": null,
          "visualization": {
            "id": "viz.line"
          },
          "rawConfiguration": {
            "facet": {
              "showOtherSeries": false
            },
            "legend": {
              "enabled": true
            },
            "nrqlQueries": [
              {
                "accountIds": [
                  YOUR_ACCOUNT_ID
                ],
                "query": "SELECT rate(count(logs.fb.payload.send.time), 1 minute) AS 'Status Code' FROM Metric FACET CASES(WHERE statusCode = 0 AS 'Send error') OR statusCode timeseries max"
              }
            ],
            "platformOptions": {
              "ignoreTimeRange": false
            },
            "units": {
              "unit": "REQUESTS_PER_MINUTE"
            },
            "yAxisLeft": {
              "zero": true
            },
            "yAxisRight": {
              "zero": true
            }
          }
        },
        {
          "title": "Average timings",
          "layout": {
            "column": 5,
            "row": 4,
            "width": 4,
            "height": 3
          },
          "linkedEntityGuids": null,
          "visualization": {
            "id": "viz.line"
          },
          "rawConfiguration": {
            "facet": {
              "showOtherSeries": false
            },
            "legend": {
              "enabled": true
            },
            "nrqlQueries": [
              {
                "accountIds": [
                  YOUR_ACCOUNT_ID
                ],
                "query": "SELECT average(logs.fb.payload.send.time) AS 'Payload sending', average(logs.fb.packaging.time) AS 'Payload packaging' FROM Metric timeseries max"
              }
            ],
            "nullValues": {
              "nullValue": "zero"
            },
            "platformOptions": {
              "ignoreTimeRange": false
            },
            "units": {
              "unit": "MS"
            },
            "yAxisLeft": {
              "zero": true
            },
            "yAxisRight": {
              "zero": true
            }
          }
        },
        {
          "title": "Accumulated time per minute",
          "layout": {
            "column": 9,
            "row": 4,
            "width": 4,
            "height": 3
          },
          "linkedEntityGuids": null,
          "visualization": {
            "id": "viz.area"
          },
          "rawConfiguration": {
            "facet": {
              "showOtherSeries": false
            },
            "legend": {
              "enabled": true
            },
            "nrqlQueries": [
              {
                "accountIds": [
                  YOUR_ACCOUNT_ID
                ],
                "query": "SELECT rate(sum(logs.fb.total.send.time), 1 minute) AS 'Sending', rate(sum(logs.fb.packaging.time), 1 minute) AS 'Packaging' FROM Metric TIMESERIES max"
              }
            ],
            "nullValues": {
              "nullValue": "zero"
            },
            "platformOptions": {
              "ignoreTimeRange": false
            },
            "units": {
              "unit": "MS"
            }
          }
        },
        {
          "title": "Payload size",
          "layout": {
            "column": 5,
            "row": 7,
            "width": 4,
            "height": 3
          },
          "linkedEntityGuids": null,
          "visualization": {
            "id": "viz.line"
          },
          "rawConfiguration": {
            "facet": {
              "showOtherSeries": false
            },
            "legend": {
              "enabled": true
            },
            "nrqlQueries": [
              {
                "accountIds": [
                  YOUR_ACCOUNT_ID
                ],
                "query": "SELECT min(logs.fb.payload.size) AS 'Minimum', average(logs.fb.payload.size) AS 'Average', max(logs.fb.payload.size) AS 'Maximum' FROM Metric timeseries MAX "
              }
            ],
            "nullValues": {
              "nullValue": "default"
            },
            "platformOptions": {
              "ignoreTimeRange": false
            },
            "units": {
              "unit": "BYTES"
            },
            "yAxisLeft": {
              "zero": true
            },
            "yAxisRight": {
              "zero": true
            }
          }
        },
        {
          "title": "Payload packets per Fluent Bit chunk",
          "layout": {
            "column": 9,
            "row": 7,
            "width": 4,
            "height": 3
          },
          "linkedEntityGuids": null,
          "visualization": {
            "id": "viz.line"
          },
          "rawConfiguration": {
            "facet": {
              "showOtherSeries": false
            },
            "legend": {
              "enabled": true
            },
            "nrqlQueries": [
              {
                "accountIds": [
                  YOUR_ACCOUNT_ID
                ],
                "query": "SELECT min(logs.fb.payload.count) AS 'Minimum', average(logs.fb.payload.count) AS 'Average', max(logs.fb.payload.count) AS 'Maximum' FROM Metric timeseries MAX "
              }
            ],
            "platformOptions": {
              "ignoreTimeRange": false
            },
            "units": {
              "unit": "COUNT"
            },
            "yAxisLeft": {
              "zero": true
            },
            "yAxisRight": {
              "zero": true
            }
          }
        }
      ]
    }
  ],
  "variables": []
}