
Visualizations in grafana end up hitting max buckets but the same visualization in kibana works fine #426

Open
rkarthikr opened this issue Jul 24, 2024 · 16 comments
Labels
datasource/OpenSearch type/bug Something isn't working

Comments

@rkarthikr

rkarthikr commented Jul 24, 2024

What happened:
We've been working through the max bucket error in Grafana with an OpenSearch datasource. Initially I thought the issue was with OpenSearch itself, but we've raised the max bucket limit to 65536 and are still seeing this error for most aggregations (a few now work, but most still hit the limit). For comparison, I recreated the same simple visualization in OpenSearch Dashboards (the Kibana equivalent) and it renders quickly with no errors. I suspect the OpenSearch plugin builds its query differently than Kibana does, causing it to hit this limit even with a high setting.

[screenshot: the error as shown in Grafana]

What you expected to happen:
Visualizations should work without hitting the max buckets limit.

How to reproduce it (as minimally and precisely as possible):
Create a simple aggregation in grafana using an opensearch datasource
Anything else we need to know?:

Environment:

  • Grafana version: Grafana v11.2.0-73451
  • OpenSearch version: AWS 2.13
  • Plugin version: 2.17.1
@kevinwcyu
Contributor

Hi @rkarthikr, I tried running the query as shown in your screenshot but wasn't able to reproduce an error. Can you show what is in the query object by opening the Query Inspector and clicking on the Query tab? The query will be listed in data.queries.

@kevinwcyu kevinwcyu moved this from Incoming to Waiting in AWS Datasources Jul 26, 2024
@rkarthikr
Author

{
  "traceId": "50384ed94095e8fe6eedfee4c020957a",
  "request": {
    "url": "api/ds/query?ds_type=grafana-opensearch-datasource&requestId=explore_o6v",
    "method": "POST",
    "data": {
      "queries": [
        {
          "refId": "A",
          "datasource": {
            "type": "grafana-opensearch-datasource",
            "uid": "ads67lnsevj0gd"
          },
          "query": "*",
          "queryType": "lucene",
          "alias": "",
          "metrics": [
            {
              "type": "count",
              "id": "1"
            }
          ],
          "bucketAggs": [
            {
              "type": "date_histogram",
              "id": "2",
              "settings": {
                "interval": "auto"
              },
              "field": "startTime"
            }
          ],
          "format": "table",
          "timeField": "startTime",
          "luceneQueryType": "Traces",
          "datasourceId": 12,
          "intervalMs": 60000,
          "maxDataPoints": 1515
        }
      ],
      "from": "1722177213354",
      "to": "1722180813354"
    },
    "hideFromInspector": false
  },
  "response": {
    "message": "An error occurred within the plugin",
    "messageId": "plugin.downstreamError",
    "statusCode": 500,
    "traceID": "50384ed94095e8fe6eedfee4c020957a"
  }
}
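As a back-of-the-envelope check, the visible date_histogram in this request is nowhere near any bucket limit: the time range spans one hour at a 60 s interval. A quick sketch (plain Python, values copied from the pasted query above):

```python
# Rough bucket estimate for the date_histogram in the query above.
# Epoch timestamps are in milliseconds, taken from the pasted request.
frm = 1722177213354
to = 1722180813354
interval_ms = 60000  # "intervalMs" from the query

buckets = (to - frm) // interval_ms
print(buckets)  # 60 buckets for a 1-hour range at a 1-minute interval
```

Sixty buckets is far below any search.max_buckets setting, which suggests the limit is being tripped by whatever extra aggregations the Traces query type adds on the backend, not by this visible date_histogram.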

@kevinwcyu kevinwcyu moved this from Waiting to Incoming in AWS Datasources Jul 29, 2024
@rkarthikr
Author

@kevinwcyu - Any updates on this?

@iwysiu
Contributor

iwysiu commented Aug 2, 2024

Hi @rkarthikr ! I've been investigating this. I haven't been able to reproduce it, but I have found some differences between the query that the opensearch dashboard runs and the one we create, and we'll continue to investigate why those differences exist and whether they affect performance.

@rkarthikr
Author

I will reach out to you in Grafana Community Slack.

@idastambuk
Contributor

Hi @rkarthikr! You mention that you're getting max_buckets for this query, but I only see the plugin.downstreamError error. How did you discover this is a max buckets error and not an error in the plugin code? Thanks!

@rkarthikr
Author

I saw the error in the OpenSearch logs. I tried increasing the max buckets config on the OpenSearch end and I no longer get that error, but I still get the plugin.downstreamError error with no additional details.

Please let me know; I'm happy to walk you through the demo environment to see if you can use it to collect data for further troubleshooting.
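For reference, the setting mentioned above is OpenSearch's cluster-level search.max_buckets, which can be raised with a PUT to _cluster/settings. The body below uses the 65536 value from this thread purely as an example, not as a recommendation:

```json
{
  "persistent": {
    "search.max_buckets": 65536
  }
}
```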

@idastambuk
Contributor

Hi @rkarthikr,
it would be super helpful to get step-by-step instructions for setting up a similar environment, since it seems like our backend might be running into errors with the data itself. Thanks a lot!

@rkarthikr
Author

  1. Demo application - https://github.com/open-telemetry/opentelemetry-demo/tree/main/kubernetes. Deployed the application listed here into an EKS cluster
  2. Updated the OTEL config to send traces to OpenSearch
  3. Set up the OpenSearch datasource in Grafana
  4. Using Explore with that datasource, query trace data over a range > 5 min and observe the error

@rkarthikr
Author

Please let me know if there is any way to enable Grafana logs that would help you troubleshoot this further. I am using a Grafana Cloud demo environment for this.

@idastambuk idastambuk moved this from Waiting to Incoming in AWS Datasources Aug 26, 2024
@superstes

superstes commented Aug 28, 2024

I saw the same error while trying to explore data for my new project (local Docker setup, opensearchproject/opensearch:2 and grafana/grafana:11.1.4).
Only a few hundred messages were enough to produce the max buckets error.

@kevinwcyu
Contributor

Hi @rkarthikr, could you share the visualization from OpenSearch Dashboards (Kibana) that works? With the demo application, I still haven't been able to get an error related to the max bucket limit, but I do get the same error shown in the screenshot in the description when I perform a trace query.

I think the plugin.downstreamError error might be fixed by #445, but we still have to figure out what is causing the max bucket error.

@kevinwcyu kevinwcyu moved this from Incoming to Waiting in AWS Datasources Aug 29, 2024
@yotamN

yotamN commented Sep 21, 2024

Could it be the interval setting? I'm getting the same error sometimes (also with AWS OpenSearch) when the interval is set to auto, but when I set it manually to a larger value it works fine.
I can also see visually that the interval behavior differs a bit between Grafana and Kibana.

@iwysiu iwysiu moved this from Incoming to Waiting in AWS Datasources Sep 30, 2024
@iwysiu iwysiu moved this from Waiting to Incoming in AWS Datasources Oct 3, 2024
@kevinwcyu
Contributor

> Could it be the interval setting? I'm getting the same error sometimes (also with AWS OpenSearch) when setting the interval to auto but when I set it manually to a bigger number it works fine. I can also see visually that the interval behavior is a bit different between Grafana and Kibana.

Hi @yotamN, there isn't an option to set the interval for Traces queries, so I just wanted to clarify: are you running a Traces query as shown in the issue description, or a Metric query?

@kevinwcyu kevinwcyu moved this from Incoming to Waiting in AWS Datasources Oct 7, 2024
@yotamN

yotamN commented Oct 9, 2024

> Could it be the interval setting? I'm getting the same error sometimes (also with AWS OpenSearch) when setting the interval to auto but when I set it manually to a bigger number it works fine. I can also see visually that the interval behavior is a bit different between Grafana and Kibana.
>
> Hi @yotamN, there isn't an option to set the interval for Traces queries, so I just wanted to clarify: are you running a Traces query as shown in the issue description, or a Metric query?

On second look, I think my original error description was slightly off; please tell me if this is still relevant, since I still get the same error in the OpenSearch logs.

I set the interval to a constant value (since there isn't a way to set a minimum interval instead), and when I select a large time range I get this error because there are too many buckets.

@kevinwcyu
Contributor

Hi @yotamN, we've seen the max bucket error for Metric queries in the past and we usually recommend adjusting the search.max_buckets setting in OpenSearch, but adjusting the interval is another way of tweaking the query to avoid hitting the error as well.

Since you mentioned you were setting the interval I just wanted to clarify if you were running a Metric query or a Traces query (like the one shown in the original issue description) because we haven't been able to reproduce the max bucket error for Traces queries yet. If it was a Traces query it would be good to get an example query to help us reproduce it.
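To make the interval/bucket trade-off described above concrete, here is a rough sketch (plain Python, with illustrative range/interval values, not ones from this thread) of how the bucket count for a single date_histogram scales with the time range and interval, and why a wide range combined with a small fixed interval can blow past search.max_buckets:

```python
# Illustrative only: estimate date_histogram bucket counts for a few
# range/interval combinations and compare against a max_buckets limit.
MAX_BUCKETS = 65536  # the raised limit mentioned earlier in this thread

HOUR = 3_600_000     # milliseconds
DAY = 24 * HOUR

def histogram_buckets(range_ms: int, interval_ms: int) -> int:
    """Approximate number of date_histogram buckets over a time range."""
    return range_ms // interval_ms

for range_ms, interval_ms, label in [
    (1 * HOUR, 60_000, "1 hour @ 1m"),
    (60 * DAY, 60_000, "60 days @ 1m"),
    (60 * DAY, HOUR,   "60 days @ 1h"),
]:
    n = histogram_buckets(range_ms, interval_ms)
    status = "over" if n > MAX_BUCKETS else "under"
    print(f"{label}: {n} buckets ({status} the limit)")
```

Note that a nested multi-bucket aggregation (e.g. a terms aggregation inside the date_histogram) multiplies this count by its cardinality, which is often what actually trips the limit.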
