[Metrics UI] Too Many Buckets crashes Kibana when previewing inventory alerts #69323

Closed
1 task
Zacqary opened this issue Jun 16, 2020 · 7 comments · Fixed by #70503

@Zacqary
Contributor

Zacqary commented Jun 16, 2020

It's difficult to implement a Too Many Buckets error handler for previewing inventory alerts, because the Snapshot API seems to crash Kibana when it hits one of these.

Acceptance Criteria

  • Gracefully handle too_many_buckets_exception from the Snapshot API without crashing Kibana

To Reproduce

Using data from our shared cluster, get a Snapshot of a node type that there are a LOT of (Kubernetes Pods seems to do it for me), and set the lookbackSize to a month.

Output of the error I'm hitting:

Unhandled Promise rejection detected:

{ Error: [search_phase_execution_exception] 
    at respond (/Users/zacqary/Code/kibana/node_modules/elasticsearch/src/lib/transport.js:349:15)
    at checkRespForFailure (/Users/zacqary/Code/kibana/node_modules/elasticsearch/src/lib/transport.js:306:7)
    at HttpConnector.<anonymous> (/Users/zacqary/Code/kibana/node_modules/elasticsearch/src/lib/connectors/http.js:173:7)
    at IncomingMessage.wrapper (/Users/zacqary/Code/kibana/node_modules/elasticsearch/node_modules/lodash/lodash.js:4929:19)
    at IncomingMessage.emit (events.js:203:15)
    at endReadableNT (_stream_readable.js:1145:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)
  status: 503,
  displayName: 'ServiceUnavailable',
  message: '[search_phase_execution_exception] ',
  path: '/metricbeat-*/_search',
  query:
   { allow_no_indices: true,
     ignore_unavailable: true,
     ignore_throttled: true },
  body:
   { error:
      { root_cause: [],
        type: 'search_phase_execution_exception',
        reason: '',
        phase: 'fetch',
        grouped: true,
        failed_shards: [],
        caused_by: [Object] },
     status: 503 },
  statusCode: 503,
  response:
   '{"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"","phase":"fetch","grouped":true,"failed_shards":[],"caused_by":{"type":"too_many_buckets_exception","reason":"Trying to create too many buckets. Must be less than or equal to: [65535] but was [66286]. This limit can be set by changing the [search.max_buckets] cluster level setting.","max_buckets":65535}},"status":503}',
  toString: [Function],
  toJSON: [Function] }

Terminating process...
 server crashed  with status code 1
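
For illustration, the error object in the log above carries the underlying cause inside its raw `response` string, while `body.error.caused_by` is only printed as `[Object]`. The following is a minimal sketch of how a handler might detect the too_many_buckets_exception from that string. The helper name `getTooManyBucketsInfo` and the `EsErrorLike` interface are hypothetical and are not part of Kibana or the Elasticsearch client; only the field names are taken from the log.

```typescript
// Subset of the error object shown in the log above (only the fields used here).
interface EsErrorLike {
  statusCode?: number;
  response?: string; // raw JSON body, as in the `response` field of the log
}

interface TooManyBucketsInfo {
  maxBuckets: number;
}

// Hypothetical helper: returns the bucket limit if the error was caused by
// too_many_buckets_exception, otherwise undefined.
function getTooManyBucketsInfo(err: EsErrorLike): TooManyBucketsInfo | undefined {
  if (typeof err.response !== 'string') return undefined;

  let parsed: any;
  try {
    parsed = JSON.parse(err.response);
  } catch {
    // The response was not valid JSON, so the cause cannot be determined.
    return undefined;
  }

  const causedBy = parsed?.error?.caused_by;
  if (causedBy?.type !== 'too_many_buckets_exception') return undefined;
  return { maxBuckets: Number(causedBy.max_buckets) };
}
```

A caller could use the returned limit to build a user-facing message instead of letting the rejection propagate.
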
@Zacqary added the bug, Feature:Metrics UI, Team:Infra Monitoring UI - DEPRECATED, and v7.9.0 labels on Jun 16, 2020
@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@simianhacker
Member

What about setting timerange.ignoreLookback: true and timerange.forceInterval: true? Then you can control everything with timerange.to, timerange.from, and timerange.interval. The number of buckets is going to be (to - from) / (intervalInSeconds * 1000). If you need a month's worth of 1-minute data, you will need to make several requests to avoid hitting the circuit breakers.
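
As a rough sketch of the arithmetic in the comment above, the following computes the bucket count for a time range and splits the range into several smaller requests. The function names `estimateBucketCount` and `splitTimeRange` and the `maxBucketsPerRequest` parameter are hypothetical, chosen for illustration; `from` and `to` are assumed to be epoch milliseconds, as the formula implies.

```typescript
// Number of time buckets for a range, per the formula
// (to - from) / (intervalInSeconds * 1000). `from` and `to` are epoch milliseconds.
function estimateBucketCount(from: number, to: number, intervalInSeconds: number): number {
  return Math.ceil((to - from) / (intervalInSeconds * 1000));
}

interface TimeRangeChunk {
  from: number;
  to: number;
}

// Hypothetical helper: split [from, to) into consecutive chunks that each
// cover at most `maxBucketsPerRequest` buckets.
function splitTimeRange(
  from: number,
  to: number,
  intervalInSeconds: number,
  maxBucketsPerRequest: number
): TimeRangeChunk[] {
  if (intervalInSeconds <= 0 || maxBucketsPerRequest <= 0) {
    throw new Error('intervalInSeconds and maxBucketsPerRequest must be positive');
  }
  const chunkMs = intervalInSeconds * 1000 * maxBucketsPerRequest;
  const chunks: TimeRangeChunk[] = [];
  for (let start = from; start < to; start += chunkMs) {
    chunks.push({ from: start, to: Math.min(start + chunkMs, to) });
  }
  return chunks;
}

// Example: 30 days of 1-minute data.
const thirtyDaysMs = 30 * 24 * 60 * 60 * 1000;
const bucketCount = estimateBucketCount(0, thirtyDaysMs, 60); // 43200
const requests = splitTimeRange(0, thirtyDaysMs, 60, 10000); // 5 chunks
```

With a 1-minute interval, 30 days comes to 43,200 time buckets, so a cap of 10,000 buckets per request (an arbitrary value for this example) yields five requests.
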

@Zacqary
Contributor Author

Zacqary commented Jun 16, 2020

That could definitely work, but I'm concerned that this API crashes Kibana entirely when you hit this error. Like it just freezes, I have to Control-C it and restart and everything.

@simianhacker
Member

Is the issue that the promise is not being handled in the Alert controller? I guess it's not clear to me what changes need to be made.

@Zacqary
Contributor Author

Zacqary commented Jun 17, 2020

Not sure where the promise rejection occurs. Here's the traceback I got:

{ Error: [search_phase_execution_exception] 
    at respond (/Users/zacqary/Code/kibana/node_modules/elasticsearch/src/lib/transport.js:349:15)
    at checkRespForFailure (/Users/zacqary/Code/kibana/node_modules/elasticsearch/src/lib/transport.js:306:7)
    at HttpConnector.<anonymous> (/Users/zacqary/Code/kibana/node_modules/elasticsearch/src/lib/connectors/http.js:173:7)
    at IncomingMessage.wrapper (/Users/zacqary/Code/kibana/node_modules/elasticsearch/node_modules/lodash/lodash.js:4929:19)
    at IncomingMessage.emit (events.js:203:15)
    at endReadableNT (_stream_readable.js:1145:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)

Looks like one of those broken call stack situations. I doubt it tells us where in the code the actual problem occurred.

It might not be happening in the Snapshot API. Let's keep this issue open to track it down.

@sgrodzicki added this to the Metrics UI 7.9 milestone on Jun 29, 2020
@sgrodzicki assigned Zacqary and unassigned simianhacker on Jun 29, 2020
@Zacqary
Contributor Author

Zacqary commented Jun 30, 2020

Yep, turns out that this is an uncaught error within the Inventory alert and not the Snapshot API. I'll add a handler.
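
One generic way to keep a rejected promise from going unhandled is to await the call inside a try/catch and turn the failure into a return value. The sketch below illustrates that pattern only; it is not the handler that was written for this issue. `runPreviewSafely` and `PreviewResult` are hypothetical names, and the check on `response` assumes an error shaped like the one logged earlier in this thread.

```typescript
type PreviewResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: 'too_many_buckets' | 'unknown'; message: string };

// Hypothetical wrapper: awaits `task` and converts a rejection into a value,
// so the caller never receives a rejected promise from this function.
async function runPreviewSafely<T>(task: () => Promise<T>): Promise<PreviewResult<T>> {
  try {
    return { ok: true, value: await task() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Assumption: the raw Elasticsearch response body is a string on `err.response`.
    const raw = (err as { response?: unknown } | null)?.response;
    const isTooManyBuckets =
      typeof raw === 'string' && raw.indexOf('too_many_buckets_exception') !== -1;
    return {
      ok: false,
      reason: isTooManyBuckets ? 'too_many_buckets' : 'unknown',
      message,
    };
  }
}
```

This only covers rejections that pass through the awaited call. A promise created elsewhere and never awaited would still reject outside the try/catch.
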

@Zacqary
Contributor Author

Zacqary commented Jun 30, 2020

Spoke too soon. The handler I wrote worked once and has now ceased to work. This might be a deeper problem.
