
[Task Manager] Optimize status field output for health api #102400

Open
chrisronline opened this issue Jun 16, 2021 · 2 comments
Labels
discuss, estimate:needs-research, Feature:Task Manager, insight, response-ops-ec-backlog, Team:ResponseOps

Comments


chrisronline commented Jun 16, 2021

Relates to #101505

There are some points of confusion related to the task manager health API response that we should discuss and potentially fix.

  1. The runtime health status is always OK, even though we set the overall status to Error based on a runtime metric

  2. The workload health status is always OK, even though we set the overall status to Error based on a workload metric

  3. In [Task Manager] Log at different levels based on the state #101751, we log a warning when the p99 runtime drift is too high - maybe we should set the status to warning for the runtime bucket when this happens? (See the sketch after this list.)
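For reference, a minimal sketch of the shape being discussed. This is not the actual Kibana implementation; the type names, fields, thresholds, and the `runtimeStatus` helper are illustrative assumptions, only meant to show how a per-section status could be derived from the same metric that currently drives the overall status.

```ts
// Hypothetical, simplified shape of the task manager health API response.
// Field names and thresholds are illustrative, not the real Kibana types.
type HealthStatus = 'OK' | 'warn' | 'error';

interface MonitoredStat<T> {
  timestamp: string;    // when this section last updated itself
  status: HealthStatus; // points 1 and 2: today this stays OK even when the
                        // overall status is set to Error from its metrics
  value: T;
}

interface TaskManagerHealth {
  status: HealthStatus; // overall status
  stats: {
    runtime: MonitoredStat<{ drift: { p99: number } }>;
    workload: MonitoredStat<{ overdue: number }>;
  };
}

// One way the runtime bucket could reflect its own metric (and point 3):
// threshold values here are made-up numbers, in milliseconds.
function runtimeStatus(
  p99Drift: number,
  warnThreshold = 15_000,
  errorThreshold = 60_000
): HealthStatus {
  if (p99Drift >= errorThreshold) return 'error';
  if (p99Drift >= warnThreshold) return 'warn';
  return 'OK';
}
```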

@chrisronline added the discuss, Feature:Task Manager, and Team:ResponseOps labels on Jun 16, 2021
@elasticmachine

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@gmmorris added and then removed the Project:ObservabilityOfAlerting label on Jun 30, 2021
@gmmorris

The idea was that each section is updated independently, and when you request the overall health it looks at the constituent parts and derives the overall status, so that:

  1. If any part is in Error, then the whole thing is in Error
  2. If any part says it's OK but it's not fresh enough (the last update was OK but for some reason hasn't updated in 5 minutes), then the overall API returns an Error state
  3. It would be clear which part was in error and which is OK

We can obviously throw that idea out 🤷 but that's where the idea comes from.
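A rough sketch of that aggregation idea, for discussion only. The 5 minute freshness window comes from the comment above, but the type names, `FRESHNESS_WINDOW_MS`, and the `overallStatus` helper are assumptions, not the actual task manager code.

```ts
type HealthStatus = 'OK' | 'warn' | 'error';

interface Section {
  status: HealthStatus;
  timestamp: string; // last time this section updated itself
}

// "not fresh enough": the section hasn't updated in 5 minutes
const FRESHNESS_WINDOW_MS = 5 * 60 * 1000;

// Derive the overall status from the constituent parts:
// 1. any section in Error => overall Error
// 2. any section that is stale, whatever it reports => overall Error
// 3. otherwise the worst of the individual statuses
function overallStatus(
  sections: Record<string, Section>,
  now: number = Date.now()
): HealthStatus {
  let overall: HealthStatus = 'OK';
  for (const section of Object.values(sections)) {
    const stale = now - Date.parse(section.timestamp) > FRESHNESS_WINDOW_MS;
    if (section.status === 'error' || stale) return 'error';
    if (section.status === 'warn') overall = 'warn';
  }
  return overall;
}
```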

@mikecote added the loe:needs-research label on Jul 21, 2021
@gmmorris added the insight and estimate:needs-research labels on Aug 16, 2021
@gmmorris removed the loe:needs-research label on Sep 2, 2021
@kobelb added the needs-team label on Jan 31, 2022
@botelastic removed the needs-team label on Jan 31, 2022
@mikecote added the response-ops-ec-backlog label on Nov 1, 2024