[Task Manager] Optimize status field output for health api #102400
Labels
- discuss
- estimate:needs-research: Estimated as too large and requires research to break down into workable issues
- Feature:Task Manager
- insight: Issues related to user insight into platform operations and resilience
- response-ops-ec-backlog: ResponseOps E&C backlog
- Team:ResponseOps: Label for the ResponseOps team (formerly the Cases and Alerting teams)
Relates to #101505
There are some points of confusion related to the task manager health API response that we should discuss and potentially fix.
- The runtime health status is always `OK`, even though we set the overall status to `Error` based on a runtime metric.
- The workload health status is always `OK`, even though we set the overall status to `Error` based on a workload metric.
- In #101751 ([Task Manager] Log at different levels based on the state), we log a warning based on the p99 runtime drift. Maybe we should set the status to warning for the runtime bucket when this happens?
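To make the discussion concrete, here is a minimal sketch of one way the bucket statuses and the overall status could be kept consistent: each bucket computes its own status, and the overall status is simply the most severe bucket status. All names here (`HealthStatus`, `BucketHealth`, `runtimeStatus`, `overallStatus`) and the drift thresholds are hypothetical and chosen for illustration; this is not the existing Task Manager code.

```typescript
// Hypothetical types and names; not the actual Task Manager implementation.
type HealthStatus = 'OK' | 'Warning' | 'Error';

interface BucketHealth {
  status: HealthStatus;
}

// Rank statuses so the most severe one can be selected.
const severity: Record<HealthStatus, number> = { OK: 0, Warning: 1, Error: 2 };

// Derive the runtime bucket status from the p99 drift.
// warnMs and errorMs are assumed thresholds, purely illustrative.
function runtimeStatus(p99DriftMs: number, warnMs: number, errorMs: number): HealthStatus {
  if (p99DriftMs >= errorMs) return 'Error';
  if (p99DriftMs >= warnMs) return 'Warning';
  return 'OK';
}

// The overall status is the most severe status reported by any bucket,
// so a bucket can never show OK while it is the cause of an overall Error.
function overallStatus(buckets: Record<string, BucketHealth>): HealthStatus {
  return Object.keys(buckets)
    .map((name) => buckets[name].status)
    .reduce<HealthStatus>(
      (worst, current) => (severity[current] > severity[worst] ? current : worst),
      'OK'
    );
}
```

With this shape, whichever bucket triggers the `Error` also reports `Error` itself, and a warning-level drift would surface as a warning on the runtime bucket rather than only in the logs.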