Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Response Ops][Task Manager] Return HealthStatus.Warning instead of error when task manager stats are not yet available. #169447

Merged
merged 2 commits into from
Oct 26, 2023

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Oct 19, 2023

Resolves #168237

Summary

This PR fixes a bug in the task manager status calculations that would return 0 for the "cold timestamp" when it didn't exist inside the stats and compare the difference of now - 0 against the required freshness threshold. This was causing the task manager status to initially be set to Error on startup (when there were no task manager stats available at all). Updated to return a warning with an appropriate reason message.

To verify:

Run ES & Kibana on main. See startup logs warning about task manager health:

[2023-10-25T15:27:00.740-04:00][WARN ][status.plugins.taskManager] taskManager plugin is now degraded: Task Manager is unhealthy - Reason: setting HealthStatus.Error because of expired cold timestamps
[2023-10-25T15:27:00.860-04:00][WARN ][status] Kibana is now degraded: 0 service(s) and 1 plugin(s) are degraded: taskManager

Switch to this branch and run ES & Kibana. There should be no warnings about task manager errors in the logs.

@ymao1
Copy link
Contributor Author

ymao1 commented Oct 23, 2023

@elasticmachine merge upstream

2 similar comments
@ymao1
Copy link
Contributor Author

ymao1 commented Oct 24, 2023

@elasticmachine merge upstream

@ymao1
Copy link
Contributor Author

ymao1 commented Oct 25, 2023

@elasticmachine merge upstream

@ymao1 ymao1 force-pushed the task-manager/expired-timestamps branch from 53ce43b to ff28454 Compare October 25, 2023 19:32
@ymao1 ymao1 changed the title Don't set status to error when no monitoring stats are available [Response Ops][Task Manager] Return HealthStatus.Warning instead of error when task manager stats are not yet available. Oct 25, 2023
@ymao1 ymao1 self-assigned this Oct 25, 2023
@ymao1 ymao1 added release_note:skip Skip the PR/issue when compiling release notes Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.12.0 labels Oct 25, 2023
@ymao1 ymao1 marked this pull request as ready for review October 25, 2023 20:55
@ymao1 ymao1 requested a review from a team as a code owner October 25, 2023 20:55
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@ymao1
Copy link
Contributor Author

ymao1 commented Oct 26, 2023

@elasticmachine merge upstream

Copy link
Contributor

@doakalexi doakalexi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, LGTM!

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@@ -19,6 +19,11 @@ export function calculateHealthStatus(
): { status: HealthStatus; reason?: string } {
const now = Date.now();

// if stats are empty, return a warning
if (isEmpty(summarizedStats.stats)) {
Copy link
Contributor

@gsoldevila gsoldevila Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see no health stats available message at startup (it's now a warning).

I understand that the health URL must report something when called, but for the status observables there's no need IMO (on our status_service we already consider plugins unavailable until they report otherwise).

@ymao1 I created a draft PR that addresses this by filtering out "empty" status emissions. It'd be nice if you could take a look to make sure I'm not causing undesired side effects:

#171012

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gsoldevila! That makes a lot of sense to me. I left one comment but the approach LGTM

gsoldevila added a commit that referenced this pull request Nov 14, 2023
…rvables (#171012)

## Summary

Improvement over #169447, see
[related comment](#171012).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Task Manager release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task manager plugin is reporting an error message at startup
6 participants