[Task Manager] Potential performance problem log happens on startup #106854
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
We'll keep this issue scoped to changing the message, if we can find a better one.
@gchaps we're hoping you can provide some alternative wording that would make this less "scary". Even with "potential" in there, folks have been concerned about this. The basic idea is that if you see this message just once in a while, things are likely fine. Kibana was busy for a bit, then recovered. If you see it a lot, then you probably need to investigate.
We have some separate work, probably longer-term, to try to log this message less frequently, but right now it can get logged a few times in succession, and then stop. These are the situations we want to make this "less scary".
And although the message is actionable, and useful, in terms of the config setting that can be changed, I would actually like to change that to point to our current doc that describes all this stuff, here: https://www.elastic.co/guide/en/kibana/current/task-manager-health-monitoring.html . Do we do that in other places? Presumably we'd use a version-specific URL here (instead of
This link might be better, which is a FAQ-like page - https://www.elastic.co/guide/en/kibana/current/task-manager-troubleshooting.html - and presumably we'd update that doc to put info regarding this particular message at the top of the page - and that copy would presumably end up pointing to the
I'm not sure there's actually a way to turn this message OFF - and wondering if we should provide that. Presumably a new cloud/docker-allowable config (or piggy-backed somehow on an existing one) to avoid logging it altogether.
FWIW, if the config is disabled, we default to the existing behavior where it would log at
except for the kibana/x-pack/plugins/task_manager/server/lib/log_health_metrics.ts Lines 79 to 87 in 6e3af2b
I'm thinking we probably want to change that to a
Ah yea, true. I think we added that to avoid spamming the user if the config was off, but still give them some awareness that something might be off
Per #109095, we're still spamming the user :-), hence my thinking we could change this to a
Does something like this work?
Task Manager detected a degradation in performance
This is usually temporary, and Kibana can recover automatically. If the problem persists, check the docs for information about debug logging and troubleshooting.
That does sound better, but is quite wordy for a log message :-). I think it captures what I'm hoping to express though, so is better than what we have now. Any thoughts on adding a doc link? I believe we may be able to add a doc link to our "health status" that gets percolated all the way up to Kibana's overall status, so another option is to refer to the overall Kibana status here (also a link), which presumably would provide more information, including a doc link into the task manager docs.
Here are a couple of shorter versions:
Task Manager detected a degradation in performance
Task Manager detected a potential performance problem
Can we put the link in the description?
Seeing the formatting, I'm wondering if you think these messages will be in a UI. Currently, this message only occurs in the Kibana logs. There's no fixed length, but it's nice to be shorter of course. And no notion of "bolding" anything. And link would need to be displayed in full. For example, the log message would be (this is my preferred version so far):
Super-long, but I think captures everything I wanted it to. Will need to figure out how to generate a per-version URL to the doc. I assume other folks are doing this as well - I think I've seen it in the client code, but this is the server :-)
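As a rough illustration of the per-version URL idea mentioned above, here is a minimal sketch. The helper names (`docsBranch`, `healthMonitoringDocUrl`) are hypothetical and not part of Kibana, and falling back to `current` when the version cannot be parsed is an assumption, not something decided in this thread.

```typescript
// Hypothetical helper: reduce a full version string such as "7.14.0" or
// "8.0.0-SNAPSHOT" to the "major.minor" segment used in the docs URLs.
// Falling back to "current" is an assumption made for this sketch.
function docsBranch(kibanaVersion: string): string {
  const match = /^(\d+)\.(\d+)/.exec(kibanaVersion);
  return match ? `${match[1]}.${match[2]}` : 'current';
}

// Hypothetical helper: build the version-specific link to the
// task manager health monitoring page referenced in this issue.
function healthMonitoringDocUrl(kibanaVersion: string): string {
  const branch = docsBranch(kibanaVersion);
  return `https://www.elastic.co/guide/en/kibana/${branch}/task-manager-health-monitoring.html`;
}

console.log(healthMonitoringDocUrl('7.14.0'));
```

The version string would presumably come from whatever the server already knows about its own version; how that is wired in is left open here.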
The text and the way you have formatted looks good to me. I'd add a period after the link.
We don't yet have a docLinks service server-side, so not sure about including a link now. We could point to
Oooh, I hate this! :-). Some renderers aren't smart enough to strip these off, so would use the URL WITH THE PERIOD AT THE END. GitHub gets it right, when rendered in markdown: https://www.elastic.co/guide/en/kibana/7.14/task-manager-health-monitoring.html. But other UI things, like "select word" kind of UI actions, will pick it up and assume the URL includes the period, which would send them to https://www.elastic.co/guide/en/kibana/7.14/task-manager-health-monitoring.html. I hate it because I want to include punctuation, but to be safe, have to put a space in front of it, then it looks AWFUL. But heh, it's just a log message. I should probably include a space-period, so if they "select word" from a log viewer, it would select the URL since it will be space separated, since if it's at the end of a string, it could have a
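To make the trailing-period concern concrete, here is a small sketch. Both functions are hypothetical and not from the Kibana codebase, and `selectUrlTokens` is only a crude stand-in for a "select word" action that splits on whitespace.

```typescript
// Hypothetical helper: append a doc link to a log message, leaving a space
// before the final period so the URL is delimited by whitespace on both sides.
function withDocLink(message: string, url: string): string {
  return `${message} ${url} .`;
}

// Crude model of a "select word" action: split on whitespace and keep
// the tokens that look like URLs.
function selectUrlTokens(text: string): string[] {
  return text.split(/\s+/).filter((token) => /^https?:\/\//.test(token));
}

const docUrl =
  'https://www.elastic.co/guide/en/kibana/7.14/task-manager-health-monitoring.html';

// Period attached directly to the URL: the selected token carries the period.
const attached = `See ${docUrl}.`;
// Space before the period: the selected token is the bare URL.
const spaced = withDocLink('See', docUrl);

console.log(selectUrlTokens(attached));
console.log(selectUrlTokens(spaced));
```

The space-period form reads a little oddly, but the second call shows why it is the safer choice for a plain-text log line.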
…ce is degraded (#109741)
resolves #109095
resolves #106854
Changes the way task manager and alerting perform their health / status checks:
- no longer sets an `unavailable` status; now uses `degraded` instead
- change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem
- Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level
- add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error)
…ce is degraded (#109741) (#110870) resolves #109095 resolves #106854 Changes the way task manager and alerting perform their health / status checks: - no longer sets an `unavailable` status; now uses `degraded` instead - change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem - Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level - add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error)
…ce is degraded (#109741) (#110869) resolves #109095 resolves #106854 Changes the way task manager and alerting perform their health / status checks: - no longer sets an `unavailable` status; now uses `degraded` instead - change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem - Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level - add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error)
…rformance is degraded (#109741) (#110875) * [task manager] provide better diagnostics when task manager performance is degraded (#109741) resolves #109095 resolves #106854 Changes the way task manager and alerting perform their health / status checks: - no longer sets an `unavailable` status; now uses `degraded` instead - change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem - Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level - add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error) # Conflicts: # x-pack/plugins/task_manager/server/monitoring/capacity_estimation.ts # x-pack/plugins/task_manager/server/monitoring/task_run_statistics.test.ts # x-pack/plugins/task_manager/server/routes/health.test.ts * fix backport to remove post-7.14 stuff
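The status change described in the commit messages above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the PR: `HealthStatus.Warning` and `HealthStatus.Error` are named in the commit message, while `HealthStatus.OK`, the enum's string values, the `DebugLogger` interface, and `toServiceStatus` are hypothetical.

```typescript
// Enum member OK and all string values are assumptions for this sketch;
// only Warning and Error are named in the commit message.
enum HealthStatus {
  OK = 'OK',
  Warning = 'warn',
  Error = 'error',
}

// Per the commit message, `unavailable` is no longer set; `degraded` is used instead.
type ServiceStatusLevel = 'available' | 'degraded';

// Minimal logger shape assumed for illustration.
interface DebugLogger {
  debug(message: string): void;
}

// Hypothetical mapping: anything other than OK becomes `degraded`,
// and the reason is logged at debug level rather than as a warning.
function toServiceStatus(
  health: HealthStatus,
  reason: string,
  logger: DebugLogger
): ServiceStatusLevel {
  if (health === HealthStatus.OK) {
    return 'available';
  }
  logger.debug(`setting status to "degraded" (health: ${health}): ${reason}`);
  return 'degraded';
}

const messages: string[] = [];
const collectingLogger: DebugLogger = { debug: (m) => { messages.push(m); } };
console.log(toServiceStatus(HealthStatus.Warning, 'stats are stale', collectingLogger));
```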
Reported originally by @LeeDr
Some users are seeing this message on cloud upgrade, and it can leave them unsure whether there is a real problem. We might need to consider changing the wording of the message:
to indicate that not all users need to worry about this, as this might happen if Kibana is stopped for some period of time.
Feel free to add more information to this.