[Task Manager] Potential performance problem log happens on startup #106854

chrisronline · 2021-07-27T13:54:47Z

Reported originally by @LeeDr

Some users are seeing this on cloud upgrade and it might lead users to not understand if there is a problem or not. We might need to consider changing the verbiage on the message:

Detected potential performance issue with Task Manager. Set 'xpack.task_manager.monitored_stats_health_verbose_log.enabled: true' in your Kibana.yml to enable debug logging`)

to indicate that not all users need to worry about this, as this might happen if Kibana is stopped for some period of time.

Feel free to add more information to this.

elasticmachine · 2021-07-27T13:54:48Z

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

mikecote · 2021-07-28T15:41:03Z

We'll keep this issue scoped to changing the message, if we can find a better one.

pmuellr · 2021-08-20T20:08:35Z

@gchaps we're hoping you can provide some alternative wording that would make this less "scary". Even with "potential" in there, folks have been concerned about this.

The basic idea is that if you see this message just once in a while, things are likely fine. Kibana was busy for a bit, then recovered. If you see it a lot, then you probably need to investigate.

We have some separate work, probably longer-term, to try to log this message less frequently, but right now it can get logged a few times in succession, and then stop. These are the situations we want to make this "less scary".

And although the message is actionable, and useful, in terms of the config setting that can be changed, I would actually like to change that to point to our current doc that describes all this stuff, here: https://www.elastic.co/guide/en/kibana/current/task-manager-health-monitoring.html . Do we do that in other places? Presumably we'd use a version-specific URL here (instead of current).

This link might be better, which is a FAQ-like page - https://www.elastic.co/guide/en/kibana/current/task-manager-troubleshooting.html - and presumably we'd update that doc to put info regarding this particular message at the top of the page - and that copy would presumably end up pointing to the task-manager-health-monitoring.html page.

pmuellr · 2021-08-20T20:12:41Z

I'm not sure there's actually a way to turn this message OFF - and wondering if we should provide that. Presumably a new cloud/docker-allowable config (or piggy-backed somehow on an existing one) to avoid logging it altogether.

chrisronline · 2021-08-20T20:30:03Z

FWIW, if the config is disabled, we default to the existing behavior where it would log at debug level: https://github.com/elastic/kibana/blob/master/x-pack/plugins/task_manager/server/lib/log_health_metrics.ts#L47

pmuellr · 2021-08-23T13:50:07Z

if the config is disabled, we default to the existing behavior where it would log at debug level:

except for the logger.warn(), here :-)

kibana/x-pack/plugins/task_manager/server/lib/log_health_metrics.ts

Lines 79 to 87 in 6e3af2b

    
           } else { 
        
             // This is legacy support - we used to always show this 
        
             logger.debug(message); 
        
             if (logLevel !== LogLevel.Debug && lastLogLevel === LogLevel.Debug) { 
        
               logger.warn( 
        
                 `Detected potential performance issue with Task Manager. Set 'xpack.task_manager.monitored_stats_health_verbose_log.enabled: true' in your Kibana.yml to enable debug logging` 
        
               ); 
        
             } 
        
           }

I'm thinking we probably want to change that to a logger.debug() ...

chrisronline · 2021-08-23T13:55:52Z

Ah yea, true. I think we added that to avoid spamming the user if the config was off, but still give them some awareness that something might be off

pmuellr · 2021-08-23T13:59:30Z

Per #109095, we're still spamming the user :-), hence my thinking we could change this to a logger.debug(). I don't mind it being as spammy there ...

gchaps · 2021-08-23T14:51:20Z

Does something like this work?

Task Manager detected a degradation in performance

This is usually temporary, and Kibana can recover automatically. If the problem persists, check the docs for information about debug logging and troubleshooting.

pmuellr · 2021-08-23T15:27:21Z

That does sound better, but is quite wordy for a log message :-). I think it captures what I'm hoping to express though, so is better than what we have now.

Any thoughts on adding a doc link? I believe we may be able to add a doc link to our "health status" that gets percolated all the way up to Kibana's overall status, so another option is to refer to the overall Kibana status here (also a link), which presumably would provide more information, including a doc link into the task manager docs.

gchaps · 2021-08-23T15:46:28Z

Here are a couple of shorter versions

Task Manager detected a degradation in performance
This is usually temporary, and Kibana can recover automatically. If the problem persists, check the docs for troubleshooting information.

Task Manager detected a potential performance problem
Kibana was likely busy, or stopped, and then recovered. If the error persists, check the docs.

Can we put the link in the description?

pmuellr · 2021-08-24T15:09:30Z

Seeing the formatting, I'm wondering if you think these messages will be in a UI. Currently, this message only occurs in the Kibana logs. There's no fixed length, but it's nice to be shorter of course. And no notion of "bolding" anything. And link would need to be displayed in full.

For example, the log message would be (this is my preferred version so far):

Task Manager detected a degradation in performance. This is usually temporary, and Kibana can recover automatically. If the problem persists, check the docs for troubleshooting information at https://www.elastic.co/guide/en/kibana/7.14/task-manager-health-monitoring.html

Super-long, but I think captures everything I wanted it to.

Will need to figure out how to generate a per-version URL to the doc. I assume other folks are doing this as well - I think I've seen it in the client code, but this is the server :-)

gchaps · 2021-08-24T15:29:32Z

The text and the way you have formatted looks good to me. I'd add a period after the link.

pmuellr · 2021-08-24T15:57:19Z

We don't yet have a docLinks service server-side, so not sure about including a link now. We could point to current - for now, until we do, or we could just name the chapter in the doc. Using the current version sounds better, we can open an issue to fix that to use a docLink, once we have docLinks server-side.

I'd add a period after the link.

Oooh, I hate this! :-). Some renderers aren't smart enough to strip these off, so would use the URL WITH THE PERIOD AT THE END. Github gets it right, when rendered in markdown: https://www.elastic.co/guide/en/kibana/7.14/task-manager-health-monitoring.html. But other UI things, like "select word" kind of UI actions will pick it up, and assume the URL includes the period, which would send them to https://www.elastic.co/guide/en/kibana/7.14/task-manager-health-monitoring.html.

I hate it because I want to include puctuation, but to be safe, have to put a space in front of it, then it looks AWFUL.

But heh, it's just a log message. I should probably include a space-period, so if they "select word" from a log viewer, it would select the URL since it will be space separated, since if it's at the end of a string, it could have a " at the end of the URL (same problem as with ., but different char).

…ce is degraded (#109741) resolves #109095 resolves #106854 Changes the way task manager and alerting perform their health / status checks: - no longer sets an `unavailable` status; now uses `degraded` instead - change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem - Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level - add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error)

…ce is degraded (elastic#109741) resolves elastic#109095 resolves elastic#106854 Changes the way task manager and alerting perform their health / status checks: - no longer sets an `unavailable` status; now uses `degraded` instead - change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem - Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level - add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error)

…ce is degraded (elastic#109741) resolves elastic#109095 resolves elastic#106854 Changes the way task manager and alerting perform their health / status checks: - no longer sets an `unavailable` status; now uses `degraded` instead - change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem - Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level - add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error) # Conflicts: # x-pack/plugins/task_manager/server/monitoring/capacity_estimation.ts # x-pack/plugins/task_manager/server/monitoring/task_run_statistics.test.ts # x-pack/plugins/task_manager/server/routes/health.test.ts

…ce is degraded (#109741) (#110870) resolves #109095 resolves #106854 Changes the way task manager and alerting perform their health / status checks: - no longer sets an `unavailable` status; now uses `degraded` instead - change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem - Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level - add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error)

…ce is degraded (#109741) (#110869) resolves #109095 resolves #106854 Changes the way task manager and alerting perform their health / status checks: - no longer sets an `unavailable` status; now uses `degraded` instead - change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem - Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level - add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error)

…rformance is degraded (#109741) (#110875) * [task manager] provide better diagnostics when task manager performance is degraded (#109741) resolves #109095 resolves #106854 Changes the way task manager and alerting perform their health / status checks: - no longer sets an `unavailable` status; now uses `degraded` instead - change task manager "hot stats freshness" calculation to allow for staler data before signalling a problem - Changed the "Detected potential performance issue" message to sound less scary, include a doc link to task manager health monitoring, and log a debug instead of warning level - add additional debug logging when task manager sets a status that's not `available`, indicating why it's setting that status (in the code, it's when task manager uses HealthStatus.Warning or Error) # Conflicts: # x-pack/plugins/task_manager/server/monitoring/capacity_estimation.ts # x-pack/plugins/task_manager/server/monitoring/task_run_statistics.test.ts # x-pack/plugins/task_manager/server/routes/health.test.ts * fix backport to remove post-7.14 stuff

chrisronline added loe:needs-research This issue requires some research before it can be worked on or estimated Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jul 27, 2021

pmuellr self-assigned this Jul 29, 2021

pmuellr mentioned this issue Aug 24, 2021

[task manager] provide better diagnostics when task manager performance is degraded #109741

Merged

7 tasks

pmuellr closed this as completed in #109741 Sep 1, 2021

gmmorris removed the loe:needs-research This issue requires some research before it can be worked on or estimated label Sep 2, 2021

kobelb added the needs-team Issues missing a team label label Jan 31, 2022

botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Task Manager] Potential performance problem log happens on startup #106854

[Task Manager] Potential performance problem log happens on startup #106854

chrisronline commented Jul 27, 2021 •

edited by pmuellr

Loading

elasticmachine commented Jul 27, 2021

mikecote commented Jul 28, 2021

pmuellr commented Aug 20, 2021

pmuellr commented Aug 20, 2021

chrisronline commented Aug 20, 2021 •

edited

Loading

pmuellr commented Aug 23, 2021

chrisronline commented Aug 23, 2021

pmuellr commented Aug 23, 2021

gchaps commented Aug 23, 2021

pmuellr commented Aug 23, 2021

gchaps commented Aug 23, 2021 •

edited

Loading

pmuellr commented Aug 24, 2021 •

edited

Loading

gchaps commented Aug 24, 2021

pmuellr commented Aug 24, 2021

[Task Manager] Potential performance problem log happens on startup #106854

[Task Manager] Potential performance problem log happens on startup #106854

Comments

chrisronline commented Jul 27, 2021 • edited by pmuellr Loading

elasticmachine commented Jul 27, 2021

mikecote commented Jul 28, 2021

pmuellr commented Aug 20, 2021

pmuellr commented Aug 20, 2021

chrisronline commented Aug 20, 2021 • edited Loading

pmuellr commented Aug 23, 2021

chrisronline commented Aug 23, 2021

pmuellr commented Aug 23, 2021

gchaps commented Aug 23, 2021

pmuellr commented Aug 23, 2021

gchaps commented Aug 23, 2021 • edited Loading

pmuellr commented Aug 24, 2021 • edited Loading

gchaps commented Aug 24, 2021

pmuellr commented Aug 24, 2021

chrisronline commented Jul 27, 2021 •

edited by pmuellr

Loading

chrisronline commented Aug 20, 2021 •

edited

Loading

gchaps commented Aug 23, 2021 •

edited

Loading

pmuellr commented Aug 24, 2021 •

edited

Loading