Monitoring emits many scary log messages on a freshly-installed cluster #40898

Closed
DaveCTurner opened this issue Apr 5, 2019 · 12 comments · Fixed by #54265

Comments

@DaveCTurner
Contributor

When a new cluster starts up with monitoring enabled, it emits multiple messages of the following form:

waiting for elected master node ... to setup local exporter [default_local] (does it have x-pack installed?)

This message is shown on nodes other than the elected master each time (except the first) that they receive a cluster state update from the elected master in which monitoring is not yet completely set up (i.e. a template or an ingest pipeline is missing).

One explanation for this is that the elected master is not configured to set up monitoring, perhaps because it does not have X-Pack installed (hence the "does it have x-pack installed?" part of the message). However, when a cluster is first forming the master has quite a few things to do and may not get around to setting up monitoring for some time.

I think we should not emit this message until the node has been waiting an unreasonably long time for the master to set these things up.

I'm moving this comment from #28974 into its own issue because that issue appears to be about docs rather than about suppressing this warning.

@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@kiawin
Contributor

kiawin commented Apr 24, 2019

@DaveCTurner Any thought on how much time is considered "an unreasonably long time"?

I can work on this ticket, if that's ok with you :)

@DaveCTurner
Contributor Author

Great!

Let's default to 30 seconds, starting from the cluster forming (i.e. the first call to clusterChanged() in which event.state().blocks().hasGlobalBlock(GatewayService.STATE_NOT_RECOVERED_BLOCK) is false). This time should be configurable via a Setting.timeSetting rather than hard-coded.
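For illustration, a minimal sketch of what such a declaration might look like; the setting key and the NodeScope property are assumptions, only the 30-second default and the use of Setting.timeSetting come from the comment above:

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.TimeValue;

// Hypothetical declaration of the configurable wait described above. The key
// name and property are illustrative assumptions, not the final code.
public static final Setting<TimeValue> WAIT_MASTER_TIMEOUT_SETTING =
        Setting.timeSetting(
                "xpack.monitoring.exporters.default_local.wait_master.timeout", // assumed key
                TimeValue.timeValueSeconds(30),                                  // suggested default
                Setting.Property.NodeScope);
```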

@mariaral
Contributor

Hey @kiawin, are you still interested in this issue? If not, I could work on this.

@kiawin
Contributor

kiawin commented Feb 27, 2020

@mariaral thanks for your interest. please proceed 😄

@mariaral
Contributor

mariaral commented Mar 8, 2020

Hello @DaveCTurner,
I had a look at this issue and tried to understand the code. I like your idea of waiting a configurable amount of time before warning the users. From what I understand, this cannot be done inside the resolveBulk() method; we would have to create a Future that checks after x seconds whether the exporter has been set up by the master and prints the warning otherwise. Does this sound reasonable?

@DaveCTurner
Contributor Author

@mariaral I think it'd be simpler to record the time at which the cluster forms (i.e. the first time that clusterChanged() is called for a cluster state without GatewayService.STATE_NOT_RECOVERED_BLOCK) and then only emit the warning if enough time has elapsed since then.

The current (relative) time is available in this class from client.threadPool().relativeTimeInMillis().
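A rough sketch of that approach, assuming a cluster-state-listener-style class like the local exporter; the field and helper names below are invented for illustration and are not the merged code:

```java
// Illustrative only: remember when the cluster first formed, then compare the
// elapsed (relative) time against the configured timeout before warning.
private volatile long clusterFormedTimeMillis = -1L;

@Override
public void clusterChanged(ClusterChangedEvent event) {
    if (clusterFormedTimeMillis < 0
            && event.state().blocks().hasGlobalBlock(GatewayService.STATE_NOT_RECOVERED_BLOCK) == false) {
        // first cluster state in which the state-not-recovered block has been lifted
        clusterFormedTimeMillis = client.threadPool().relativeTimeInMillis();
    }
}

private boolean waitedLongEnoughForMaster(TimeValue timeout) {
    return clusterFormedTimeMillis >= 0
            && client.threadPool().relativeTimeInMillis() - clusterFormedTimeMillis >= timeout.millis();
}
```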

@mariaral
Contributor

@DaveCTurner thank you for taking a look. I initially tried your suggestion, but since resolveBulk() is called only on cluster state updates, if no more updates happen after the 30 seconds elapse (i.e. the nodes all joined the cluster before that time), the warning will never be printed even though the master may not have x-pack installed. That’s why I suggested checking this asynchronously. Do you have any other suggestions for that?

@DaveCTurner
Contributor Author

That's not unreasonable, although in practice there'll always be more cluster state updates. Maybe resolveBulk() is the wrong place to log this. How about logging this warning in openBulk() if state is not RUNNING and more than 30 seconds have elapsed since the cluster formed?
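A loose sketch of that idea inside openBulk(), reusing the hypothetical helper from the earlier snippet; the state check, logger call, and surrounding details are assumptions rather than the change that was eventually merged:

```java
// Illustrative only: warn from openBulk() rather than resolveBulk(), and only
// once the configured wait has elapsed since the cluster formed.
if (state.get() != State.RUNNING) {
    if (waitedLongEnoughForMaster(waitMasterTimeout)) {   // hypothetical helper from above
        logger.info("waiting for elected master node [{}] to setup local exporter [{}]"
                + " (does it have x-pack installed?)",
                clusterService.state().nodes().getMasterNode(), config.name());
    }
    return null; // not ready to export yet
}
```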

@mariaral
Contributor

Great suggestion, @DaveCTurner! It seems to solve the issue. I will prepare a PR ASAP.

mariaral added a commit to mariaral/elasticsearch that referenced this issue Mar 26, 2020
Currently, when monitoring is enabled in a freshly-installed cluster,
the non-master nodes log a warning message indicating that the master may
not have x-pack installed. The message is often printed even when the
master does have x-pack installed but takes some time to set up the local
exporter for monitoring. This commit adds the local exporter setting
`wait_master.timeout`, which defaults to 30 seconds. The setting
configures the time that the non-master nodes should wait for the master to
set up monitoring. After that time elapses, they log a message to the user
about a possibly missing x-pack installation on the master.

The logging of this warning was moved from `resolveBulk()` to
`openBulk()` since `resolveBulk()` is called only on cluster updates and
the message might not be logged until a new cluster update occurs.

Closes elastic#40898
@mariaral
Contributor

@DaveCTurner I just created a PR. Please take a look.

danhermann pushed a commit that referenced this issue Apr 21, 2020
@DaveCTurner
Contributor Author

Thanks @mariaral and @danhermann for closing this 😁

danhermann pushed a commit to danhermann/elasticsearch that referenced this issue May 4, 2020
danhermann added a commit that referenced this issue May 4, 2020
* Delay warning about missing x-pack (#54265)
