added more detailed logs around ES communication failure #992
What is the problem this PR solves?
This PR adds a log line signaling that fleet server has recovered while polling ES. It also surfaces ES connection failures through the http router.
How does this PR solve the problem?
For the logging change:
Keeps track of the errored state of the monitor and, on the first successful poll after an error, logs at info level.
For surfacing the error through the http router:
Returns a 503 in the case where the request to ES results in an error containing "connection refused". This method of error classification was copied from here, as I was unable to hunt down an error instance to compare the error against.
How to test this PR locally
Run make local, then run the binary against a local ES instance. Kill the ES instance.
To test the logging change:
Wait until the logs show the fleet server is having difficulties communicating with ES, then restart the ES instance. A log line similar to the one below should appear shortly:
To test the error surfacing:
While the ES instance is down, attempt to install an agent with the fleet server. This should result in an error similar to:
Checklist