Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Elastic Agent] Improve error logging (reduce to Info Level where possible) wrt changing logging level to debug #154

Closed
Tracked by #26930
ruflin opened this issue May 3, 2021 · 12 comments · Fixed by #187
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team v8.2.0

Comments

@ruflin
Copy link
Contributor

ruflin commented May 3, 2021

  • Version: 7.13.0-SNAPSHOT
  • Environment: Docker

The Elastic Agent was running for a few minutes and I changed the logging level in the Fleet UI from Info to Debug. This all seems to have worked but the first we lines that were logged, looked as following:

2021-05-03T19:26:23.569Z	INFO	process/app.go:176	Signaling application to stop because of shutdown: metricbeat--7.13.0-SNAPSHOT
2021-05-03T19:26:24.066Z	INFO	log/reporter.go:40	2021-05-03T19:26:24Z - message: Application: filebeat--7.13.0-SNAPSHOT[4f12dd1d-f096-40b1-8bf4-8a0e66722775]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'
2021-05-03T19:26:24.066Z	INFO	log/reporter.go:40	2021-05-03T19:26:24Z - message: Application: filebeat--7.13.0-SNAPSHOT--36643631373035623733363936343635[4f12dd1d-f096-40b1-8bf4-8a0e66722775]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'
2021-05-03T19:26:24.066Z	INFO	log/reporter.go:40	2021-05-03T19:26:24Z - message: Application: metricbeat--7.13.0-SNAPSHOT--36643631373035623733363936343635[4f12dd1d-f096-40b1-8bf4-8a0e66722775]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'
2021-05-03T19:26:24.066Z	INFO	log/reporter.go:40	2021-05-03T19:26:24Z - message: Application: metricbeat--7.13.0-SNAPSHOT[4f12dd1d-f096-40b1-8bf4-8a0e66722775]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'
2021-05-03T19:26:24.999Z	ERROR	fleet/fleet_gateway.go:167	context canceled
2021-05-03T19:26:26.378Z	ERROR	fleet/fleet_gateway.go:167	context canceled
2021-05-03T19:26:27.852Z	INFO	fleet/fleet_gateway.go:298	Fleet gateway is stopping
2021-05-03T19:26:27.852Z	INFO	status/reporter.go:236	Elastic Agent status changed to: 'online'

I stumbled over the two ERROR log entries related to context which also contain very little "context" what it is about.

@ruflin ruflin added bug Something isn't working Team:Elastic-Agent Label for the Agent team labels May 3, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/agent (Team:Agent)

@ruflin
Copy link
Contributor Author

ruflin commented May 3, 2021

Maybe helpful, I found one more error in the logs in the Fleet UI:

21:26:23.566 [elastic_agent][error] failed to dispatch actions, error: acknowledge 0 actions '[]' for elastic-agent '4f12dd1d-f096-40b1-8bf4-8a0e66722775' failed: fail to ack to fleet: Post "http://fleet-server:8220/api/fleet/agents/4f12dd1d-f096-40b1-8bf4-8a0e66722775/acks?": context canceled

@dikshachauhan-qasource
Copy link

Hi @EricDavisX

While performing testing, we have observed above mentioned errors occurring on 7.13 Bc3 cloud qa build on changing agent logging level.

Screenshot1:

image

Further, Issue 25385 is also occurring on kibana UI. Logs start to generate as per changed logging level however not updated on agent details or logs page.

Screenshot 2:
image

Please let us know if anything else is required.

Thanks
QAS

@michalpristas
Copy link
Contributor

michalpristas commented May 5, 2021

i think this is expected, we change log level by reexecing and stopping all components. so context cancelled may be a byproduct of this workflow. log level is not a reloadable value

@ruflin
Copy link
Contributor Author

ruflin commented May 5, 2021

If it is expected, should it be logged as error? And can we improve the log message?

@michalpristas
Copy link
Contributor

michalpristas commented May 5, 2021

what we discussed offline: this is expected from a point of view we log that context is cancelled and something is happening.
we dont have a check/signal to determine if this cancellation itself is expected or happened out of the blue.
this should be addressed and logged only if we're facing unintended behavior

@EricDavisX
Copy link
Contributor

I'd consider the log line itself as a minor improvement - is there any discussion on why the log level in the UI wasn't updated (or wasn't updated quickly enough to be dismissed) ? I don't know how long we should wait before considering it a bug - thoughts from the technical perspective?

@ruflin
Copy link
Contributor Author

ruflin commented May 10, 2021

@EricDavisX We didn't have any conversations around this but it is definitively worth discussing. @michalpristas Is the log level change also a new revision of the policy so it could be seen in the policy rollout overview? I assume not as it is specific per Agent. How can we let users know that it is rolled out? @EricDavisX I think we need to open an issue for this in Kibana to get the UX in place.

@EricDavisX EricDavisX changed the title [Elastic Agent] Error logged after changing logging level to debug [Elastic Agent] Improve error logging (reduce to Info Level where possible) wrt changing logging level to debug May 11, 2021
@EricDavisX
Copy link
Contributor

Sounds good @ruflin. I logged this for the separate Kibana work: elastic/kibana#99788
and I updated the short desc here to be specific of improving the log level to not be an 'Error' if we can predict it is a known situation and is not actually severe.

This issue is more severe and will remain open for the fact that sometimes the log level just isn't updated:
elastic/beats#25385

@ph ph added enhancement New feature or request and removed bug Something isn't working labels May 13, 2021
@ph ph removed the v7.14.0 label May 13, 2021
@jlind23 jlind23 added 8.2-candidate good first issue Good for newcomers Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team and removed 7.15-candidate labels Dec 15, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@jlind23
Copy link
Contributor

jlind23 commented Jan 13, 2022

As seen with @ph reducing the scope of this issue to rework only the status loop mentioned here:
https://user-images.githubusercontent.com/72375798/116995145-518c9780-acf7-11eb-9ad5-30b0db22a2e7.png

@jlind23 jlind23 transferred this issue from elastic/beats Mar 7, 2022
@jlind23
Copy link
Contributor

jlind23 commented Mar 9, 2022

@ph the status log is coming from this file:
https://github.com/elastic/elastic-agent/blob/main/internal/pkg/core/status/reporter.go#:~:text=%7D-,func%20(r%20*controller)%20updateStatus()%20%7B,%7D,-if%20r.status

I do not see any issues there. Is it possible that Kibana UI "fails" to parse our logs and retrieve all logs without taking care of the specified level?

@ph ph self-assigned this Mar 9, 2022
ph added a commit to ph/elastic-agent that referenced this issue Mar 9, 2022
This code improve the error reporting when Elastic Agent is stopped
through the pass context. This move the state reporting closer to the
execution of the request. Error related to the request would still be
logged as error and will change the internal state to non healthy.

Fixes: elastic#154
@ph ph closed this as completed in #187 Mar 16, 2022
ph added a commit that referenced this issue Mar 16, 2022
* Move the `context cancelled` error message into a debug statement

This code improve the error reporting when Elastic Agent is stopped
through the pass context. This move the state reporting closer to the
execution of the request. Error related to the request would still be
logged as error and will change the internal state to non healthy.

Fixes: #154
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team v8.2.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants