elastic-agent: HEALTHY status fluctuates for fleet-server #25341

Closed
mtojek opened this issue Apr 27, 2021 · 12 comments
Labels
Stalled Team:Elastic-Agent Label for the Agent team

Comments

@mtojek
Contributor

mtojek commented Apr 27, 2021

Spotted in elastic/integrations#950

We tried to use elastic-agent status as a healthcheck for the fleet server, but the stack initialization apparently fails because the HEALTHY status is unstable (suspected by @blakerouse).

We would like to use the status command instead of /api/status, but this issue seems to be a blocker.

@mtojek mtojek added the Team:Elastic-Agent Label for the Agent team label Apr 27, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@blakerouse blakerouse self-assigned this Apr 27, 2021
@ph ph added the v7.13.0 label Apr 27, 2021
@blakerouse
Contributor

@ph I think we should move this to 7.14, since it will take some investigation to find the best way to fix this issue and it's not critical for the 7.13 release.

@ph ph added v7.14.0 and removed Team:Elastic-Agent Label for the Agent team labels May 4, 2021
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 4, 2021
@ph ph added Team:Elastic-Agent Label for the Agent team and removed v7.13.0 labels May 4, 2021
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 4, 2021
@ph
Contributor

ph commented May 4, 2021

Added to 7.14 and to the meta issue.

@urso

urso commented Jun 21, 2021

@ph @blakerouse Do we have an update for this issue?

@ph
Contributor

ph commented Jun 22, 2021

Could this be similar to the issue described in #25940?

@blakerouse
Contributor

I don't have an update for this issue. We need to determine how we want to improve the status command.

One approach to making this command stable would be to add a debounce to healthy reporting: require that healthy be reported continuously for some amount of time X before Elastic Agent internally considers itself healthy.
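
A minimal sketch of that debounce idea, in Python rather than the agent's own code. The class name `DebouncedHealth`, the `hold_seconds` parameter, and the choice to reset the window on any unhealthy report are assumptions made for illustration; nothing here reflects how Elastic Agent actually tracks status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebouncedHealth:
    """Hypothetical helper: report healthy only after the raw status
    has stayed healthy for at least `hold_seconds`."""

    hold_seconds: float
    _healthy_since: Optional[float] = None

    def observe(self, raw_healthy: bool, now: float) -> bool:
        """Record one raw observation taken at time `now` (in seconds)
        and return the debounced status."""
        if not raw_healthy:
            # Any unhealthy report resets the window immediately.
            self._healthy_since = None
            return False
        if self._healthy_since is None:
            # First healthy report after a reset starts the window.
            self._healthy_since = now
        return now - self._healthy_since >= self.hold_seconds
```

With `hold_seconds=10`, a healthy report at t=0 followed by an unhealthy one at t=6 restarts the window, so the debounced status only turns healthy once ten uninterrupted healthy seconds have passed. The trade-off is that a genuinely healthy agent is also reported late by the same amount.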

@ruflin
Contributor

ruflin commented Jun 28, 2021

If I remember correctly, one issue in the past was that Elastic Agent was healthy but then went back to an unhealthy state when updating the policy. If this is the case, I'm not sure this is expected, as the Elastic Agent itself is still healthy. Something similar was the case during the fleet-server self enrollment. What is our expected behaviour for the health status of the Elastic Agent if a subprocess is not healthy?

@blakerouse
Contributor

If a subprocess is not healthy then Elastic Agent should not be healthy. When updating policy, the Elastic Agent does transition away from healthy to configuring, which is the correct state for it. The issue is more that the Elastic Agent is reporting healthy when it should not be healthy, not the opposite direction.
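
The rule described above can be sketched as a "worst state wins" aggregation. This is an illustrative Python sketch only: the `State` enum, its three members, their severity ordering, and the `overall_state` function are assumptions, not the agent's real state model.

```python
from enum import IntEnum
from typing import Iterable


class State(IntEnum):
    """Hypothetical states, ordered by severity (higher is worse)."""

    HEALTHY = 0
    CONFIGURING = 1
    UNHEALTHY = 2


def overall_state(agent: State, subprocesses: Iterable[State]) -> State:
    """Return the most severe state among the agent and its subprocesses,
    so the result is HEALTHY only when every component is HEALTHY."""
    return max([agent, *subprocesses])
```

Under this sketch, a healthy agent with one unhealthy subprocess reports `State.UNHEALTHY`, and an agent that is applying a new policy reports `State.CONFIGURING` even if all subprocesses are healthy.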

@EricDavisX
Contributor

@mtojek do you have an easy way to reproduce this... maybe re-doing the change as you originally tried? or @blakerouse do you need that at this point?

@mtojek
Contributor Author

mtojek commented Jul 15, 2021

I think there is nothing special to reproduce. It's just a matter of interpreting the behavior that Blake described above.

@EricDavisX
Contributor

I'm doing follow-ups on open issues for 7.14. Is this still under review and a possible merge for 7.14? If we aren't attempting it, then please update the label to 7.15 or beyond and move it from 'iteration' to the backlog as well. Thank you. @blakerouse

@blakerouse blakerouse removed their assignment Oct 19, 2021
@botelastic

botelastic bot commented Oct 19, 2022

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Oct 19, 2022
@botelastic botelastic bot closed this as completed Apr 17, 2023