[Fleet] Notification when Fleet Server is not healthy #95572

Closed
mostlyjason opened this issue Mar 26, 2021 · 7 comments
Assignees
Labels
design, Team:Fleet (Team label for Observability Data Collection Fleet team)

Comments

@mostlyjason
Contributor

mostlyjason commented Mar 26, 2021

Describe the feature:
If the Fleet Server is not healthy, we should show a notification within the Fleet app to help users understand why their updates are not being applied to the agents. When there is one healthy Fleet Server and others are not, we could show a message that the system may be at reduced capacity. When no Fleet Servers are healthy, we should show a warning that updates may not be applied and provide a link to troubleshooting docs. The warning should not be shown before the user has added a Fleet Server, or if they have removed all agents with a Fleet Server (they are not using Fleet Server any more). That's because we have separate prompts that invite the user to add a Fleet Server.
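
The rules above could be sketched roughly as follows. This is only an illustration, not Fleet's actual code: the `FleetServerInfo` shape, the `getFleetServerNotice` function, and the notice names are all hypothetical, and "reduced capacity" is assumed to mean at least one healthy and at least one unhealthy Fleet Server.

```typescript
// Hypothetical shape; Fleet's real agent status model may differ.
interface FleetServerInfo {
  id: string;
  healthy: boolean;
}

// Hypothetical notice levels for the Agents page.
type FleetServerNotice = 'none' | 'reduced_capacity' | 'no_healthy_servers';

// Decide which notice (if any) should be shown.
function getFleetServerNotice(servers: FleetServerInfo[]): FleetServerNotice {
  // No Fleet Server enrolled (never added, or all removed): stay silent,
  // because a separate prompt already invites the user to add one.
  if (servers.length === 0) return 'none';

  const healthyCount = servers.filter((s) => s.healthy).length;

  // Nothing healthy: warn that updates may not be applied.
  if (healthyCount === 0) return 'no_healthy_servers';
  // Some healthy, some not: the system may be at reduced capacity.
  if (healthyCount < servers.length) return 'reduced_capacity';
  return 'none';
}
```

A caller would map `no_healthy_servers` to the warning with the troubleshooting link and `reduced_capacity` to the softer message.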

Users should be able to see this notification on the Agents page in Fleet. As a stretch goal, we can consider other locations such as after saving a change to an agent policy.

Describe a specific use case for the feature:

  • As a user of an integration or Elastic Agent, I would like to know when my agent policy updates are not being applied to the agents because the Fleet Server is not healthy.
mostlyjason added the Team:Fleet label Mar 26, 2021
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@mostlyjason
Contributor Author

@hbharding we could use a design for this issue. I think you have something in the works?

@hbharding
Contributor

I had considered the case for when all Fleet Servers are down and how we might show that on the agents page in Fleet, but this design does not address the other scenarios you've described. I can think on that more next week.

image

@mostlyjason
Contributor Author

@hbharding I was talking to PH and he suggested the idea of having a persistent health status indicator for the Fleet Servers. It could potentially go in the header as a global indicator. I think users would consider it separate from the filters in the table below, since the fleet servers are used by all agents. It could be green/yellow/red depending on state. Curious what you think about this?

@mostlyjason
Contributor Author

@hbharding I was discussing this issue with Mukesh and he feels pretty strongly that we should start with the stack features for monitoring. That means using dashboards and alerts to notify users of Fleet Server status. I suggest keeping this on the backlog until we get more input from users.

@hbharding
Contributor

Ok. I was talking with Katrin about this issue the other day and we had a similar idea around alerting. It would be useful for admins to receive alert notifications (email, etc.) when Fleet Server(s) have been down for a certain amount of time. They shouldn't have to rely on visiting the Kibana / Fleet app to discover that servers are down.
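
As a rough sketch of the "down for a certain amount of time" check: the `FleetServerHeartbeat` record and the `serversDownTooLong` helper below are hypothetical names invented for illustration, and this is not how Kibana alerting rules are actually defined. It only shows the threshold logic such an alert would need.

```typescript
// Hypothetical record of when each Fleet Server was last seen healthy.
interface FleetServerHeartbeat {
  id: string;
  lastHealthyAt: number; // epoch milliseconds
}

// Return the ids of servers that have not been healthy for at least `thresholdMs`.
function serversDownTooLong(
  heartbeats: FleetServerHeartbeat[],
  now: number,
  thresholdMs: number
): string[] {
  return heartbeats
    .filter((h) => now - h.lastHealthyAt >= thresholdMs)
    .map((h) => h.id);
}
```

A non-empty result is what would trigger the email or other notification, so admins learn about the outage without opening the Fleet app.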

@jsoriano
Member

I think this notification already exists, closing this. @juliaElastic please reopen if you think something else is needed.

5 participants