[Fleet] Improve storage of agent status field #141107

Open
kpollich opened this issue Sep 20, 2022 · 5 comments

@kpollich
Member

Currently, agent status is a "derived" field that Fleet calculates from a series of logical conditions on last check-in time and various other fields.
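
To make the "derived" idea concrete, here is a rough sketch of what such a calculation can look like. It is not the actual Fleet implementation: the `AgentDoc` shape, the `last_checkin` and `upgrade_started_at` field names, the status names other than "offline" and "updating", the 5-minute threshold, and the order of the checks are all assumptions for illustration.

```typescript
// Hypothetical agent document. Only `active`, a last check-in time, and
// `unenroll_started_at` are mentioned in this issue; the rest is assumed.
interface AgentDoc {
  active: boolean;
  last_checkin?: string; // ISO timestamp (assumed field name)
  unenroll_started_at?: string;
  upgrade_started_at?: string; // assumed field
}

type AgentStatus = "inactive" | "unenrolling" | "offline" | "updating" | "online";

// Assumed threshold after which a silent agent counts as offline.
const OFFLINE_AFTER_MS = 5 * 60 * 1000;

// Status is never stored; it is recomputed from other fields on every read.
function deriveStatus(agent: AgentDoc, now: Date): AgentStatus {
  if (!agent.active) return "inactive";
  if (agent.unenroll_started_at) return "unenrolling";

  const lastCheckin = agent.last_checkin ? Date.parse(agent.last_checkin) : NaN;
  const isStale =
    Number.isNaN(lastCheckin) || now.getTime() - lastCheckin > OFFLINE_AFTER_MS;
  if (isStale) return "offline";

  if (agent.upgrade_started_at) return "updating";
  return "online";
}
```

Every new status or change in priority means editing this chain of conditions, and any query that filters on status has to reproduce the same logic.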

There are a few problems with this approach, e.g.

  • These complex conditions and calculations around agent status are hard to maintain
  • Appending filters for agent status to queries is very challenging and can often incur substantial performance costs
  • Complicated query strings and filters based on status can introduce nasty bugs in Elasticsearch/Lucene syntax

Let's use this issue to discuss alternative approaches and document other issues with agent status that should be solved.


@kpollich kpollich added the Team:Fleet Team label for Observability Data Collection Fleet team label Sep 20, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kpollich
Member Author

@kpollich kpollich self-assigned this Sep 21, 2022
@kpollich
Member Author

Kicking off the discussion here with a proposal for moving away from the "deriving" logic that currently exists in Fleet and towards a single status field maintained by Fleet Server as the source of truth.

The idea here is that Fleet would no longer be responsible for calculating the status that is displayed in the UI and used in various filtering operations. Today that calculation depends on criteria like last check-in time, the active value, or the presence of timestamps like unenroll_started_at.

Pros for this approach:

  • Reduce some maintainability burden in the Fleet codebase where Agent status is difficult to grok and hard to iterate on (see changes required to alter priority of "offline" status in [Fleet] Change agent status order offline before updating #140621)
  • Improve performance and possibilities around complex filtering related to agent status: filtering on a single field is much easier than filtering on complex logic that combines all of the above timestamp values, boolean checks, etc.
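
As a rough illustration of the filtering point, the sketch below builds an "offline" filter both ways as plain objects in Elasticsearch query DSL shape. No client library is involved, and the field names `last_checkin` and `status` as well as the simplified conditions are assumptions, not the queries Fleet actually issues.

```typescript
// Plain objects shaped like Elasticsearch query DSL.
type Query = Record<string, unknown>;

// Derived status: every condition behind "offline" has to be restated in the
// query (simplified; field names are assumed).
function derivedOfflineFilter(offlineAfter: string): Query {
  return {
    bool: {
      filter: [
        { term: { active: true } },
        { range: { last_checkin: { lt: `now-${offlineAfter}` } } },
      ],
      must_not: [{ exists: { field: "unenroll_started_at" } }],
    },
  };
}

// Stored status: a single term filter on a hypothetical `status` field.
function storedStatusFilter(status: string): Query {
  return { term: { status } };
}

console.log(JSON.stringify(derivedOfflineFilter("5m")));
console.log(JSON.stringify(storedStatusFilter("offline")));
```

With a stored field, combining status with other user-supplied filters means adding one more clause instead of merging several boolean and range conditions.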

Cons

  • Would require substantial refactoring and testing in the Fleet Server codebase
  • Fleet might still have to do some calculation and persistence of this status field during various operations (e.g. unenrollment?), which would result in some duplication of concerns between Fleet and Fleet Server

Curious to hear other thoughts on this.

@juliaElastic
Contributor

I would add to the pros:

  • Supportability would improve, because the agent's status at a given point in time would be clearly visible. Currently there is no easy way to determine the agent status other than calling the status API or checking the different fields involved.

I think we should first document the possible state transitions, who updates the state (Kibana/Fleet Server), and which fields are updated. This would give a clearer picture of the refactor needed.
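
One possible shape for that documentation is a small transition table kept next to the code. The rows below are placeholders only: the status names, owners, and updated fields are invented for illustration, and the real values are exactly what would need to be worked out.

```typescript
type Status = "online" | "offline" | "updating" | "unenrolling" | "inactive";
type Writer = "Kibana" | "Fleet Server";

interface Transition {
  from: Status;
  to: Status;
  writer: Writer; // who performs the update
  fieldsUpdated: string[]; // which document fields change
}

// Placeholder rows, NOT the real Fleet state machine.
const transitions: Transition[] = [
  { from: "online", to: "offline", writer: "Fleet Server", fieldsUpdated: ["status"] },
  { from: "offline", to: "online", writer: "Fleet Server", fieldsUpdated: ["status", "last_checkin"] },
  { from: "online", to: "unenrolling", writer: "Kibana", fieldsUpdated: ["status", "unenroll_started_at"] },
];

// True when the table documents a transition between the two statuses.
function canTransition(from: Status, to: Status): boolean {
  return transitions.some((t) => t.from === from && t.to === to);
}

// Lists the documented transitions a given writer is responsible for.
function transitionsOwnedBy(writer: Writer): Transition[] {
  return transitions.filter((t) => t.writer === writer);
}
```

A table like this would also make any duplication of concerns visible: a target status written by both Kibana and Fleet Server shows up as rows with different writers.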

@joshdover
Contributor

We had discussed in some places using runtime fields to solve this. I don't think this is a practical option right now since this feature can be disabled via the expensive queries setting: elastic/elasticsearch#90898
