[Fleet] Improve storage of agent status field #141107

kpollich · 2022-09-20T14:53:54Z

Currently, agent status is a "derived" field that's calculated in Fleet based on a series of logical conditions around last check in time and various other fields.

There are a few problems with this approach, e.g.

The complex conditions and calculations around agent status are hard to maintain
Appending filters for agent status to queries is very challenging, and can often incur substantial performance costs
Complicated query strings and filters based on status can introduce subtle bugs with Elasticsearch/Lucene syntax

Let's use this issue to discuss alternative approaches and document other issues with agent status that should be solved.


elasticmachine · 2022-09-20T14:53:56Z

Pinging @elastic/fleet (Team:Fleet)

kpollich · 2022-09-20T14:54:08Z

cc @nchaulet @juliaElastic @joshdover

kpollich · 2022-09-21T17:05:55Z

Kicking off the discussion here with a proposal for moving away from the "deriving" logic that currently exists in Fleet and towards a single status field maintained by Fleet Server as the source of truth.

The idea here is that Fleet would no longer be responsible for calculating the status that is displayed in the UI and used in various filtering operations. Today that calculation depends on criteria like last check-in time, the active value, and the presence of timestamps like unenroll_started_at.
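For illustration, here is a minimal sketch of what this kind of derived logic looks like. It is not Fleet's actual implementation: the `active` and `unenroll_started_at` names come from the description above, while `last_checkin`, the set of status names, their priority order, and the five-minute threshold are all assumptions.

```typescript
// Hypothetical agent document. Only `active` and `unenroll_started_at`
// are named in this issue; `last_checkin` is an assumed field name.
interface AgentDoc {
  active: boolean;
  last_checkin?: string; // ISO 8601 timestamp
  unenroll_started_at?: string; // ISO 8601 timestamp
}

// Assumed status names, for illustration only.
type DerivedStatus = 'inactive' | 'unenrolling' | 'offline' | 'online';

// Assumed threshold: treat an agent as offline after 5 minutes of silence.
const OFFLINE_AFTER_MS = 5 * 60 * 1000;

// Derive a status from several fields. The order of the checks is the
// "priority" of each status, so reordering them changes the result.
function deriveStatus(agent: AgentDoc, now: Date = new Date()): DerivedStatus {
  if (!agent.active) {
    return 'inactive';
  }
  if (agent.unenroll_started_at) {
    return 'unenrolling';
  }
  const lastCheckin = agent.last_checkin ? Date.parse(agent.last_checkin) : NaN;
  if (Number.isNaN(lastCheckin) || now.getTime() - lastCheckin > OFFLINE_AFTER_MS) {
    return 'offline';
  }
  return 'online';
}
```

Every consumer that needs the status (UI, filters, APIs) has to agree on this ordering and these thresholds, which is the maintainability concern raised in this issue.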

Pros for this approach:

Reduce the maintenance burden in the Fleet codebase, where agent status is difficult to grok and hard to iterate on (see the changes required to alter the priority of the "offline" status in [Fleet] Change agent status order offline before updating #140621)
Improve performance and possibilities around complex filtering related to agent status - filtering on a single field is much easier than filtering based on complex logic involving all of the above timestamp values, boolean checks, etc.
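To make the filtering point concrete, here is a rough sketch comparing the two styles of filter. The filter strings are KQL-style and purely illustrative: the function names, the `last_checkin` and `status` field names, and the threshold parameter are hypothetical, not what Fleet generates today.

```typescript
// Filter for "offline" when status is derived: every condition that feeds
// the derivation has to be restated in the query.
// Field names and the threshold are assumptions for illustration.
function derivedOfflineFilter(offlineAfterMinutes: number): string {
  return [
    'active:true',
    'not unenroll_started_at:*',
    `last_checkin < now-${offlineAfterMinutes}m`,
  ].join(' and ');
}

// Filter when status is a single stored field (hypothetical `status` field).
function storedStatusFilter(status: string): string {
  return `status:${status}`;
}
```

With the derived form, each additional status or changed priority means rewriting every such filter. With a stored field, the filter stays the same and only the writer of the field changes.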

Cons

Would require substantial refactoring and testing in the Fleet Server codebase
Fleet might still have to do some calculation and persistence of this status field during various operations (e.g. unenrollment?), which would result in some duplication of concerns between Fleet and Fleet Server

Curious to hear other thoughts on this.

juliaElastic · 2022-09-23T08:34:28Z

I would add to the pros:

Supportability would improve, since it would be clear what an agent's status is at a given point in time. Currently there is no easy way to determine the agent status other than calling the status API or checking the different fields involved.

I think at first we should document what the possible state transitions are, who updates the state (Kibana/FS), and which fields are updated. This would give a clearer picture of the refactor needed.
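One possible way to capture that documentation is as a typed table, sketched below. The rows are placeholders showing the shape of the table, not a record of how Kibana or Fleet Server behave today, and all type names, function names, and the `status` field are hypothetical.

```typescript
type Owner = 'Kibana' | 'Fleet Server';

// One row per documented state transition.
interface StatusTransition {
  from: string;
  to: string;
  updatedBy: Owner;
  fields: string[]; // fields written as part of the transition
}

// Placeholder rows only: these are NOT verified against current behavior.
const transitions: StatusTransition[] = [
  { from: 'online', to: 'unenrolling', updatedBy: 'Kibana', fields: ['unenroll_started_at'] },
  { from: 'online', to: 'offline', updatedBy: 'Fleet Server', fields: ['status'] },
  { from: 'unenrolling', to: 'inactive', updatedBy: 'Kibana', fields: ['active', 'status'] },
];

// List which components write a given field, to surface fields that are
// updated from more than one place.
function ownersOfField(rows: StatusTransition[], field: string): Owner[] {
  const owners = new Set<Owner>();
  for (const row of rows) {
    if (row.fields.includes(field)) {
      owners.add(row.updatedBy);
    }
  }
  return Array.from(owners).sort();
}
```

A field that comes back with both owners is a candidate for the duplication of concerns mentioned in the cons above.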

joshdover · 2022-10-17T12:04:35Z

We had discussed in some places using runtime fields to solve this. I don't think this is a practical option right now since this feature can be disabled via the expensive queries setting: elastic/elasticsearch#90898

kpollich added the Team:Fleet label Sep 20, 2022

kpollich self-assigned this Sep 21, 2022

joshdover mentioned this issue Sep 29, 2022

[Fleet] Replace unenrollment timeout with UI-only inactivity timeout #143455

Closed

7 tasks
