/api/fleet/agent_status provides incorrect counts #134798
Pinging @elastic/fleet (Team:Fleet)
In this case, after my drones had been up, I found that Fleet was reporting 4 drones in the updating state and 10000 healthy, but the actual number of drones on my system was 10000 plus the default agent policy, so it should have been 10001.
Recent examples while trying to do 25,000:
I looked into this, and what happens in the code is that queries are made in a loop for different statuses: kibana/x-pack/plugins/fleet/server/services/agents/status.ts, lines 58 to 68 (commit de9a822).
I think the reason for the discrepancies is that the status changes quickly between … For the other discrepancy, where the …

EDIT: while testing the fix, I found one occurrence of the count discrepancy after starting an upgrade of agents before halting them.
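The race described in these comments can be sketched with a small simulation. This is hypothetical illustration code, not Fleet code: it models one count query per status issued sequentially (as in status.ts above), with agents changing status between queries. An agent that moves into a not-yet-queried status is counted twice; one that moves into an already-queried status is never counted.

```python
import random

STATUSES = ["online", "updating", "offline", "error"]

def count_per_status_with_churn(agents, churn=0.05):
    """Issue one count 'query' per status, sequentially. Between
    queries, a fraction of agents change status (simulated churn),
    so the summed counts can drift away from the true total."""
    counts = {}
    for status in STATUSES:
        # The "query": count agents currently in this status.
        counts[status] = sum(1 for s in agents.values() if s == status)
        # Churn: some agents flip status before the next query runs.
        for agent_id in agents:
            if random.random() < churn:
                agents[agent_id] = random.choice(STATUSES)
    return counts

random.seed(1)
agents = {i: "online" for i in range(10_000)}
counts = count_per_status_with_churn(agents)
# The summed counts may not equal 10000 because of
# double-counted and never-counted agents.
print(sum(counts.values()))
```

With `churn=0.0` the sum always matches the number of agents, which is consistent with the discrepancies only showing up while fleet operations are in flight.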
I couldn't reproduce this problem since then. What I noticed is that when I start an upgrade and unenroll agents (with horde halt), agents still show up in the updating state even after the offline timeout (5m) has passed.
I see this one in 8.4.0 when trying to bring up 25,000.
This looks good, …
Thanks for clarifying how the counts work. In the case above, I got stuck in that state and timed out waiting for it to converge, so maybe it is a slightly different issue. It seems like a different version of the same issue: can total be 25020 when only 25000 drones are involved?
Oh, I see, I missed that the total was greater than the actual number of agents enrolled. Yes, this seems like a bug, though I do not have any clue yet what could cause it.
I do not recall this, maybe @nchaulet knows? |
I think the …
Kibana version:
8.3.0-6e69754c
Elasticsearch version:
Server OS version:
Browser version:
Browser OS version:
Original install method (e.g. download page, yum, from source, etc.):
Describe the bug:
I use /api/fleet/agent_status daily to determine what is going on in Fleet. The result of the /api/fleet/agent_status REST API call is often wrong because it provides incorrect counts. If you poll /api/fleet/agent_status while performing fleet operations, these inconsistencies show up often.
For example, the following call for a policy with 10000 agents reports the total as 10004:

```
FleetAgentStatus(total=10004, inactive=0, online=9811, error=0, offline=150, updating=43, other=43, events=0, run_id=None, timestamp=None, kuery='policy_id : f2fba850-f0f7-11ec-9c99-f30f5bda23da', cluster_name=None)
```
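A polling loop like the one described above can be sketched as follows. Only the endpoint path `/api/fleet/agent_status` and the `policy_id` kuery shape come from this report; the host, the ApiKey auth header, and the `results` response envelope are assumptions for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

def status_url(kibana_url, policy_id=None):
    """Build the agent_status URL; the endpoint path is from this
    report, the kuery mirrors the example output above."""
    query = ""
    if policy_id:
        query = "?" + urllib.parse.urlencode(
            {"kuery": f"policy_id : {policy_id}"})
    return kibana_url + "/api/fleet/agent_status" + query

def fetch_agent_status(kibana_url, api_key, policy_id=None):
    """GET the counts. ApiKey auth and the `results` envelope are
    assumptions, not confirmed by this report."""
    req = urllib.request.Request(
        status_url(kibana_url, policy_id),
        headers={"Authorization": f"ApiKey {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]

def poll_for_mismatch(kibana_url, api_key, expected_total, interval=10):
    """Poll during fleet operations and report inconsistent totals."""
    while True:
        status = fetch_agent_status(kibana_url, api_key)
        if status["total"] != expected_total:
            print(f"mismatch: reported {status['total']}, "
                  f"expected {expected_total}: {status}")
        time.sleep(interval)
```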
Steps to reproduce:
1.
2.
3.
Expected behavior:
I expect that summing some combination of the known fields will always match total.
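The invariant being asked for can be expressed as a small test helper. The field names are taken from the FleetAgentStatus output earlier in this report; which fields should participate in the sum is my assumption (notably, `other` is excluded because it duplicates `updating` in the example).

```python
def counts_consistent(status):
    """True when the per-status counts add up to `total`. Field
    names follow the FleetAgentStatus output in this report."""
    summed = (status["online"] + status["offline"]
              + status["error"] + status["updating"])
    return summed == status["total"]

# Figures from the example above: the fields do sum to total
# (9811 + 150 + 0 + 43 == 10004), yet total exceeded the 10000
# agents actually enrolled, which is the bug this issue describes.
report = {"total": 10004, "online": 9811, "offline": 150,
          "error": 0, "updating": 43}
print(counts_consistent(report))  # True
```

So an internal-consistency check alone is not enough here; the reported total can be self-consistent and still disagree with the number of agents actually enrolled.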
Screenshots (if relevant):
Errors in browser console (if relevant):
Provide logs and/or server output (if relevant):
Any additional context:
Accurate reporting from this interface is critical to testing and debugging Fleet Server. If the REST call is reporting the wrong counts, I would expect the Kibana UI to be affected as well.