Agent should collect and report CPU and memory usage of service runtime components #4083
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
This was blocked by elastic/beats#17314, which is now resolved.
So, I'm tinkering around with how to add monitoring config for something like endpoint to the [...]. There's also the question of how we report these metrics to begin with; the existing monitoring code uses the [...]
So, I'm still poking around the codebase, but I'm not sure I see a good way to pass the PID, since the monitoring config seems to be generated at the same time as the rest of the config, which looks like it would happen before a given beat is (re)started. Assuming I'm right, what we could do is restart the monitoring beat after everything else has started, which seems a bit needless and could also result in missing data.
@nfritts @brian-mckinney we want to include endpoint's CPU and memory usage metrics in Fleet. To do this we need to know the endpoint process's PID. If we added the current PID as a field that can be reported in the [...]. This seemed like the simplest approach to us, but we wanted to confirm. Otherwise we'd be regularly iterating through every PID on the system looking for the one that maps to endpoint-security. If you have other ideas, feel free to propose them.
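To illustrate the fallback mentioned above (scanning every PID on the system for the endpoint process), here is a minimal Linux-only sketch. It is not code from Agent or Beats: `findPIDByName`, `exeNameFromCmdline`, and the `procRoot` parameter are hypothetical names, and matching on the first `cmdline` argument is an assumption about how such a lookup might work. The process name `endpoint-security` is taken from the comment; the real binary name may differ.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// errNotFound is returned when no process matches the requested name.
var errNotFound = errors.New("no matching process found")

// exeNameFromCmdline returns the base name of the first argument in a
// NUL-separated /proc/<pid>/cmdline buffer, or "" if the buffer is empty.
func exeNameFromCmdline(raw []byte) string {
	first, _, _ := bytes.Cut(raw, []byte{0})
	if len(first) == 0 {
		return ""
	}
	return filepath.Base(string(first))
}

// findPIDByName (hypothetical helper) walks every numeric directory under
// procRoot and returns the first PID whose executable base name equals name.
// This is the full scan that would have to be repeated on every collection
// interval if the PID were not reported directly.
func findPIDByName(procRoot, name string) (int, error) {
	entries, err := os.ReadDir(procRoot)
	if err != nil {
		return 0, err
	}
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a PID directory
		}
		raw, err := os.ReadFile(filepath.Join(procRoot, e.Name(), "cmdline"))
		if err != nil {
			continue // process exited or is not readable
		}
		if exeNameFromCmdline(raw) == name {
			return pid, nil
		}
	}
	return 0, errNotFound
}

func main() {
	// The name below comes from the discussion; treat it as a placeholder.
	pid, err := findPIDByName("/proc", "endpoint-security")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("found PID", pid)
}
```

The scan touches one file per running process and can race with processes exiting, which is why having endpoint report its own PID looks simpler.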
Overall I don't think adding our PID should be too much work. There might be an easy alternative once we migrate to the named pipe, but we might be able to just start sending the PID when we move to the named pipe gRPC and do both at once.
Yes we are going to need the PID to continue. The next step should be on us though to define how they provide the PID. @fearful-symmetry put up a PR proposing how we do this in the control protocol so the endpoint team knows where to put the PID.
So, I'm digging more into the code, and there's an extra caveat here, which is how updates to the monitoring beats happen. The monitoring config is injected into the config model, which happens when we call [...]. @faec / @blakerouse you know more about the coordinator, is it advisable to just call [...]? Assuming it is, I don't think it would be too hard to add some logic to the coordinator to update the component model if we get a PID update from endpoint.
@cmacknz Sorry I missed this question the other day. It would be pretty easy for endpoint to incorporate the PID into [...]
Endpoint PR has been merged: https://github.com/elastic/endpoint-dev/pull/14338
Alternatively, Endpoint could send you the expected metric document in the [...]
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
## What does this PR do?

This is part of elastic/elastic-agent#4083. It adds a `GetOneRootEvent()` method to `ProcStats`. That issue requires metricbeat's system/process metricset to monitor a specific PID, which it can't do now. This will have to be followed up with a change to beat itself to rope this in. This also cleans up a bit of code and adds some docs.

## Why is it important?

Needed for elastic/elastic-agent#4083.

## Checklist

- [x] My code follows the style guidelines of this project
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added an entry in `CHANGELOG.md`
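As a rough illustration of what "monitoring a specific PID" involves once the PID is known, the sketch below reads the resident memory of a single process from Linux procfs. It uses only the Go standard library and does not use or reproduce the `ProcStats` API from the PR; `parseVmRSS`, `rssForPID`, and `procRoot` are hypothetical names chosen for this example.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseVmRSS extracts the resident set size, in bytes, from the text of a
// /proc/<pid>/status file. procfs reports the value in units of 1024 bytes.
func parseVmRSS(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, fmt.Errorf("parsing VmRSS: %w", err)
			}
			return kb * 1024, nil
		}
	}
	return 0, errors.New("VmRSS not present in status")
}

// rssForPID (hypothetical helper) reads the status file for one PID under
// procRoot, so no scan of the whole process table is needed.
func rssForPID(procRoot string, pid int) (uint64, error) {
	raw, err := os.ReadFile(filepath.Join(procRoot, strconv.Itoa(pid), "status"))
	if err != nil {
		return 0, err
	}
	return parseVmRSS(string(raw))
}

func main() {
	// Use our own PID as a stand-in for a PID reported by a service component.
	rss, err := rssForPID("/proc", os.Getpid())
	if err != nil {
		fmt.Println("could not read RSS:", err)
		return
	}
	fmt.Printf("resident memory: %d bytes\n", rss)
}
```

A real metricset would also collect CPU time and handle the process disappearing between reads; this only shows the single-PID lookup.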
@fearful-symmetry now that elastic/elastic-agent-system-metrics#150 has been merged, what is remaining here?
@jlind23 mostly blocked by SDHes this week, I'm hoping to have a PR next week. It's 95% done.
Currently, Agent collects and reports CPU and memory usage of command runtime components (and even that is incomplete). It is not collecting and reporting CPU and memory usage of service runtime components, e.g. Endpoint, leading to undercounting.