[Ingest Management] Agent expose metrics #22793
[Ingest Manager] Log level reloadable from fleet (elastic#22690)
Pinging @elastic/ingest-management (Team:Ingest Management)
💚 Build Succeeded
Test stats 🧪
Test | Results |
---|---|
Failed | 0 |
Passed | 17385 |
Skipped | 1379 |
Total | 18764 |
"target": "data_stream",
"fields": map[string]interface{}{
	"type": "metrics",
	"dataset": fmt.Sprintf("elastic_agent.%s", agentName),
Do we do this for logs? Is it elastic_agent.elastic-agent? I think it's just elastic_agent.
The - in elastic-agent is a problem as well. We should not have that, because we had to change endpoint-security to endpoint_security; otherwise it breaks how the namespace is used, with the name ending in -default.
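A minimal sketch of the naming concern raised above, assuming the data stream name is built as type-dataset-namespace. `sanitizeDataset` and `dataStreamName` are hypothetical helpers written for illustration, not the functions used in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeDataset replaces hyphens with underscores so the dataset never
// contains the "-" that separates the parts of a data stream name.
// Hypothetical helper, not the PR's implementation.
func sanitizeDataset(name string) string {
	return strings.ReplaceAll(name, "-", "_")
}

// dataStreamName joins the parts as <type>-<dataset>-<namespace>
// (assumed layout for this sketch).
func dataStreamName(streamType, dataset, namespace string) string {
	return fmt.Sprintf("%s-%s-%s", streamType, sanitizeDataset(dataset), namespace)
}

func main() {
	// Without sanitizing, the hyphen in "elastic-agent" would be
	// indistinguishable from the separators around it.
	fmt.Println(dataStreamName("metrics", "elastic_agent.elastic-agent", "default"))
}
```

With the hyphen replaced, splitting the resulting name on "-" yields exactly three parts again, which is what the -default suffix handling relies on.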
Interesting, I had never considered doing it that way; maybe I was always focused on the libbeat thing. I think we are losing file descriptors and the number of goroutines? @blakerouse @ruflin Is that an appropriate way?
"target": "data_stream",
"fields": map[string]interface{}{
	"type": "metrics",
	"dataset": fmt.Sprintf("elastic_agent.%s", fixedAgentName),
I still question whether it should have the ending .elastic_agent. Why not just metrics-elastic_agent-default? Or does that not match the logs?
I was thinking about following the convention; I like the shorter one better as well.
@michalpristas what type of metrics are we getting with this vs the libbeat way? I assume this is more system based, like overall CPU/memory of the process? Seems like a smart way to target the Elastic Agent at an overall system usage level.
@blakerouse the libbeat way was collecting a lot of unusable metrics. It uses what is exposed through the /stats and /state endpoints. For state it's:
"state": {
"management": {
"enabled": false
},
"module": {
"count": 3
},
"output": {
"name": "elasticsearch"
},
"queue": {
"name": "mem"
}
As we always have management enabled (if we consider Fleet a management), there is nothing else in state we are interested in; we don't have an output just yet, and no queue. For stats:
"libbeat": {
"output": {
"events": {
"acked": 0,
"active": 0,
"batches": 0,
"dropped": 0,
"duplicates": 0,
"failed": 0,
"toomany": 0,
"total": 0
},
"read": {
"bytes": 0,
"errors": 0
},
"type": "elasticsearch",
"write": {
"bytes": 0,
"errors": 0
}
}
},
"runtime": {
"goroutines": 39
},
"uptime": {
"ms": 12019
}
Again, we don't have a direct output event we can monitor; only runtime.goroutines and uptime are interesting. With system process we collect much more valuable information: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-system-process.html
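To make the point about the stats payload concrete, here is a rough sketch that decodes only the two fields called interesting above. It assumes `runtime` and `uptime` sit at the top level as in the quoted excerpt (the real payload may nest them differently); `beatStats` and `parseStats` are names invented for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// beatStats models only runtime.goroutines and uptime.ms; encoding/json
// silently ignores every other field in the payload.
type beatStats struct {
	Runtime struct {
		Goroutines int `json:"goroutines"`
	} `json:"runtime"`
	Uptime struct {
		Ms int64 `json:"ms"`
	} `json:"uptime"`
}

// parseStats decodes a stats document into beatStats.
func parseStats(raw []byte) (beatStats, error) {
	var s beatStats
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	// Trimmed-down version of the excerpt quoted in the comment above.
	raw := []byte(`{
		"libbeat": {"output": {"type": "elasticsearch"}},
		"runtime": {"goroutines": 39},
		"uptime": {"ms": 12019}
	}`)

	s, err := parseStats(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("goroutines:", s.Runtime.Goroutines, "uptime ms:", s.Uptime.Ms)
}
```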
@michalpristas Can you add a JSON document that is generated, so we can have @ravikesarwani review it?
Are there Elastic Agent specific metrics we want to add? For example, how many processes are running, how many config changes, and similar? If we add these, where would they be added?
@ruflin I suppose we could create a custom module for Metricbeat which will monitor whatever the agent exposes on a predefined endpoint?
Maybe we have this module already and we just use http?
	"from": "http.agent.beat.handles",
	"to": "system.process.fd",
},
// I should be able to see fd usage. Am I keep too many files open?
a small typo in the comment?
yes, copycat struck here
not copycat, my file just got partially saved, I hate when this happens
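For readers unfamiliar with the from/to mapping in the snippet under review, the sketch below shows the general idea of copying a value from one dotted field path to another in a nested event. `getPath`, `setPath`, and `copyField` are hypothetical helpers, the value 12 is a placeholder, and this is not how the processor in the PR is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// getPath walks a nested map along a dotted key such as "a.b.c".
func getPath(m map[string]interface{}, dotted string) (interface{}, bool) {
	var cur interface{} = m
	for _, p := range strings.Split(dotted, ".") {
		node, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		if cur, ok = node[p]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// setPath stores v at a dotted key, creating intermediate maps as needed.
func setPath(m map[string]interface{}, dotted string, v interface{}) {
	parts := strings.Split(dotted, ".")
	for _, p := range parts[:len(parts)-1] {
		child, ok := m[p].(map[string]interface{})
		if !ok {
			child = map[string]interface{}{}
			m[p] = child
		}
		m = child
	}
	m[parts[len(parts)-1]] = v
}

// copyField copies the value at from to to; it reports whether from existed.
// The source field is left in place.
func copyField(event map[string]interface{}, from, to string) bool {
	v, ok := getPath(event, from)
	if ok {
		setPath(event, to, v)
	}
	return ok
}

func main() {
	event := map[string]interface{}{}
	setPath(event, "http.agent.beat.handles", 12) // placeholder value
	copyField(event, "http.agent.beat.handles", "system.process.fd")

	v, _ := getPath(event, "system.process.fd")
	fmt.Println(v)
}
```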
@michalpristas Am I missing something to have memory per process? It looks like the memory is sent only for the elastic-agent events.
@nchaulet try rebuilding all beats (from x-pack directory running
it's working :)
@michalpristas Could you put an example of the final version of the doc into the PR description?
@ruflin done
LGTM. Nit: An empty
In reality it's not; I was just replacing the metrics section with the updated one here.
The final doc does not show any
Will take a look; need to spin up Kibana.
@simitt I copied the entire document for the Linux client now, instead of modifying local changes manually.
Thanks @michalpristas! The metrics LGTM.
@ph @ruflin @blakerouse can I get approval? It seems the metrics are fine; Nicolas is already building dashboards on top of this agent.
Approving the doc structure, I'll leave to @blakerouse to check the code.
Code looks good.
* [Ingest Manager] Log level reloadable from fleet (elastic#22690)
* aa
* create drop
* updated drop
* process contains everything
* drop start time
* undo exposed endpoint
* sanitize dataset name
* ups
* agent expose http
* collect all metrics from beats
* colelct all from beats
* golint
* cleaner docs
* updated structure
* cgroup
* long live file saving issues
(cherry picked from commit 49c8d87)
…23105)
* [Ingest Management] Agent expose metrics (#22793)
* [Ingest Manager] Log level reloadable from fleet (#22690)
* aa
* create drop
* updated drop
* process contains everything
* drop start time
* undo exposed endpoint
* sanitize dataset name
* ups
* agent expose http
* collect all metrics from beats
* colelct all from beats
* golint
* cleaner docs
* updated structure
* cgroup
* long live file saving issues
(cherry picked from commit 49c8d87)
* Add changelog.
Co-authored-by: Michal Pristas <[email protected]>
What does this PR do?
Using the system package focused on the agent process, we collect CPU, disk, and memory metrics, which are sent to ds.elastic_agent-elastic-agent
At first I was playing with exposing an endpoint and using the beat module to collect some information about the agent, but I let it go: most of the information collected using this module is not relevant, except for goroutines, and it makes the code bloated with unnecessary setup that provides empty values for fields which are noncollectable/unreportable from the agent's point of view.
Why is it important?
#22394
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Example of final doc
linux
mac