Metricbeat doesnt give network io and disk io per process #7461

Closed
indu-sharma-guavus opened this issue Jun 28, 2018 · 21 comments
Assignees
Labels
enhancement estimation:Month Task that represents a month of work. Metricbeat Metricbeat question Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@indu-sharma-guavus

Metricbeat is missing a big feature: per-process network I/O and disk I/O. These are among the most critical metrics to track. Can somebody take a look at this?

@andrewkroh
Member

Metricbeat can collect this information with its system module.

@andrewkroh
Member

Oh, I see in the title only that you also mention "per process".

I'm not aware of per-process network metrics in /proc on Linux unless the process has been put into its own netns. There's an open issue around grabbing this data. Additionally, I do think you could get per-process data with eBPF (it would just take a bit of work).
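For readers following along, here is a minimal sketch of what /proc exposes for network counters. It parses the standard `/proc/net/dev` layout, which is also what `/proc/<pid>/net/dev` contains. The function names `parse_net_dev` and `read_net_dev` are illustrative and not part of Metricbeat. The counters describe the interfaces in the process's network namespace, so every process sharing a namespace returns the same numbers.

```python
from pathlib import Path


def parse_net_dev(text):
    """Parse /proc/net/dev style text into {interface: counters}."""
    stats = {}
    # The first two lines are column headers; data lines look like "iface: v1 v2 ...".
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        iface, values = line.split(":", 1)
        fields = values.split()
        if len(fields) < 16:
            continue  # unexpected layout; skip rather than guess
        stats[iface.strip()] = {
            # Receive columns occupy indexes 0-7, transmit columns 8-15.
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return stats


def read_net_dev(pid):
    """Read interface counters as seen from the network namespace of `pid`."""
    return parse_net_dev(Path(f"/proc/{pid}/net/dev").read_text())
```

On a host where nothing has been moved into a separate netns, `read_net_dev(1)` and `read_net_dev(os.getpid())` should report the same interfaces and counters, which is why this file alone cannot attribute traffic to a single process.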

So I think this is a duplicate?

There's an open issue for process IO too. #4241

@indu-sharma-guavus
Author

Thank you very much @andrewkroh. eBPF is a great idea; the catch is that integrating its stats with Metricbeat may be tedious, and Metricbeat is something we can't live without because of its easy compatibility with the ELK stack. There are other tools out there just for network I/O and disk I/O: nethogs and iotop, respectively.

By the way, I can see you've committed code in your private branch. What should I do to consume your branch and get what I've been looking for?

@andrewkroh
Member

To get the netns metrics integrated, I would check out elastic/beats from master and port over the changes from andrewkroh@7dd6ba8 by hand. It's been sitting on that branch for close to 2 years and lots of things in Metricbeat have changed.

The main change is that the code for collecting process data was moved over to libbeat, so that is where a lot of the changes are needed now. https://github.com/elastic/beats/tree/master/libbeat/metric/system/process

@javadevmtl

I had a similar debugging issue recently and had to use a tool like atop to get that info. I could see disk spikes on the I/O dashboard, but couldn't associate them back to a process.
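As a rough illustration of how a spike could be tied back to a process, the sketch below ranks processes by the change in their cumulative disk byte counters between two samples. It assumes each sample is a dict of `{pid: {"read_bytes": ..., "write_bytes": ...}}`, for example collected from `/proc/<pid>/io` on Linux. The function name `top_io_deltas` and the sample layout are hypothetical and not something Metricbeat or atop provides.

```python
def top_io_deltas(before, after, interval_s, n=5):
    """Return up to n (pid, bytes_per_second) pairs, busiest first.

    `before` and `after` map pid -> {"read_bytes": int, "write_bytes": int},
    holding cumulative counters sampled `interval_s` seconds apart.
    """
    rates = []
    for pid, now in after.items():
        prev = before.get(pid)
        if prev is None:
            continue  # process started between samples, so there is no baseline
        delta = (now["read_bytes"] - prev["read_bytes"]) + (
            now["write_bytes"] - prev["write_bytes"]
        )
        rates.append((pid, delta / interval_s))
    # Sort by rate, highest first, and keep the top n.
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates[:n]
```

Processes that exit between the two samples drop out of the result, so short-lived writers can still be missed with this approach.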

@mochiman33

Will the per-process disk I/O and network I/O stats be released in a version of Metricbeat in the near future?

@ruflin ruflin added enhancement Team:Integrations Label for the Integrations team labels Jan 7, 2019
@botelastic

botelastic bot commented Jul 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Jul 8, 2020
@botelastic botelastic bot closed this as completed Aug 7, 2020
@willemdh

willemdh commented Nov 2, 2020

Imho, this shouldn't be closed, see also https://discuss.elastic.co/t/metricbeat-not-getting-diskio/251606/8

@ChrsMark
Member

Reopening this since it seems it slipped our attention and was closed automatically.

cc: @masci @fearful-symmetry @andrewkroh

@ChrsMark ChrsMark reopened this May 24, 2021
@botelastic botelastic bot removed the Stalled label May 24, 2021
@fearful-symmetry
Contributor

Adding this to the dashboard. We recently introduced a feature to report more detailed per-cgroup network metrics, and we can piggyback off that functionality.

system/process is pretty bloated as-is, so this will probably be disabled by default. However, the metricset has so much functionality crammed into it, that I wonder if we should start making system/process_netstat or something.

Either way, will probably not have time to work on this until the release after next.

@fearful-symmetry
Contributor

I also filed a similar issue a while ago in #18863

@musiczhzhao

Hi @fearful-symmetry,

I saw you have closed the other ticket and moved it to this one. What is the latest plan for implementing per-process network I/O and disk I/O in Metricbeat? We have been longing for this feature as well.

Thanks,
Zhao

@fearful-symmetry
Contributor

Sorry, should have been a little more descriptive. We don't have any short-term plans for implementing this, as I'm going to have my hands full for the next 1-2 release cycles. As soon as we're in a place to start making enhancements, I'd like to prioritize this, as it shouldn't be too difficult.

@jlind23 jlind23 added Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team and removed Team:Integrations Label for the Integrations team labels Jan 4, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@symbatkarzhassov

We need this feature too)

@amitkanfer
Collaborator

@fearful-symmetry can we consider this one as closed?

@musiczhzhao

Hi Amitkanfer,

We’ll test this out. Based on the conversation, it seems only the per-process network stats are implemented so far? Are the per-process disk stats also implemented?

Best,
Zhao

@amitkanfer
Collaborator

@musiczhzhao you're right, still pending the disk stats per process

@fearful-symmetry
Contributor

Sorry about the delay here; this is still a work in progress, and I think it'll take more time than I realized. The /proc/PID/net API that I was planning to use actually reports network counters per-namespace and not per-PID, which complicates things a bit.
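One way to see the per-namespace behaviour is to group PIDs by the network namespace they belong to. The sketch below is an illustration under stated assumptions, not the Metricbeat implementation: it reads the `/proc/<pid>/ns/net` symlink on Linux, whose target looks like `net:[4026531992]`. The helper names `netns_id` and `group_by_netns`, and the `proc_root` parameter, are hypothetical.

```python
import os
from collections import defaultdict


def netns_id(pid, proc_root="/proc"):
    """Return the netns identifier for `pid` (e.g. 'net:[4026531992]'), or None."""
    try:
        return os.readlink(os.path.join(proc_root, str(pid), "ns", "net"))
    except OSError:
        # The process exited, or we lack permission to inspect it.
        return None


def group_by_netns(pids, proc_root="/proc"):
    """Group PIDs by network namespace; unreadable PIDs are skipped."""
    groups = defaultdict(list)
    for pid in pids:
        ns = netns_id(pid, proc_root)
        if ns is not None:
            groups[ns].append(pid)
    return dict(groups)
```

All PIDs that land in the same group see identical counters under `/proc/<pid>/net`, so those files can only attribute traffic to a namespace, not to an individual process.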

leo-ri pushed a commit to leo-ri/elastic-agent-system-metrics that referenced this issue Dec 20, 2023
…lastic#114)

## What does this PR do?

Part of elastic/beats#7461

This ended up being fairly simple; we just fetch per-process IO from
procfs, and these values appear to be largely identical to what's
reported by the netlink taskstat.

This just reads metrics from `/proc/[pid]/io` and reports them as part
of other process metrics, same as how things like memory usage are
reported.
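To make the data source concrete, here is a minimal Python sketch of reading `/proc/[pid]/io`. It is not the code from this PR; the function names `parse_proc_io` and `read_proc_io` are illustrative. The file is a list of `key: value` lines with cumulative counters such as `rchar`, `wchar`, `syscr`, `syscw`, `read_bytes`, `write_bytes`, and `cancelled_write_bytes`.

```python
from pathlib import Path


def parse_proc_io(text):
    """Parse /proc/[pid]/io style text into {counter_name: int}."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Keep only well-formed "name: <unsigned integer>" lines.
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_proc_io(pid):
    """Read cumulative I/O counters for `pid` (Linux only).

    Raises PermissionError for processes the caller is not allowed to
    inspect, and FileNotFoundError if the process has exited.
    """
    return parse_proc_io(Path(f"/proc/{pid}/io").read_text())
```

The values are cumulative since process start, so a rate needs two samples and the elapsed time between them.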

## Why is it important?

We want per-process I/O metrics.

## Checklist


- [x] My code follows the style guidelines of this project
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added an entry in `CHANGELOG.md`
@fearful-symmetry
Contributor

@amitkanfer so, I think we should probably close the remainder of this. We have per-process disk IO, but per-process network IO is a different story; the implementation more or less requires copying packetbeat, to the point that users would probably be better served using the packetbeat process monitor.

@amitkanfer
Collaborator

Let's close and reopen if needed just for the network IO. Thanks @fearful-symmetry
