Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add detailed docker network summary stats #25354

Merged

Conversation

fearful-symmetry
Copy link
Contributor

@fearful-symmetry fearful-symmetry commented Apr 27, 2021

What does this PR do?

This PR Creates a new metricset, docker/network_summary. This uses network namespaces in order to fetch per-network namespace stats for a given container, similar to the counters in system/network_summary

I also tried to achieve some level of parity with system/network_summary, as it contains many of the same fields. As a result of this, we're using the same fields and same mapping methods, which have been moved to libbeat.

Why is it important?

A few users have been requesting data at this level, but associated with a given cgroup/container.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  • pull down PR and build on a linux system with docker
  • enable docker/network_summary in the module options.
  • with some docker containers running, run this metricset, check to see if docker.network_summary is reporting, and that fields are populated.

@fearful-symmetry fearful-symmetry requested a review from a team April 27, 2021 21:47
@fearful-symmetry fearful-symmetry self-assigned this Apr 27, 2021
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 27, 2021
@mergify
Copy link
Contributor

mergify bot commented Apr 27, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b docker-network-summary upstream/docker-network-summary
git merge upstream/master
git push upstream docker-network-summary

@elasticmachine
Copy link
Collaborator

elasticmachine commented Apr 27, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: jsoriano commented: /test

  • Start Time: 2021-05-06T14:34:51.490+0000

  • Duration: 93 min 28 sec

  • Commit: b989478

Trends 🧪

Image of Build Times

❕ Flaky test report

No test was executed to be analysed.

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this added, this is going to give more consistency when monitoring at the host and the container level.

I think we are going to need to do this as a new metricset as we did in the system module. As in there, this network metricset reports events per interface, and these new counters are per container. This new metricset could be disabled by default, and then we wouldn't need the new configuration flag.

I also wonder if we really need to sum all the counters of all the processes, see my comment about this.

This PR adds detailed network stats to the docker/memory metricset.

By the way, this first sentence of the description confused me a little bit 😅

return &sysinfotypes.NetworkCountersInfo{}, errors.Wrapf(err, "error fetching network counters for PID %d", intPID)
}

summedMetrics = sumCounter(summedMetrics, counters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly I wonder if we are summing here several times the same data. I think that all processes in the same network namespace report the same network counters.

For example if I start two listening netcats in a container, I see the two sockets under both pids:

$ for pid in $(pidof nc);do echo $pid; cat /proc/$pid/net/tcp; done 
3119046
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
   0: 00000000:1F9B 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 36489370 1 0000000000000000 100 0 0 10 0                  
   1: 00000000:1F9C 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 36489816 1 0000000000000000 100 0 0 10 0                  
3118539
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
   0: 00000000:1F9B 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 36489370 1 0000000000000000 100 0 0 10 0                  
   1: 00000000:1F9C 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 36489816 1 0000000000000000 100 0 0 10 0                  

I also see them if I start a cat inside the container:

$ for pid in $(pidof cat);do echo $pid; cat /proc/$pid/net/tcp; done 
3115294
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
   0: 00000000:1F9B 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 36489370 1 0000000000000000 100 0 0 10 0                  
   1: 00000000:1F9C 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 36489816 1 0000000000000000 100 0 0 10 0                  

This also happens if I look to other files such as /proc/$pid/net/netstat or /proc/$pid/net/snmp, the content is the same for all the processes in the same namespace.

Not sure about the API call, but docker container top lists all the processes in the container, so here we may be summing several times the same information if we are looping over all the processes in the container.

I think that we would only need to report the counters of only one pid in the container. Or to be more correct, given that inside a container there can be processes of multiple network namespaces (dind case for example), for correctness we would need to sum the counters once per namespace found, that's it to sum the counters once per different /proc/<pid>/ns/net found.

@fearful-symmetry
Copy link
Contributor Author

@jsoriano No idea how I overlooked the namespacing, you're definitely right, the counters share a namespace.

I was sort of going back and forth on if I should create a new metricset for this. This didn't seem quite "useful" enough on its own, but you might be right. In general, most containers only have one interface, so I wasn't too worried about the awkwardness of reporting this alongside the interface stats, but it also seems sub-optimal.

@fearful-symmetry fearful-symmetry changed the title Add detailed network summary stats to docker/memory Add detailed docker network summary stats Apr 29, 2021
@fearful-symmetry
Copy link
Contributor Author

Alright, I completely refactored this. It's now in its own metricset, and I also removed a lot of the summing logic--we shouldn't have multiple network namespaces per container I think, so we use Inspect to grab a PID and report the network counters.

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor, it looks great. Added some small suggestions, after they are addressed I think this can be merged.

we shouldn't have multiple network namespaces per container

This is actually possible in dind (docker in docker) and some other fancier cases, so we might actually sum counters of different namespaces found inside the same container. But I don't think we need to cover these cases, and actually the system module probably doesn't cover multiple namespaces in the host. So current approach looks good to me.

This makes me think that maybe we could have a linux/network_namespace metricset that collects all these metrics per namespace from the host.

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@fearful-symmetry fearful-symmetry added the Team:Integrations Label for the Integrations team label May 3, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 3, 2021
@fearful-symmetry fearful-symmetry added the Metricbeat Metricbeat label May 3, 2021
@fearful-symmetry
Copy link
Contributor Author

/test

@jsoriano
Copy link
Member

jsoriano commented May 6, 2021

/package

@jsoriano
Copy link
Member

jsoriano commented May 6, 2021

/packaging

@jsoriano
Copy link
Member

jsoriano commented May 6, 2021

/test

@jsoriano
Copy link
Member

jsoriano commented May 6, 2021

@fearful-symmetry current failure is not related, I think you can merge this.

@fearful-symmetry fearful-symmetry merged commit f7e80da into elastic:master May 6, 2021
fearful-symmetry added a commit to fearful-symmetry/beats that referenced this pull request May 6, 2021
* add detailed network summary stats to docker/memory

* add changelog

* update deps, notice

* update ref yml

* fix test

* move to new metricset, refactor

* revert changes to docker/network

* remove config file

* update xpack

* small fixes

(cherry picked from commit f7e80da)
fearful-symmetry added a commit that referenced this pull request May 17, 2021
…25601)

* Add detailed docker network summary stats (#25354)

* add detailed network summary stats to docker/memory

* add changelog

* update deps, notice

* update ref yml

* fix test

* move to new metricset, refactor

* revert changes to docker/network

* remove config file

* update xpack

* small fixes

(cherry picked from commit f7e80da)

* update docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants