WIP NETOBSERV-1550: Using batchAPIs to help with CPU and memory resources #256
@msherif1234: This pull request references NETOBSERV-559 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #256 +/- ##
==========================================
- Coverage 34.04% 33.44% -0.61%
==========================================
Files 47 48 +1
Lines 3836 3905 +69
==========================================
Hits 1306 1306
- Misses 2444 2513 +69
Partials 86 86
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
93db299
to
53cbc3a
Compare
/ok-to-test |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=6d184cc make set-agent-image |
53cbc3a
to
781299c
Compare
/ok-to-test |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=bfa5ac7 make set-agent-image |
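Ran the scale test based on 4.14; the summary shows an increase in eBPF agent resource usage. Each row gives the metric, the aggregation, the verdict, the percentage change, and the baseline and new values:
| cpuEBPFTotals | cpuEBPFTotals | avg(value) | Fail | 78.57% | 3.405002158 | 6.080411792 |
| rssEBPFTotals | rssEBPFTotals | avg(value) | Fail | 53.61% | 3404791063 | 5230041771 |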
(pprof) top10 -cum
Showing nodes accounting for 70ms, 3.14% of 2230ms total
Dropped 56 nodes (cum <= 11.15ms)
Showing top 10 nodes out of 92
flat flat% sum% cum cum%
0 0% 0% 1770ms 79.37% github.com/netobserv/netobserv-ebpf-agent/pkg/flow.(*MapTracer).evictFlows
0 0% 0% 1770ms 79.37% github.com/netobserv/netobserv-ebpf-agent/pkg/flow.(*MapTracer).evictionSynchronization
0 0% 0% 1760ms 78.92% github.com/netobserv/netobserv-ebpf-agent/pkg/ebpf.(*FlowFetcher).LookupAndDeleteMap
0 0% 0% 1620ms 72.65% github.com/cilium/ebpf.(*Map).BatchLookupAndDelete (inline)
0 0% 0% 1620ms 72.65% github.com/cilium/ebpf.(*Map).batchLookup
0 0% 0% 1620ms 72.65% github.com/cilium/ebpf.(*Map).batchLookupPerCPU
40ms 1.79% 1.79% 1510ms 67.71% github.com/cilium/ebpf/internal/sysenc.Unmarshal
30ms 1.35% 3.14% 1350ms 60.54% encoding/binary.Read
0 0% 3.14% 1190ms 53.36% github.com/cilium/ebpf.unmarshalBatchPerCPUValue
0 0% 3.14% 1180ms 52.91% github.com/cilium/ebpf.unmarshalPerCPUValue
(pprof) |
Added a benchmark comparing the iterate API with the batch lookup-and-delete API:
$ go test ./pkg/ebpf/ -exec sudo -bench=BenchmarkFlowFetcher_LookupAndDeleteMap -benchmem -count 5 -run=^#
goos: linux
goarch: amd64
pkg: github.com/netobserv/netobserv-ebpf-agent/pkg/ebpf
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
BenchmarkFlowFetcher_LookupAndDeleteMap/BatchLookupAndDelete-12 403 2507858 ns/op 757583 B/op 2943 allocs/op
BenchmarkFlowFetcher_LookupAndDeleteMap/BatchLookupAndDelete-12 446 2531754 ns/op 746563 B/op 2838 allocs/op
BenchmarkFlowFetcher_LookupAndDeleteMap/BatchLookupAndDelete-12 488 2234317 ns/op 737511 B/op 2753 allocs/op
BenchmarkFlowFetcher_LookupAndDeleteMap/BatchLookupAndDelete-12 526 2209894 ns/op 730663 B/op 2688 allocs/op
BenchmarkFlowFetcher_LookupAndDeleteMap/BatchLookupAndDelete-12 477 2251203 ns/op 739670 B/op 2774 allocs/op
BenchmarkFlowFetcher_LookupAndDeleteMap/IterateLookupAndDelete-12 386 2796254 ns/op 598852 B/op 4355 allocs/op
BenchmarkFlowFetcher_LookupAndDeleteMap/IterateLookupAndDelete-12 345 3105146 ns/op 613746 B/op 4492 allocs/op
BenchmarkFlowFetcher_LookupAndDeleteMap/IterateLookupAndDelete-12 370 2940347 ns/op 604619 B/op 4406 allocs/op
BenchmarkFlowFetcher_LookupAndDeleteMap/IterateLookupAndDelete-12 304 3723941 ns/op 631809 B/op 4664 allocs/op
BenchmarkFlowFetcher_LookupAndDeleteMap/IterateLookupAndDelete-12 326 3699242 ns/op 621145 B/op 4566 allocs/op
PASS
ok github.com/netobserv/netobserv-ebpf-agent/pkg/ebpf 70.103s
|
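To make the five runs above easier to compare, the following is a small sketch that averages the ns/op and B/op columns for each variant. The numbers are copied from the benchmark output above; the `mean` helper is a hypothetical name used only for this illustration and is not part of the agent.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs (hypothetical helper for this sketch).
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// ns/op values copied from the five benchmark runs above.
	batchNs := []float64{2507858, 2531754, 2234317, 2209894, 2251203}
	iterNs := []float64{2796254, 3105146, 2940347, 3723941, 3699242}
	// B/op values copied from the same runs.
	batchB := []float64{757583, 746563, 737511, 730663, 739670}
	iterB := []float64{598852, 613746, 604619, 631809, 621145}

	fmt.Printf("ns/op: batch %.0f, iterate %.0f, batch/iterate %.2f\n",
		mean(batchNs), mean(iterNs), mean(batchNs)/mean(iterNs))
	fmt.Printf("B/op:  batch %.0f, iterate %.0f, batch/iterate %.2f\n",
		mean(batchB), mean(iterB), mean(batchB)/mean(iterB))
}
```

On these five runs the batch variant averages roughly 28% less time per operation but roughly 21% more bytes per operation than the iterate variant. This is a micro-benchmark on a single laptop CPU, so it does not by itself contradict the scale-test regression reported earlier in the thread.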
ab946c5
to
4597d54
Compare
Started a repro upstream: cilium/ebpf#1343
652cac7
to
59faed7
Compare
59faed7
to
bf3361e
Compare
/ok-to-test |
8e763f4
to
237607c
Compare
/ok-to-test |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=baad512 make set-agent-image |
@msherif1234: This pull request references NETOBSERV-1550 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@msherif1234 I've created a new Jira for this PR, NETOBSERV-1550; the former one (NETOBSERV-559) is now used for the non-batched LookupAndDelete in my PR #283.
pkg/ebpf/tracer.go
Outdated
for i, id := range ids[:count] {
	for j := 0; j < ebpf.MustPossibleCPU(); j++ {
		flows[id] = append(flows[id], metrics[i*ebpf.MustPossibleCPU()+j])
	}
}
Can you explain this? I'm not sure I understand.
The batch call now returns per-CPU metrics as one flat slice, so for each flow ID we need to collect the values from every CPU and assign them to that flow ID.
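As an illustration of that indexing, here is a minimal standalone sketch. It assumes the layout implied by the loop under review: the values for key `i` occupy positions `i*nCPU` through `(i+1)*nCPU - 1` of the flat slice. The types `flowID` and `flowMetrics` and the function `groupPerCPU` are hypothetical stand-ins, not the agent's real types, and `nCPU` replaces the call to `ebpf.MustPossibleCPU()` so the example runs without the library.

```go
package main

import "fmt"

// flowID and flowMetrics are hypothetical stand-ins for the agent's key and value types.
type flowID uint32

type flowMetrics struct {
	Packets uint64
	Bytes   uint64
}

// groupPerCPU regroups a flat batch result into one slice of per-CPU values per flow.
// Only the first count keys are valid; metrics holds nCPU consecutive values per key.
func groupPerCPU(ids []flowID, metrics []flowMetrics, count, nCPU int) map[flowID][]flowMetrics {
	flows := make(map[flowID][]flowMetrics, count)
	for i, id := range ids[:count] {
		// Equivalent to the inner loop over j in the reviewed code.
		flows[id] = append(flows[id], metrics[i*nCPU:(i+1)*nCPU]...)
	}
	return flows
}

func main() {
	// Two valid keys on a 2-CPU system; the third slot is unused batch capacity.
	ids := []flowID{7, 9, 0}
	metrics := []flowMetrics{{1, 10}, {2, 20}, {3, 30}, {4, 40}, {0, 0}, {0, 0}}
	for id, perCPU := range groupPerCPU(ids, metrics, 2, 2) {
		fmt.Println(id, perCPU)
	}
}
```

Slicing with `ids[:count]` matters because the batch buffers are allocated at full batch size and only the first `count` entries are filled by a given call.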
Signed-off-by: Mohamed Mahmoud <[email protected]>
/ok-to-test |
/ok-to-test |
Signed-off-by: Mohamed Mahmoud <[email protected]>
/ok-to-test |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=f8e7e13 make set-agent-image |
I will close this PR, as switching to the batch APIs never showed any real benefit over what we have today. Should we ever reconsider, we can reopen it.
Description
cilium recently added batch API support for PerCPU maps; this PR migrates the eBPF agent to use the batch APIs.
cilium/ebpf#1315
Dependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.