Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes to http2 connection detection and bad gRPC paths #969

Merged
merged 9 commits into from
Jun 26, 2024

Conversation

grcevski
Copy link
Contributor

@grcevski grcevski commented Jun 26, 2024

This PR fixes two things:

  1. We sometimes failed to detect HTTP2 connections. In the case of long running connections, we rely on the user space code to look into a unknown TCP data to see if we missed an HTTP2 connection. We poked at the request buffer of the TCP data to see if it looks like HTTP2/gRPC, but in practice the buffers can be reversed if we are unlucky when we started monitoring. We need to look at the response buffer too. I changed the code to extract all known http2 attributes and decide if it found anything useful.
  2. There was one more bug in reading the iov buffers, again on TCP send. We added code to look for the first valid buffer, but the buffers can be broken down in chunks and we were reading the full length of the buffer on the first chunk. This could pull in random memory and it can be incorrectly decoded in gRPC as a path.

TODO:

  • Test on older kernels to see if the BPF code changes didn't break

Closes #960

@grcevski grcevski requested review from mariomac and marctc as code owners June 26, 2024 11:13
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, int);
__type(value, u8[(IO_VEC_MAX_LEN * 2)]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to make this twice as large as the max size we need, otherwise I couldn't convince the verifier the variable sized appends to the buffer are OK.

@@ -80,7 +80,7 @@ func (p *BPFLogger) processLogEvent(record *ringbuf.Record, _ ebpfcommon.Service
err := binary.Read(bytes.NewBuffer(record.RawSample), binary.LittleEndian, &event)

if err == nil {
p.log.Info(readString(event.Log[:]), "pid", event.Pid, "comm", readString(event.Comm[:]))
p.log.Debug(readString(event.Log[:]), "pid", event.Pid, "comm", readString(event.Comm[:]))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marctc I realized I was wrong about this, printing on Info really creates a lot of noise when I debug locally. I flipped the log to debug.

@codecov-commenter
Copy link

codecov-commenter commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 68.51852% with 17 lines in your changes missing coverage. Please review.

Project coverage is 80.54%. Comparing base (9b6a864) to head (199971c).
Report is 7 commits behind head on main.

Files Patch % Lines
pkg/internal/ebpf/common/http2grpc_transform.go 69.23% 12 Missing and 4 partials ⚠️
pkg/internal/ebpf/common/tcp_detect_transform.go 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #969      +/-   ##
==========================================
+ Coverage   79.48%   80.54%   +1.06%     
==========================================
  Files         132      134       +2     
  Lines       10378    10591     +213     
==========================================
+ Hits         8249     8531     +282     
+ Misses       1608     1557      -51     
+ Partials      521      503      -18     
Flag Coverage Δ
integration-test 55.79% <40.74%> (+1.35%) ⬆️
k8s-integration-test 58.95% <1.85%> (+0.57%) ⬆️
oats-test 36.08% <3.70%> (+0.29%) ⬆️
unittests 50.57% <55.55%> (+3.23%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@grcevski grcevski merged commit 512e9d0 into grafana:main Jun 26, 2024
6 checks passed
@grcevski grcevski deleted the improved_http2_conn_detect branch June 26, 2024 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Increasing cardinality with lots of badly parsed metrics
4 participants